medium error

beans · Post by **beans** » Fri Jul 05, 2013 1:53 am

QNAP TS-EC879U-RP, eight WDC WD3000FYYZ drives, RAID6 being "synchronized" (striped, silvered, initialized, etc.) - getting quite a few of the following log entries:
- Host: Drive7 read error corrected.
- [Harddisk 7] I/O error, sense_key=0x3, asc=0x11, ascq=0x4, CDB=28 00 26 92 03 30 00 04 00 00 .
- [Harddisk 7] medium error. Please run bad block scan on this drive or replace the drive if the error persists.

... about 200 of "read error corrected", four each of I/O and "medium" errors. Since I am running initialization for the 3rd time, this may have caused three instances of I/O and "medium" errors. All four I/O error entries have the same numbers - does it mean the error happens on the same physical block / sector of the hard drive?
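As a rough aid for reading the I/O error line above, the sketch below decodes the logged command block. It assumes the CDB is a standard SCSI READ(10) (opcode 0x28) and uses the T10 meanings of sense key 0x3 and ASC/ASCQ 0x11/0x04. The function name `decode_read10` and the two lookup tables are illustrative, not part of any QNAP tool.

```python
import struct

# Small excerpts of the T10 tables; only the codes seen in this log are listed.
SENSE_KEYS = {0x3: "MEDIUM ERROR"}
ASC_ASCQ = {(0x11, 0x04): "UNRECOVERED READ ERROR - AUTO REALLOCATE FAILED"}


def decode_read10(cdb_hex):
    """Decode a 10-byte READ(10) CDB given as space-separated hex bytes."""
    raw = bytes.fromhex(cdb_hex)
    if len(raw) != 10 or raw[0] != 0x28:
        raise ValueError("not a READ(10) command block")
    # Layout: opcode, flags, 32-bit LBA, group number, 16-bit length, control
    _op, _flags, lba, _group, blocks, _ctrl = struct.unpack(">BBIBHB", raw)
    return {"lba": lba, "blocks": blocks, "last_lba": lba + blocks - 1}


# CDB copied from the log entry (hypothetical helper, real log values)
req = decode_read10("28 00 26 92 03 30 00 04 00 00")
print(SENSE_KEYS[0x3], "/", ASC_ASCQ[(0x11, 0x04)])
print("start LBA", req["lba"], "blocks", req["blocks"], "last LBA", req["last_lba"])
```

Read this way, identical CDBs mean the same 1024-block read request (starting at LBA 0x26920330) failed each time, so the failing sector most likely sits somewhere in that same range. The CDB alone does not pinpoint the exact sector.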

What's the best course of action? Wait for the RAID set to get initialized, run a "bad block scan" and see what happens?

Thanks.

Gaudi · Post by **Gaudi** » Fri Jul 05, 2013 1:56 am

Or replace the hard drive.

beans · Post by **beans** » Fri Jul 05, 2013 2:13 am

Gaudi wrote:Or replace the hard drive.

Will WDC replace it based on just the above data? Didn't think so.

doktornotor · Post by **doktornotor** » Fri Jul 05, 2013 2:24 am

You need to ask WD, not us... They should know perfectly well what the codes mean. These are produced by the HDD itself, not by QNAP.

beans · Post by **beans** » Fri Jul 05, 2013 2:45 am

doktornotor wrote:These are produced by the HDD itself, not by QNAP.

Source?

According to Sun SCSI Sense Key Error Guide (http://docs.oracle.com/cd/E19105-01/sto ... 918-10.pdf, page 5), this is a RAID controller error message.

doktornotor · Post by **doktornotor** » Fri Jul 05, 2013 2:51 am

Uh... you did not read the description, did you?

schumaku · Post by **schumaku** » Fri Jul 05, 2013 3:09 am

The RAID controller firmware, during a normal read or volumes verification operation, corrects the bad sector on the drive by reconstructing the data (assuming a RAID 1+0 or RAID 5 configuration) and writing it back to the drive. The drive, in turn, writes the data to a spare sector. Ensure that the volume is scrubbed on a regular basis. If the volumes in the RAID device are configured as RAID 0, then the data is lost and drive replacement is required.

Complete garbage in this context. The RAID "controller" (there is none on the NAS here...) cannot create such messages; the text only describes how a RAID controller is expected to deal with them.

Yes, there _might_ be controllers able to create similar traps (with different IDs); however, these must be in sync with the SCSI standards. FMI: INCITS - Technical Committee T10 - SCSI Storage Interfaces

beans · Post by **beans** » Fri Jul 05, 2013 3:14 am

doktornotor wrote:Uh... you did not read the description, did you?

I thought I did... Care to be a little less cryptic and a little more specific? The point is, the error seems to be produced by QNAP's RAID management (caused by an I/O error with the disk but not the disk itself), and thus, may not be accepted by WDC at face value. SMART stats on that disk show nothing unusual.

doktornotor · Post by **doktornotor** » Fri Jul 05, 2013 3:17 am

1/ There is no RAID controller in action, no scrubbing in place anywhere. The RAID is pure SW Linux RAID.
2/ The message tells you the drive is so bad that, beyond the sectors being unreadable, they cannot even be reallocated (most likely because there are no spare sectors left).

Conclusion: the HDD is toast. Go RMA it instead of wasting time with pointless debates about what emits the message. Potential rants about HDD quality should be directed to the vendor, not QNAP.

beans · Post by **beans** » Fri Jul 05, 2013 3:18 am

schumaku wrote:Completely garbage in this context - the RAID "controller" (there is none on the NAS here...)

Why does it matter if the RAID controller is outsourced to hardware or to software? There is a RAID "controller", it's just software code. On hardware RAID controller, it's software code, too - some burned into ASICs, some into firmware.

Can we get to the bottom line please? What tests do I need to run, steps to take, to get an RMA from my disty or WDC themselves. Doc's point that it's a native disk error message - doesn't seem to be valid.

doktornotor · Post by **doktornotor** » Fri Jul 05, 2013 3:20 am

Geez. Out of this stupid thread, pure waste of time. WD have their own forum for advice about HDD diagnostics, RMA questions and the like. You are totally off-topic here.

schumaku · Post by **schumaku** » Fri Jul 05, 2013 3:21 am

Again: These events happen on the HDD itself, events and reactions are processed on the HDD controller, and the events are sent to the SATA controller. What we have today is in fact the standardized version of what DEC introduced on the DSA architecture, and custom implemented on the early SCSI controllers some 30 years ago.

If you get more recovered blocks or block replacements, you will see them in the SMART numbers soon.

Once the RAID is in sync - trigger full SMART tests for all HDD. Don't forget to schedule regular SMART tests - full tests at least once per week, brief tests once per day.
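On a QNAP these tests are normally started and scheduled from the web interface. For anyone working from a shell instead, a minimal sketch is shown below. It assumes `smartctl` from smartmontools is installed and that the drives appear as `/dev/sda` through `/dev/sdh`; both are assumptions, as are the helper names. By default it only prints the commands.

```python
import subprocess


def smart_test_command(device, kind="long"):
    """Build the smartctl call that starts a short or long self-test."""
    if kind not in ("short", "long"):
        raise ValueError("kind must be 'short' or 'long'")
    return ["smartctl", "-t", kind, device]


def start_tests(devices, kind="long", dry_run=True):
    """Return the commands; run them only when dry_run is False."""
    commands = [smart_test_command(dev, kind) for dev in devices]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands


# Hypothetical device names for an eight-bay unit
devices = ["/dev/sd" + letter for letter in "abcdefgh"]
for cmd in start_tests(devices, "long"):
    print(" ".join(cmd))
```

Calling `start_tests(devices, "short", dry_run=False)` from a daily job and the long variant from a weekly job would roughly match the schedule suggested above.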

beans · Post by **beans** » Fri Jul 05, 2013 3:39 am

schumaku wrote:If you get more recovered blocks or block replacements, you will see them in the SMART numbers soon.

Once the RAID is in sync - trigger full SMART tests for all HDD. Don't forget to schedule regular SMART tests - full tests at least once per week, brief tests once per day.

Thanks, will do.

beans · Post by **beans** » Fri Jul 05, 2013 10:47 am

Results of Bad Block Scan:

2013-07-04 19:40:47 System 127.0.0.1 localhost [Drive 7] Bad Blocks Scan completed.
2013-07-04 19:40:47 System 127.0.0.1 localhost [RAID6 Disk Volume: Drive 1 2 3 4 5 6 7 8] Drive 7 added into the volume.
2013-07-04 13:35:30 System 127.0.0.1 localhost [Drive 7] Start scanning bad blocks.

SMART info on that suspect disk is virtually the same as on others: no errors, retry counts are all zeros, and so are:
- Raw_Read_Error_Rate - 0
- Reallocated_Sector_Ct - 0
- Reallocated_Event_Count - 0

Does anyone know where to look for bad block scan results?

P3R · Post by **P3R** » Fri Jul 05, 2013 4:57 pm

beans wrote:SMART info on that suspect disk is virtually the same as on others: no errors, retry counts are all zeros, and so are:
- Raw_Read_Error_Rate - 0
- Reallocated_Sector_Ct - 0
- Reallocated_Event_Count - 0

And nothing in Current_Pending_Sector?
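For comparing drives side by side, a small sketch like the following can pull the relevant raw values out of the attribute table. It assumes the usual ten-column layout printed by `smartctl -A` (ID, name, flag, value, worst, threshold, type, updated, when failed, raw value). The list of watched attributes and the function names are illustrative choices.

```python
# Attributes that usually move first when sectors go bad (illustrative choice)
WATCHED = (
    "Reallocated_Sector_Ct",
    "Reallocated_Event_Count",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
)


def watched_raw_values(smartctl_output):
    """Map watched attribute names to their raw values."""
    values = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            values[fields[1]] = int(fields[9])
    return values


def suspicious(values):
    """Keep only the watched attributes whose raw value is above zero."""
    return {name: raw for name, raw in values.items() if raw > 0}
```

Feed `watched_raw_values` the text printed by `smartctl -A` for each drive. An empty result from `suspicious` matches the "all zeros" picture described above, while a non-zero Current_Pending_Sector would point to sectors the drive has not yet been able to reallocate.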

Does anyone know where to look for bad block scan results?

In the system log, that you already showed us.

You could run the full or long test from the WD diagnostic software on the drive in another computer.

Other than that, the advice to check with WD is not bad at all. These are expensive enterprise disks so I would expect them to be attentive and supportive to your issues.

QNAP NAS Community Forum
