TS-659 unable to rescue data from failing RAID 5

Questions about SNMP, Power, System, Logs, disk, & RAID.
edgars
New here
Posts: 6
Joined: Mon Jan 21, 2013 2:31 am

Re: TS-659 unable to rescue data from failing RAID 5

Post by edgars »

Suggestion: move away from RAID 5. With big drives (2 TB and up) it's suicide. RAID 1 and RAID 10 are your friends.
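The usual back-of-envelope behind this claim, assuming the often-quoted consumer-drive figure of one unrecoverable read error (URE) per 10^14 bits read (an assumption; edgars doesn't give his reasoning): rebuilding a six-drive 2 TB RAID 5 means reading roughly 5 x 2 TB = 10 TB from the surviving drives, so a rough Poisson estimate of hitting at least one URE during the rebuild is:
awk 'BEGIN { p = 1 - exp(-(5 * 2e12 * 8) / 1e14); printf "P(URE during rebuild) ~ %.0f%%\n", p * 100 }'   # prints about 55%
With 4 TB drives in the same six-bay array the same arithmetic comes out near 80%, which is where the "suicide" sentiment comes from; RAID 1/10 rebuilds read far less data per surviving drive.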
joerideschepper
New here
Posts: 5
Joined: Wed Jun 19, 2013 2:32 am

Re: TS-659 unable to rescue data from failing RAID 5

Post by joerideschepper »

Seems like I'm not allowed to contact you directly.
Could you send the PuTTY log to ... please?
Last edited by joerideschepper on Wed Oct 15, 2014 2:13 am, edited 1 time in total.
pwilson
Guru
Posts: 22533
Joined: Fri Mar 06, 2009 11:20 am
Location: Victoria, BC, Canada (UTC-08:00)

Re: TS-659 unable to rescue data from failing RAID 5

Post by pwilson »

joerideschepper wrote:Seems like I'm not allowed to contact you directly.
Could you send the PuTTY log to <my Gmail address> please?
It's a really bad idea to publish your E-mail address to a Public Forum! :roll: :roll: :roll: :roll:
Why are you volunteering for Spam e-mail attacks? :roll:

Why didn't you simply ask him to post his PuTTY logs directly to this Forum (for the benefit of all Community Members), or to http://pastebin.com/? Either way the entire Community would benefit, as well as the OP, without opening you (or him) up to Spam.

Since QNAP removed the "PM" feature of this Forum system, it is far easier to simply refer people to the QNAP Helpdesk. If we all refer these types of questions to the QNAP Helpdesk, then perhaps QNAP might actually "grab a brain" and restore this useful feature to the Forum.

They claimed it was a temporary measure, but I note they have not bothered to restore it for us yet, so I'm beginning to think they have no intention of doing so. Please review thread: Incompetent Forums Administration.....

Patrick M. Wilson
Victoria, BC Canada
QNAP TS-470 Pro w/ 4 * Western Digital WD30EFRX WD Reds (RAID5) - - Single 8.1TB Storage Pool FW: QTS 4.2.0 Build 20151023 - Kali Linux v1.06 (64bit)
Forums: View My Profile - Search My Posts - View My Photo - View My Location - Top Community Posters
QNAP: Turbo NAS User Manual - QNAP Wiki - QNAP Tutorials - QNAP FAQs

Please review: When you're asking a question, please include the following.
joerideschepper
New here
Posts: 5
Joined: Wed Jun 19, 2013 2:32 am

Re: TS-659 unable to rescue data from failing RAID 5

Post by joerideschepper »

I know, Mr Wilson.
However, I did not know about pastebin. Thank you for the info.
I hadn't checked the forum's policies either... bad me again.
Also bad me for having had confidence in the recoverability of a RAID 5 system.
But I just have to try to recover whatever I can from it.
pwilson
Guru
Posts: 22533
Joined: Fri Mar 06, 2009 11:20 am
Location: Victoria, BC, Canada (UTC-08:00)

Re: TS-659 unable to rescue data from failing RAID 5

Post by pwilson »

joerideschepper wrote:I know, Mr Wilson.
However, I did not know about pastebin. Thank you for the info.
I hadn't checked the forum's policies either... bad me again.
Also bad me for having had confidence in the recoverability of a RAID 5 system.
But I just have to try to recover whatever I can from it.
I've been using RAID5 for years, and I swear by it. I have no idea how "Mr RAID5 is suicide" formed his opinion, but here on my NAS it works very well. I have experienced several drive failures over the years, but I simply "hot-swap" the failed drive when this happens and wait for the RAID5 array to rebuild itself. RAID5 is only a challenge when using dubious drives or when experiencing "multiple drive failure" (RAID5 can only tolerate a single drive failure). This thread is all about RAID and drive failures, yet not a single participant has bothered to disclose the drive make/model they are using.
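For anyone following along, a quick way to confirm the array is merely degraded (exactly one failed member) before hot-swapping, assuming the data array is /dev/md0 as elsewhere in this thread:
cat /proc/mdstat                                  # an underscore in the [UUUUUU] pattern marks the failed member
mdadm -D /dev/md0 | grep -E 'State|Failed'        # expect "clean, degraded" and "Failed Devices : 1"
After the swap, a "recovery = ..." line in /proc/mdstat shows the rebuild progress.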

The amount of information provided in this thread is extremely limited. I took the time to provide commands that can be run on the NAS to harvest the missing information, but so far no one has provided either the information requested or the output of my NASReport script. No one (myself included) can assist with only partial information, so this thread hasn't been very active and is basically a complete waste of time. You too have attempted to help, but again, without the basic information there is very little we can do for people.

BTW, please call me "Patrick". Mr Wilson was my father.

Patrick M. Wilson
Victoria, BC Canada
QNAP TS-470 Pro w/ 4 * Western Digital WD30EFRX WD Reds (RAID5) - - Single 8.1TB Storage Pool FW: QTS 4.2.0 Build 20151023 - Kali Linux v1.06 (64bit)
Forums: View My Profile - Search My Posts - View My Photo - View My Location - Top Community Posters
QNAP: Turbo NAS User Manual - QNAP Wiki - QNAP Tutorials - QNAP FAQs

Please review: When you're asking a question, please include the following.
joerideschepper
New here
Posts: 5
Joined: Wed Jun 19, 2013 2:32 am

Re: TS-659 unable to rescue data from failing RAID 5

Post by joerideschepper »

Hi Patrick

I thought you might like some more info about my system; I also didn't really explain the issue I'm having.
On the other hand, you'll probably be annoyed again, because the report hangs at the hdparm part of the script.
Anyway, here goes:

*********************
** QNAP NAS Report **
*********************

NAS Model: TS-659 Pro+
Firmware: 4.0.7 Build 20140412
System Name: QNAP001
Workgroup: NAS
Base Directory: /share/MD0_DATA
NAS IP address: 10.0.1.20

Default Gateway Device: bond0

          inet addr:10.0.1.20  Bcast:10.0.1.255  Mask:255.255.255.0
          UP BROADCAST NOTRAILERS RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:861843 errors:0 dropped:74154 overruns:0 frame:0
          TX packets:2406582 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:129121877 (123.1 MiB)  TX bytes:227475258 (216.9 MiB)


DNS Nameserver(s):10.0.1.1


HDD Information:

/dev/sda
 Model=WDC WD2001FASS-00W2B0                   , FwRev=05.01D05, SerialNo=     WD-WMAY00550281
/dev/sdb
 Model=WDC WD2001FASS-00W2B0                   , FwRev=05.01D05, SerialNo=     WD-WMAY00535636
/dev/sdc

There are actually 6 drives, WD Black, 2 TB each.
Yes, I know the drives are not really RAID-certified... but they had a 5-year warranty when I bought them and they were the best drives that fit my budget.
Because of the 5-year warranty I figured their quality would be OK for some time, and they did in fact last for a number of years, running 24/7 for most of that time.

The issue actually started with drive 1 showing a red LED... the SMART test showed status "normal", but it had logged some read errors.
As they'd been running for a number of years, I figured the drives had held up pretty well.
So, as you said, RAID 5 should be able to handle it, and I replaced the drive.
Not so... the rebuild hung several times and took the web interface down with it,
so I went to check through SSH.
The rebuild seemed to hang at about 27%.
I couldn't run dmesg; only a pipe to tail worked, which showed that the new drive 1 was considered OK, but that read errors were occurring on drive 3...
So I put the "defective" drive 1 back and ran a scan to try to fix the read errors on both drives.
After the scan, it turned out drive 1 didn't really have that many read errors, about 160 or so,
while drive 3 all of a sudden had over a thousand.
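For reference, roughly how those per-drive error counts can be pulled over SSH (smartctl availability and path vary by firmware, so this is only a sketch):
dmesg | tail -n 100 | grep -i error                              # recent ata/md read errors, per device
smartctl -A /dev/sdc | grep -Ei 'pending|realloc|uncorrect'      # pending/reallocated sector counts on drive 3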
I haven't gotten much further with the data retrieval since then, because I lack time.
I've mostly been refreshing my Linux knowledge of RAID configuration.
The PuTTY log I received was interesting and pointed me toward file system and partition checks.
Unfortunately I haven't had enough time yet to run them on my system.
In the meantime, the web interface and SSH no longer hang and I can run dmesg, with the following output:
[~] # dmesg
for UDMA/133
[ 910.713296] ata4: EH complete
[ 914.349168] ata4.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
[ 914.353040] ata4.00: irq_stat 0x40000008
[ 914.356841] ata4.00: failed command: READ FPDMA QUEUED
[ 914.360378] ata4.00: cmd 60/08:00:e8:1b:1b/00:00:00:00:00/40 tag 0 ncq 4096 in
[ 914.360382] res 41/40:00:ed:1b:1b/00:00:00:00:00/40 Emask 0x409 (media error) <F>
[ 914.368075] ata4.00: status: { DRDY ERR }
[ 914.371580] ata4.00: error: { UNC }
[ 914.385413] ata4.00: configured for UDMA/133
[ 914.388944] ata4: EH complete
[ 923.215736] ata4.00: exception Emask 0x0 SAct 0x3805101 SErr 0x0 action 0x0
[ 923.219311] ata4.00: irq_stat 0x40000001
[ 923.223141] ata4.00: failed command: READ FPDMA QUEUED
[ 923.226640] ata4.00: cmd 60/80:00:48:47:1c/00:00:00:00:00/40 tag 0 ncq 65536 in
[ 923.226644] res 41/40:00:00:00:00/00:00:00:00:00/00 Emask 0x9 (media error)
[ 923.234359] ata4.00: status: { DRDY ERR }
[ 923.237829] ata4.00: error: { UNC }
[ 923.241450] ata4.00: failed command: READ FPDMA QUEUED
[ 923.244845] ata4.00: cmd 60/08:40:d0:8e:13/00:00:35:00:00/40 tag 8 ncq 4096 in
[ 923.244849] res 41/40:00:00:00:00/00:00:00:00:00/00 Emask 0x9 (media error)
[ 923.252376] ata4.00: status: { DRDY ERR }
[ 923.255848] ata4.00: error: { UNC }
[ 923.259483] ata4.00: failed command: READ FPDMA QUEUED
[ 923.262907] ata4.00: cmd 60/10:60:88:28:6d/00:00:42:00:00/40 tag 12 ncq 8192 in
[ 923.262911] res 41/40:00:89:28:6d/00:00:42:00:00/40 Emask 0x409 (media error) <F>
[ 923.270462] ata4.00: status: { DRDY ERR }
[ 923.273929] ata4.00: error: { UNC }
[ 923.277534] ata4.00: failed command: READ FPDMA QUEUED
[ 923.280966] ata4.00: cmd 60/08:70:f8:5b:e0/00:00:42:00:00/40 tag 14 ncq 4096 in
[ 923.280971] res 41/40:00:00:00:00/00:00:00:00:00/00 Emask 0x9 (media error)
[ 923.288694] ata4.00: status: { DRDY ERR }
[ 923.292202] ata4.00: error: { UNC }
[ 923.295897] ata4.00: failed command: READ FPDMA QUEUED
[ 923.299397] ata4.00: cmd 60/10:b8:68:f5:39/00:00:45:00:00/40 tag 23 ncq 8192 in
[ 923.299401] res 41/40:00:00:00:00/00:00:00:00:00/00 Emask 0x9 (media error)
[ 923.307148] ata4.00: status: { DRDY ERR }
[ 923.310625] ata4.00: error: { UNC }
[ 923.314321] ata4.00: failed command: READ FPDMA QUEUED
[ 923.317855] ata4.00: cmd 60/08:c0:80:f5:39/00:00:45:00:00/40 tag 24 ncq 4096 in
[ 923.317859] res 41/40:00:00:00:00/00:00:00:00:00/00 Emask 0x9 (media error)
[ 923.325717] ata4.00: status: { DRDY ERR }
[ 923.329296] ata4.00: error: { UNC }
[ 923.333065] ata4.00: failed command: READ FPDMA QUEUED
[ 923.336620] ata4.00: cmd 60/f8:c8:90:f5:39/00:00:45:00:00/40 tag 25 ncq 126976 in
[ 923.336625] res 41/40:00:00:00:00/00:00:00:00:00/00 Emask 0x9 (media error)
[ 923.344462] ata4.00: status: { DRDY ERR }
[ 923.348058] ata4.00: error: { UNC }
[ 923.361319] ata4.00: configured for UDMA/133
[ 923.364928] ata4: EH complete
[ 926.290713] ata4.00: exception Emask 0x0 SAct 0x70 SErr 0x0 action 0x0
[ 926.294315] ata4.00: irq_stat 0x40000008
[ 926.297885] ata4.00: failed command: READ FPDMA QUEUED
[ 926.301447] ata4.00: cmd 60/10:20:88:28:6d/00:00:42:00:00/40 tag 4 ncq 8192 in
[ 926.301450] res 41/40:00:89:28:6d/00:00:42:00:00/40 Emask 0x409 (media error) <F>
[ 926.308591] ata4.00: status: { DRDY ERR }
[ 926.312173] ata4.00: error: { UNC }
[ 926.325258] ata4.00: configured for UDMA/133
[ 926.328867] ata4: EH complete
[ 929.274026] ata4.00: exception Emask 0x0 SAct 0x4 SErr 0x0 action 0x0
[ 929.277636] ata4.00: irq_stat 0x40000008
[ 929.281181] ata4.00: failed command: READ FPDMA QUEUED
[ 929.284716] ata4.00: cmd 60/10:10:88:28:6d/00:00:42:00:00/40 tag 2 ncq 8192 in
[ 929.284719] res 41/40:00:89:28:6d/00:00:42:00:00/40 Emask 0x409 (media error) <F>
[ 929.291828] ata4.00: status: { DRDY ERR }
[ 929.295391] ata4.00: error: { UNC }
[ 929.309219] ata4.00: configured for UDMA/133
[ 929.312761] ata4: EH complete
[ 932.107351] ata4.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
[ 932.111927] ata4.00: irq_stat 0x40000008
[ 932.116471] ata4.00: failed command: READ FPDMA QUEUED
[ 932.121275] ata4.00: cmd 60/10:00:88:28:6d/00:00:42:00:00/40 tag 0 ncq 8192 in
[ 932.121280] res 41/40:00:89:28:6d/00:00:42:00:00/40 Emask 0x409 (media error) <F>
[ 932.130931] ata4.00: status: { DRDY ERR }
[ 932.135000] ata4.00: error: { UNC }
[ 932.148399] ata4.00: configured for UDMA/133
[ 932.152190] ata4: EH complete
[ 935.065654] ata4.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
[ 935.069203] ata4.00: irq_stat 0x40000008
[ 935.072654] ata4.00: failed command: READ FPDMA QUEUED
[ 935.076176] ata4.00: cmd 60/10:00:88:28:6d/00:00:42:00:00/40 tag 0 ncq 8192 in
[ 935.076180] res 41/40:00:89:28:6d/00:00:42:00:00/40 Emask 0x409 (media error) <F>
[ 935.083353] ata4.00: status: { DRDY ERR }
[ 935.087004] ata4.00: error: { UNC }
[ 935.100000] ata4.00: configured for UDMA/133
[ 935.104530] ata4: EH complete
[ 937.907299] ata4.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
[ 937.911024] ata4.00: irq_stat 0x40000008
[ 937.914765] ata4.00: failed command: READ FPDMA QUEUED
[ 937.918111] ata4.00: cmd 60/10:00:88:28:6d/00:00:42:00:00/40 tag 0 ncq 8192 in
[ 937.918115] res 41/40:00:89:28:6d/00:00:42:00:00/40 Emask 0x409 (media error) <F>
[ 937.925557] ata4.00: status: { DRDY ERR }
[ 937.928975] ata4.00: error: { UNC }
[ 937.942784] ata4.00: configured for UDMA/133
[ 937.946121] sd 3:0:0:0: [sdc] Unhandled sense code
[ 937.949827] sd 3:0:0:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 937.953361] sd 3:0:0:0: [sdc] Sense Key : Medium Error [current] [descriptor]
[ 937.957374] Descriptor sense data with sense descriptors (in hex):
[ 937.960845] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
[ 937.964915] 42 6d 28 89
[ 937.968717] sd 3:0:0:0: [sdc] Add. Sense: Unrecovered read error - auto reallocate failed
[ 937.972254] sd 3:0:0:0: [sdc] CDB: Read(10): 28 00 42 6d 28 88 00 00 10 00
[ 937.976298] end_request: I/O error, dev sdc, sector 1114450057
[ 937.979782] md/raid:md0: read error not correctable (sector 1112329472 on sdc3).
[ 937.983740] raid5: some error occurred in a active device:2 of md0.
[ 937.987407] raid5: Keep the raid device active in degraded mode but set readonly.
[ 937.991140] md/raid:md0: read error not correctable (sector 1112329480 on sdc3).
[ 937.994835] raid5: some error occurred in a active device:2 of md0.
[184876.683891] rule type=1, num=2
[184877.101287] rule type=1, num=2
[184880.096927] rule type=1, num=2
[184880.106618] rule type=1, num=2
[184900.116336] rule type=1, num=2
[184900.126086] rule type=1, num=2
[184920.136145] rule type=1, num=2
[184920.145883] rule type=1, num=2
[184940.155641] rule type=1, num=2
[184940.165347] rule type=1, num=2
[184960.175335] rule type=1, num=2
[184960.184960] rule type=1, num=2
[184980.194808] rule type=1, num=2
[184980.204545] rule type=1, num=2
[185000.214164] rule type=1, num=2
[185000.223852] rule type=1, num=2
[185020.233645] rule type=1, num=2
[185020.243462] rule type=1, num=2
[185040.253308] rule type=1, num=2
[185040.262974] rule type=1, num=2
[185060.272642] rule type=1, num=2
[185060.282222] rule type=1, num=2
[185080.291840] rule type=1, num=2
[185080.301638] rule type=1, num=2
[185100.311159] rule type=1, num=2
[185100.321345] rule type=1, num=2
[185120.330967] rule type=1, num=2
[185120.340307] rule type=1, num=2
[185140.350329] rule type=1, num=2
[185140.359781] rule type=1, num=2
[185160.369351] rule type=1, num=2
[185160.378687] rule type=1, num=2
[185180.388767] rule type=1, num=2
[185180.398442] rule type=1, num=2
[191698.845784] md: md0 still in use.
[~] #


I hadn't contacted the QNAP Helpdesk yet because I'm not sure they'd even want to help me out anyway.
I've had this system for quite a number of years and I don't have a support contract or anything similar that I know of.
Sorry to be wasting your time.
pwilson
Guru
Posts: 22533
Joined: Fri Mar 06, 2009 11:20 am
Location: Victoria, BC, Canada (UTC-08:00)

Re: TS-659 unable to rescue data from failing RAID 5

Post by pwilson »

joerideschepper wrote:Hi Patrick

I thought you might like some more info about my system; I also didn't really explain the issue I'm having.
[...NASReport output and dmesg log snipped; quoted in full in the post above...]
I hadn't contacted the QNAP Helpdesk yet because I'm not sure they'd even want to help me out anyway.
I've had this system for quite a number of years and I don't have a support contract or anything similar that I know of.
Sorry to be wasting your time.
You can contact the QNAP Helpdesk; they will still work with you even if your NAS is out of warranty. As for your output, dmesg is complaining about a failed HDD3 (/dev/sdc). Try hot-swapping HDD3 with a suitable replacement drive; the RAID rebuild should start. (Did you note that my NASReport script was unable to retrieve any information about HDD3, and that dmesg is logging uncorrectable media errors on it?)
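Before pulling a tray, match /dev/sdc to the physical disk; a minimal check using the same hdparm my NASReport script relies on:
hdparm -i /dev/sdc | grep -i serial    # compare against the serial printed on the drive label
cat /proc/mdstat                       # after the swap, watch for the recovery line here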

Western Digital does not recommend Blacks, Blues, or Greens for use in RAID.

Patrick M. Wilson
Victoria, BC Canada
QNAP TS-470 Pro w/ 4 * Western Digital WD30EFRX WD Reds (RAID5) - - Single 8.1TB Storage Pool FW: QTS 4.2.0 Build 20151023 - Kali Linux v1.06 (64bit)
Forums: View My Profile - Search My Posts - View My Photo - View My Location - Top Community Posters
QNAP: Turbo NAS User Manual - QNAP Wiki - QNAP Tutorials - QNAP FAQs

Please review: When you're asking a question, please include the following.
joerideschepper
New here
Posts: 5
Joined: Wed Jun 19, 2013 2:32 am

Re: TS-659 unable to rescue data from failing RAID 5

Post by joerideschepper »

Hi all

It's been a while (I think I mentioned I don't have a lot of time),
but I did eventually manage to salvage most of the contents of the NAS.
I thought I'd share some of the commands and my experience.

Try to mount MD0_DATA:
mount /dev/md0 /share/MD0_DATA -t ext4
Didn't work... we lost one disk and had another degraded, remember? So check the array state and examine a member's RAID superblock:
more /proc/mdstat
mdadm -E /dev/sda3
Then force-recreate the array in place across all six members (--assume-clean tells mdadm the data is already consistent, so no resync is done):
mdadm -CfR --assume-clean /dev/md0 -l 5 -n 6 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3 /dev/sde3 /dev/sdf3
mount /dev/md0 /share/MD0_DATA -t ext4
Didn't work either...
mdadm -D /dev/md0
mdadm --assemble --run --force /dev/md0 /dev/sd[abcdef]1
mdadm: device /dev/md0 already active - cannot assemble it
That failed: the array was still registered as active (and it also pointed at the wrong partitions; on these QNAPs the data members are the third partitions, sd[a-f]3). Recreating the array with the bad drive 3 left out as "missing" finally worked:
mdadm -CfR --assume-clean /dev/md0 -l 5 -n 6 /dev/sda3 /dev/sdb3 missing /dev/sdd3 /dev/sde3 /dev/sdf3
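A caution for anyone reusing this recipe: a create with --assume-clean only leaves data intact if the RAID level, chunk size, metadata version, and member order exactly match the original array, so record them from a member's superblock first and compare afterwards (a sketch, not from my original session):
mdadm -E /dev/sda3 | grep -Ei 'level|chunk|version'    # note these values before recreating
mdadm -D /dev/md0  | grep -Ei 'level|chunk|version'    # they must agree after recreating
mount -o ro /dev/md0 /share/MD0_DATA -t ext4           # and mount read-only until the data checks out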
Try to mount again:
mount /dev/md0 /share/MD0_DATA/ -t ext4
That works, so now check the file system (and list the backup superblocks, just in case):
e2fsck_64 -f -v -C 0 /dev/md0
/usr/local/sbin/dumpe2fs /dev/md0 | grep superblock

The check didn't work: the NAS doesn't have enough memory. I had to add extra swap space on a spare USB stick (formatted as FAT32).
Thanks to Rudim: http://forum.qnap.com/viewtopic.php?p=268094
dd if=/dev/zero of=/share/external/sdi1/myswapfile bs=1M count=3072
mkswap /share/external/sdi1/myswapfile
swapon /share/external/sdi1/myswapfile
cat /proc/swaps
And a couple of commands to survey the data before copying it out:
du -hc --max-depth=3
rsync -v -r --update -i --list-only --ignore-errors <SRC>

Now I had enough memory to run the e2fsck_64 check above to completion.

And undo the swapfile work afterwards:
[~] # swapoff /share/external/sdi1/myswapfile
[/] # cat /proc/swaps
Filename                                Type            Size    Used    Priority
/dev/md6                                partition       530044  40068   -1

And remove the stick:
[/] # umount /dev/sdi1
[/] # mount
/proc on /proc type proc (rw)
none on /dev/pts type devpts (rw,gid=5,mode=620)
sysfs on /sys type sysfs (rw)
tmpfs on /tmp type tmpfs (rw,size=64M)
none on /proc/bus/usb type usbfs (rw)
/dev/sda4 on /mnt/ext type ext3 (rw)
/dev/md9 on /mnt/HDA_ROOT type ext3 (rw,data=ordered)
tmpfs on /.eaccelerator.tmp type tmpfs (rw,size=32M)
/dev/sdv1 on /share/external/sdv1 type ext4 (rw)
/dev/md0 on /share/MD0_DATA type ext4 (rw)
[/] #

MD0_DATA now contained a bunch of folders, in which I found my original shares.
I simply used scp to copy everything out of there.
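Roughly what that copy looked like; the destination user, host, and paths here are placeholders:
scp -rp /share/MD0_DATA/Public admin@10.0.1.50:/rescue/MD0_DATA/    # -r recurse, -p preserve times and modes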

I bought new WD Red disks and reinstalled the NAS from scratch.
This time, a bit more carefully: RAID 6 with a hot spare; a backup system is being built at the moment.
Yes, that's a lot of "wasted" space, but with the doubled disk size the end result is about the same capacity as before,
except that it'll tolerate failures a little better.
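One more habit that would have surfaced these latent read errors long before a rebuild: periodic scrubbing of the array. A sketch using the standard Linux md sysfs interface (newer QTS firmware also offers scheduled RAID scrubbing in the GUI):
echo check > /sys/block/md0/md/sync_action    # read-verifies every stripe in the background
cat /sys/block/md0/md/mismatch_cnt            # nonzero afterwards means inconsistencies were found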