RAID6 Offline after disk failures

yannickb
New here
Posts: 2
Joined: Fri Dec 27, 2019 10:25 am

RAID6 Offline after disk failures

Post by yannickb »

Hello,
After the motherboard of a TS-870U-RP suddenly died, I bought a new TS-877XU and moved all 8 hard disks over: all the data was successfully restored (such a relief... for a short time).

Unfortunately, after a few days, the disks (which were rather old) began to fail one after another, and I am now in a very bad situation:

My main volume was using disk enclosures 2-3-5-6-7-8 (6 disks) in RAID6, so, theoretically, I could afford to lose 2 disks. And that is what happened:

First, enclosure 5 failed. I replaced the disk with a new one and the array started to rebuild. Meanwhile, I configured enclosure 4 as a spare disk (a big mistake, I think, as I will explain later...).
During the rebuild, enclosure 7 failed as well (bad sectors) and everything hung (no access to the web interface). After waiting two days, I decided to perform a hard reboot of the NAS: still hung.
After replacing the disk in enclosure 7, the web interface came back to life, but the RAID (/dev/md1) and the associated volume had disappeared.

The web interface just says "RAID device is inactive", and in Storage -> Disks, RAID group 1 looks wrong: it shows enclosures 2-4-5-6-8 in blue, and Capacity, RAID group name, RAID type and disk members are all blank.
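As a quick read-only sanity check from the shell, /proc/mdstat shows whether the kernel currently sees md1 at all (it does not change anything):

Code: Select all

# Read-only: list the md arrays the kernel knows about and their state
cat /proc/mdstat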
On the CLI, md_checker gave me another strange result:

Code: Select all

RAID metadata found!
UUID:		73d3c616:09ac1b87:4f9d29a4:fa00100d
Level:		raid6
Devices:	7
Name:		md1
Chunk Size:	64K
md Version:	1.0
Creation Time:	Apr 7 18:34:39 2014
Status:		OFFLINE
===============================================================================
 Disk | Device | # | Status |   Last Update Time   | Events | Array State
===============================================================================
   5  /dev/sdg3  0  Rebuild   Dec 24 14:55:44 2019    71778   AAAA.AA                  
   2  /dev/sda3  1   Active   Dec 24 14:55:44 2019    71778   AAAA.AA                  
   8  /dev/sde3  2   Active   Dec 24 14:55:44 2019    71778   AAAA.AA                  
   3  /dev/sdd3  3   Active   Dec 24 14:55:44 2019    71778   AAAA.AA                  
   4  /dev/sdc3  3   Active   Dec 20 16:05:31 2019    23170   AAAAAAA                  
 --------------  4  Missing   -------------------------------------------
   6  /dev/sdf3  5   Active   Dec 24 14:55:44 2019    71778   AAAA.AA                  
 --------------  6  Missing   -------------------------------------------
===============================================================================
 WARNING: Duplicate device detected for #(3)!
It indicates that the RAID is composed of 7 devices, probably because I added the spare disk during the rebuild, a big mistake...
The "Duplicate device detected" warning is a mystery to me...

Now, if I try to re-assemble the RAID using mdadm, giving it the 4 disks that should still be in good condition (enclosures 2-3-6-8), it fails:

Code: Select all

[~] # mdadm --assemble /dev/md1 /dev/sda3 /dev/sdd3 /dev/sdf3 /dev/sde3
mdadm: failed to get exclusive lock on mapfile - continue anyway...
mdadm: /dev/md1 assembled from 4 drives - not enough to start the array.
probably because it considers the RAID to be composed of 7 disks (6 initially + the spare).

If I add the spare disk sdc3 on the command line, the error is exactly the same:

Code: Select all

[~] # mdadm --assemble /dev/md1 /dev/sda3 /dev/sdd3 /dev/sdf3 /dev/sde3 /dev/sdc3 
mdadm: failed to get exclusive lock on mapfile - continue anyway...
mdadm: /dev/md1 assembled from 4 drives - not enough to start the array.
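For completeness, here is a sketch of what a forced re-assembly might look like; I have not run it, --force only tells mdadm to ignore small event-count differences, and whether to include the half-rebuilt /dev/sdg3 (and the stale /dev/sdc3) is exactly what I am unsure about:

Code: Select all

# Stop whatever was left half-assembled by the failed attempts
mdadm --stop /dev/md1

# Retry, letting mdadm ignore small event-count mismatches (--force).
# Including the half-rebuilt /dev/sdg3 is only my guess; NOT run yet.
mdadm --assemble --force /dev/md1 /dev/sdg3 /dev/sda3 /dev/sde3 /dev/sdd3 /dev/sdf3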
My very last option is probably to use mdadm --create with the --assume-clean option, but before that I wanted to know:
- whether it is possible to remove the spare disk from the RAID metadata reported by md_checker, so that it considers only 6 devices (then I could possibly try --assemble again);
- how I can be sure of the order of the /dev/sdX3 devices in the create command (see my rough sketch after this list);
- how things could have turned so bad...? :(
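
For reference, here is my current understanding of what the create command would look like, with the order taken from the "#" column of md_checker (0=sdg3, 1=sda3, 2=sde3, 3=sdd3, slots 4 and 6 missing, 5=sdf3) and the parameters copied from the existing metadata. I would NOT run this without confirmation: a wrong order or geometry would destroy the data, and whether slot 3 should be /dev/sdd3 or the spare /dev/sdc3 is part of my question:

Code: Select all

# DO NOT RUN - shown only to check my understanding of the mapping.
# Slot order follows the "#" column of md_checker / "Device Role" of mdadm --examine:
#   0=/dev/sdg3  1=/dev/sda3  2=/dev/sde3  3=/dev/sdd3 (or sdc3?)  4=missing  5=/dev/sdf3  6=missing
mdadm --create /dev/md1 --assume-clean --metadata=1.0 \
      --level=6 --raid-devices=7 --chunk=64 --layout=left-symmetric \
      /dev/sdg3 /dev/sda3 /dev/sde3 /dev/sdd3 missing /dev/sdf3 missing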

Thanks for your valuable help!
yannickb
New here
Posts: 2
Joined: Fri Dec 27, 2019 10:25 am

Re: RAID6 Offline after disk failures

Post by yannickb »

To be more precise, here is the output of the mdadm --examine command for the 3rd partition of each disk:

Code: Select all

[~] # mdadm --examine /dev/sd[abcdefgh]3    
/dev/sda3:
          Magic : a92b4efc
        Version : 1.0
    Feature Map : 0x0
     Array UUID : 73d3c616:09ac1b87:4f9d29a4:fa00100d
           Name : 1
  Creation Time : Mon Apr  7 18:34:39 2014
     Raid Level : raid6
   Raid Devices : 7

 Avail Dev Size : 3887119248 (1853.52 GiB 1990.21 GB)
     Array Size : 9717798080 (9267.61 GiB 9951.03 GB)
  Used Dev Size : 3887119232 (1853.52 GiB 1990.21 GB)
   Super Offset : 3887119504 sectors
   Unused Space : before=0 sectors, after=272 sectors
          State : clean
    Device UUID : d00e33e3:1614ca2f:8630c2f6:1fa64e1d

    Update Time : Tue Dec 24 14:55:44 2019
       Checksum : 877b9fdf - correct
         Events : 71778

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : Active device 1
   Array State : AAAA.AA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdc3:
          Magic : a92b4efc
        Version : 1.0
    Feature Map : 0x0
     Array UUID : 73d3c616:09ac1b87:4f9d29a4:fa00100d
           Name : 1
  Creation Time : Mon Apr  7 18:34:39 2014
     Raid Level : raid6
   Raid Devices : 7

 Avail Dev Size : 3887119248 (1853.52 GiB 1990.21 GB)
     Array Size : 9717798080 (9267.61 GiB 9951.03 GB)
  Used Dev Size : 3887119232 (1853.52 GiB 1990.21 GB)
   Super Offset : 3887119504 sectors
   Unused Space : before=0 sectors, after=272 sectors
          State : clean
    Device UUID : 56033483:db49a5f1:adf46526:1d87de18

    Update Time : Fri Dec 20 16:05:31 2019
       Checksum : 148839af - correct
         Events : 71778

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : Active device 3
   Array State : AAAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdd3:
          Magic : a92b4efc
        Version : 1.0
    Feature Map : 0x0
     Array UUID : 73d3c616:09ac1b87:4f9d29a4:fa00100d
           Name : 1
  Creation Time : Mon Apr  7 18:34:39 2014
     Raid Level : raid6
   Raid Devices : 7

 Avail Dev Size : 3887119240 (1853.52 GiB 1990.21 GB)
     Array Size : 9717798080 (9267.61 GiB 9951.03 GB)
  Used Dev Size : 3887119232 (1853.52 GiB 1990.21 GB)
   Super Offset : 3887119504 sectors
   Unused Space : before=0 sectors, after=264 sectors
          State : clean
    Device UUID : 9d753d99:83a24617:6d0077f6:c6d41432

    Update Time : Tue Dec 24 14:55:44 2019
  Bad Block Log : 512 entries available at offset -8 sectors
       Checksum : 3985939f - correct
         Events : 71778

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : Active device 3
   Array State : AAAA.AA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sde3:
          Magic : a92b4efc
        Version : 1.0
    Feature Map : 0x0
     Array UUID : 73d3c616:09ac1b87:4f9d29a4:fa00100d
           Name : 1
  Creation Time : Mon Apr  7 18:34:39 2014
     Raid Level : raid6
   Raid Devices : 7

 Avail Dev Size : 3887119248 (1853.52 GiB 1990.21 GB)
     Array Size : 9717798080 (9267.61 GiB 9951.03 GB)
  Used Dev Size : 3887119232 (1853.52 GiB 1990.21 GB)
   Super Offset : 3887119504 sectors
   Unused Space : before=0 sectors, after=272 sectors
          State : clean
    Device UUID : ed8cc9d8:5702d9ad:91d8ec54:a021a53d

    Update Time : Tue Dec 24 14:55:44 2019
       Checksum : 79a22fcf - correct
         Events : 71778

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : Active device 2
   Array State : AAAA.AA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdf3:
          Magic : a92b4efc
        Version : 1.0
    Feature Map : 0x0
     Array UUID : 73d3c616:09ac1b87:4f9d29a4:fa00100d
           Name : 1
  Creation Time : Mon Apr  7 18:34:39 2014
     Raid Level : raid6
   Raid Devices : 7

 Avail Dev Size : 3887119248 (1853.52 GiB 1990.21 GB)
     Array Size : 9717798080 (9267.61 GiB 9951.03 GB)
  Used Dev Size : 3887119232 (1853.52 GiB 1990.21 GB)
   Super Offset : 3887119504 sectors
   Unused Space : before=0 sectors, after=272 sectors
          State : clean
    Device UUID : 2fb73fac:c4b4e312:52d0777e:717c4ecf

    Update Time : Tue Dec 24 14:55:44 2019
       Checksum : 6d575f0e - correct
         Events : 71778

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : Active device 5
   Array State : AAAA.AA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdg3:
          Magic : a92b4efc
        Version : 1.0
    Feature Map : 0x2
     Array UUID : 73d3c616:09ac1b87:4f9d29a4:fa00100d
           Name : 1
  Creation Time : Mon Apr  7 18:34:39 2014
     Raid Level : raid6
   Raid Devices : 7

 Avail Dev Size : 5840623240 (2785.03 GiB 2990.40 GB)
     Array Size : 9717798080 (9267.61 GiB 9951.03 GB)
  Used Dev Size : 3887119232 (1853.52 GiB 1990.21 GB)
   Super Offset : 5840623504 sectors
Recovery Offset : 79110120 sectors
   Unused Space : before=0 sectors, after=1953504264 sectors
          State : clean
    Device UUID : dd91f96d:6eade260:fbc9eb9f:e4918fcb

    Update Time : Tue Dec 24 14:55:44 2019
  Bad Block Log : 512 entries available at offset -8 sectors
       Checksum : 88699761 - correct
         Events : 71778

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : Active device 0
   Array State : AAAR.AA ('A' == active, '.' == missing, 'R' == replacing)
mdadm: No md superblock detected on /dev/sdh3.
Moogle Stiltzkin
Guru
Posts: 11445
Joined: Thu Dec 04, 2008 12:21 am
Location: Around the world....

Re: RAID6 Offline after disk failures

Post by Moogle Stiltzkin »

RAID is not a backup.....
https://www.reddit.com/r/qnap/comments/ ... _a_backup/


If those HDDs really are end-of-life and dying, you should replace them. Based on pricing and all-round performance, I recommend WD Reds. Or, if you want, you can try out the Seagate IronWolves.

Cheaper than those is probably shucking WD Elements or Easystore USB drives and then using them in the QNAP for RAID, but I think shucked drives may have issues when claiming warranty....
NAS
[Main Server] QNAP TS-877 (QTS) w. 4tb [ 3x HGST Deskstar NAS & 1x WD RED NAS ] EXT4 Raid5 & 2 x m.2 SATA Samsung 850 Evo raid1 +16gb ddr4 Crucial+ QWA-AC2600 wireless+QXP PCIE
[Backup] QNAP TS-653A (Truenas Core) w. 4x 2TB Samsung F3 (HD203WI) RaidZ1 ZFS + 8gb ddr3 Crucial
[^] QNAP TL-D400S 2x 4TB WD Red Nas (WD40EFRX) 2x 4TB Seagate Ironwolf, Raid5
[^] QNAP TS-509 Pro w. 4x 1TB WD RE3 (WD1002FBYS) EXT4 Raid5
[^] QNAP TS-253D (Truenas Scale)
[Mobile NAS] TBS-453DX w. 2x Crucial MX500 500gb EXT4 raid1

Network
Qotom Pfsense|100mbps FTTH | Win11, Ryzen 5600X Desktop (1x2tb Crucial P50 Plus M.2 SSD, 1x 8tb seagate Ironwolf,1x 4tb HGST Ultrastar 7K4000)


Resources
[Review] Moogle's QNAP experience
[Review] Moogle's TS-877 review
https://www.patreon.com/mooglestiltzkin
S.Haran
Getting the hang of things
Posts: 71
Joined: Sun Dec 16, 2018 12:17 am

Re: RAID6 Offline after disk failures

Post by S.Haran »

This is a complex case. You need 5 of 7 devices to assemble the RAID6, but only 4 are usable (sda3, sdd3, sde3, sdf3). I think the path to recovery depends on the condition of the two missing RAID members. Is either one of them still detected? Can you pull SMART info from one or both? If so, post back; there is hope for recovery.
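For example, assuming smartctl is available on the NAS (replace /dev/sdX with whichever device node the old disk shows up as, if it is detected at all):

Code: Select all

# Overall health, attributes (reallocated/pending sectors) and error log
smartctl -a /dev/sdX

# If the disk sits behind a SAT/USB bridge, this variant sometimes helps
smartctl -a -d sat /dev/sdX
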
On-Line Data Recovery Consultant. RAID / NAS / Linux Specialist.
Serving clients worldwide since 2011. Complex cases welcome.
https://FreeDataRecovery.us