RAID 5: Degradation, Failure and Migration
-
- Starting out
- Posts: 11
- Joined: Wed Jun 09, 2021 10:35 pm
RAID 5: Degradation, Failure and Migration
I've gotten myself into quite a situation with my NAS (technically, two NASes) and am looking for some help.
I'll warn you: I'm no enthusiast when it comes to PCs, and I know even less about NASes. I'm pretty good at following instructions, so long as I'm told which menus to go to or what commands to copy and paste (though I have no idea where the command prompt is in QTS).
I know you guys love to see the system profiles for problems like these. I'd love to provide them; just tell me how!
My troubles unfolded in several steps:
Step 1: RAID 5 Degradation and Disk Erasure
Several months ago my TS-451A (RAID 5) notified me of a thick volume degradation. Looking at the storage profiles, the disk in Bay 3 (a 4 TB Seagate IronWolf) wasn't being detected by the NAS (under Storage > Disks/VJBOD, Bay 3 would show up empty). I figured this meant the disk was toast, but ran the Seagate diagnostic tool. It passed, yet the disk still wouldn't work in the NAS. Out of desperation I tried reformatting it (don't ask me why) to see if that would trigger a rebuild. No luck.
I then bought a replacement HDD. It also didn't work in Bay 3, which made me realize this was probably an issue with the NAS itself.
I looked over the forum and found users with backplane failures. I contacted QNAP support and they suggested this was likely my issue, too.
Step 2: RAID Migration
Since a new NAS was not much more than the cost of the repair, I decided to spend a few months saving up for a TS-451D2 and migrate the disks into the new unit. When the day came, I was backing up my data to an external drive ahead of the migration when I got an error that my RAID group was "not active". This time, Bay 4 was sporadically disconnecting from and reconnecting to the RAID, and eventually it stopped connecting altogether.
Realizing that there wasn't much I could do with a rig that was slowly losing drive bays, I decided to migrate into the TS-451D2 and hope for the best.
Step 3: My Current Situation
Presently, the RAID volume is "not active" in my TS-451D2. All drives are being detected by the NAS, with the following status for each of the bays:
• BAY 1: The drive condition is "Good" and its "Used Type" (as listed under Storage > Disks/VJBOD) is "Data" (dark blue).
• BAY 2: Same as Bay 1.
• BAY 3: This is the drive I reformatted. I performed a complete test of the disk and its condition is "Good". Its "Used Type" is "Free" (dark grey).
• BAY 4: This is the disk from the bay that just failed in the old rig. A complete test of the disk says its condition is "Good". Its "Used Type" is also "Free".
Step 4: Plan of Action?
I came across this thread from a user having similar problems. I'm thinking (read: hoping) that my RAID isn't toast, since the data on the disk in Bay 4 should still be intact. Rather, I think the issue is that when Bay 4 first failed, the disk became unassociated with the RAID and didn't get automatically reconnected / rebuilt because of the previous degradation in Bay 3. In the thread above, the user solved the problem by performing a "forced RAID mount" of the disks onto the RAID.
I'm thinking I could do something similar: force the disk in Bay 4 to be part of the RAID again. Once this happens, the RAID will be degraded (but active) and will be able to rebuild disk 3. To me, it seems worth a try at this point!
Does this plan seem reasonable? And if so, how do I go about running a “forced RAID mount”? Any help will be much appreciated.
- dolbyman
- Guru
- Posts: 35275
- Joined: Sat Feb 12, 2011 2:11 am
- Location: Vancouver BC , Canada
Re: RAID 5: Degradation, Failure and Migration
You just discovered why you need backups at ALL TIMES, not just when it's too late
login via SSH and do a
Code: Select all
md_checker
and post the results back in code tags
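If SSH isn't enabled yet, turn it on in QTS under Control Panel > Network & File Services > Telnet / SSH ("Allow SSH connection"), then connect from a terminal on your PC, something like the below, swapping in your own admin account name and the NAS's IP address (QTS only lets administrator accounts log in over SSH):
Code: Select all
ssh admin@192.168.1.100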
-
- Starting out
- Posts: 11
- Joined: Wed Jun 09, 2021 10:35 pm
Re: RAID 5: Degradation, Failure and Migration
Agreed. It's really unfortunate: I had bought an enclosure as part of this, and I could have used my 5th drive to back up my essential files the whole time. It just didn't occur to me until it was time to migrate.
Regardless, here are the results of md_checker:
Code: Select all
[~] # md_checker
Welcome to MD superblock checker (v2.0) - have a nice day~
Scanning system...
RAID metadata found!
UUID: ce7ab1be:0e80230a:780a4d6e:c8119d04
Level: raid5
Devices: 4
Name: md1
Chunk Size: 512K
md Version: 1.0
Creation Time: Jan 14 08:15:39 2018
Status: OFFLINE
===============================================================================================
Enclosure | Port | Block Dev Name | # | Status | Last Update Time | Events | Array State
===============================================================================================
NAS_HOST 1 /dev/sda3 0 Active Jun 4 23:35:15 2021 536127 AA.A
NAS_HOST 2 /dev/sdb3 1 Active Jun 4 23:35:15 2021 536127 AA.A
NAS_HOST 3 /dev/sdc3 2 Rebuild Aug 19 16:18:29 2020 72071 AAAA
NAS_HOST 4 /dev/sdd3 3 Active Jun 4 23:30:45 2021 536126 AA.A
===============================================================================================
[~] #
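If I'm reading this output right, the Events count for /dev/sdc3 (72071) is way behind the other three drives (536127), and its Last Update Time is from last August, so I'm guessing that's my reformatted Bay 3 disk, out of the array all along. While searching I also came across this command for dumping the RAID superblock of a single member disk, in case more detail on that drive would help (I haven't run it yet):
Code: Select all
mdadm --examine /dev/sdc3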
-
- Starting out
- Posts: 11
- Joined: Wed Jun 09, 2021 10:35 pm
Re: RAID 5: Degradation, Failure and Migration
So I've been doing a bit of research, and it seems like an mdadm command might fix my issue. Trouble is, I have no idea how to set this up: it seems to be specific to each RAID configuration, and this is fairly far outside my realm of expertise.
Any idea if an mdadm command is what I need, and how to set one up for my RAID?
Any help would be very much appreciated.
- dolbyman
- Guru
- Posts: 35275
- Joined: Sat Feb 12, 2011 2:11 am
- Location: Vancouver BC , Canada
Re: RAID 5: Degradation, Failure and Migration
try this
It should mount it with 3 out of 4 and sync the 4th (/dev/sdc3) back in
Code: Select all
mdadm --assemble /dev/md1 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3
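If the assemble works, you can watch the rebuild progress with
Code: Select all
cat /proc/mdstat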
-
- Starting out
- Posts: 11
- Joined: Wed Jun 09, 2021 10:35 pm
Re: RAID 5: Degradation, Failure and Migration
dolbyman wrote: ↑Fri Jun 11, 2021 2:09 am
try this
It should mount it with 3 out of 4 and sync the 4th (/dev/sdc3) back in
Code: Select all
mdadm --assemble /dev/md1 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3
Thank you so much for this! I'll run it and let you know how it goes.
-
- Starting out
- Posts: 11
- Joined: Wed Jun 09, 2021 10:35 pm
Re: RAID 5: Degradation, Failure and Migration
dolbyman wrote: ↑Fri Jun 11, 2021 2:09 am
try this
It should mount it with 3 out of 4 and sync the 4th (/dev/sdc3) back in
Code: Select all
mdadm --assemble /dev/md1 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3
I got this message when I ran the mdadm command:
Code: Select all
[~] # mdadm --assemble /dev/md1 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3
mdadm: failed to get exclusive lock on mapfile - continue anyway...
mdadm: device 6 in /dev/md1 has wrong state in superblock, but /dev/sdd3 seems ok
mdadm: /dev/md1 assembled from 3 drives - not enough to start the array while not clean - consider --force.
[~] #
- dolbyman
- Guru
- Posts: 35275
- Joined: Sat Feb 12, 2011 2:11 am
- Location: Vancouver BC , Canada
Re: RAID 5: Degradation, Failure and Migration
well .. you can force it then (a bit risky)
Code: Select all
mdadm --force --assemble /dev/md1 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3
-
- Starting out
- Posts: 11
- Joined: Wed Jun 09, 2021 10:35 pm
Re: RAID 5: Degradation, Failure and Migration
dolbyman wrote: ↑Fri Jun 11, 2021 4:15 am
well .. you can force it then (a bit risky)
Code: Select all
mdadm --force --assemble /dev/md1 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3
So I entered the command above and got this message:
Code: Select all
[~] # mdadm --force --assemble /dev/md1 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3
mdadm: --force does not set the mode, and so cannot be the first option.
[~] #
- dolbyman
- Guru
- Posts: 35275
- Joined: Sat Feb 12, 2011 2:11 am
- Location: Vancouver BC , Canada
Re: RAID 5: Degradation, Failure and Migration
try --assemble --force (swap them)
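i.e. the mode first, then the modifier:
Code: Select all
mdadm --assemble --force /dev/md1 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3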
-
- Starting out
- Posts: 11
- Joined: Wed Jun 09, 2021 10:35 pm
Re: RAID 5: Degradation, Failure and Migration
OK, here are the results:
Code: Select all
[~] # mdadm --assemble --force /dev/md1 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3
mdadm: failed to get exclusive lock on mapfile - continue anyway...
mdadm: clearing FAULTY flag for device 3 in /dev/md1 for /dev/sdd3
mdadm: Marking array /dev/md1 as 'clean'
mdadm: /dev/md1 has been started with 3 drives (out of 4).
Here are the results from another md_checker:
Code: Select all
Welcome to MD superblock checker (v2.0) - have a nice day~
Scanning system...
RAID metadata found!
UUID: ce7ab1be:0e80230a:780a4d6e:c8119d04
Level: raid5
Devices: 4
Name: md1
Chunk Size: 512K
md Version: 1.0
Creation Time: Jan 14 08:15:39 2018
Status: ONLINE (md1) [UU_U]
===============================================================================================
Enclosure | Port | Block Dev Name | # | Status | Last Update Time | Events | Array State
===============================================================================================
NAS_HOST 1 /dev/sda3 0 Active Jun 4 23:35:15 2021 536127 AA.A
NAS_HOST 2 /dev/sdb3 1 Active Jun 4 23:35:15 2021 536127 AA.A
NAS_HOST 3 /dev/sdc3 2 Rebuild Aug 19 16:18:29 2020 72071 AAAA
NAS_HOST 4 /dev/sdd3 3 Active Jun 4 23:30:45 2021 536126 AA.A
- dolbyman
- Guru
- Posts: 35275
- Joined: Sat Feb 12, 2011 2:11 am
- Location: Vancouver BC , Canada
Re: RAID 5: Degradation, Failure and Migration
ok .. now that the RAID is online try to reinit the LVM
Code: Select all
init_lvm.sh
-
- Starting out
- Posts: 11
- Joined: Wed Jun 09, 2021 10:35 pm
Re: RAID 5: Degradation, Failure and Migration
dolbyman wrote: ↑Sat Jun 12, 2021 2:45 am
ok .. now that the RAID is online try to reinit the LVM
Code: Select all
init_lvm.sh
Result:
Code: Select all
[~] # init_lvm.sh
-sh: init_lvm.sh: command not found
-
- Starting out
- Posts: 11
- Joined: Wed Jun 09, 2021 10:35 pm
Re: RAID 5: Degradation, Failure and Migration
dolbyman wrote: ↑Sat Jun 12, 2021 2:45 am
ok .. now that the RAID is online try to reinit the LVM
Code: Select all
init_lvm.sh
I've seen this command being used by other users:
Code: Select all
[~] # /etc/init.d/init_lvm.sh
- dolbyman
- Guru
- Posts: 35275
- Joined: Sat Feb 12, 2011 2:11 am
- Location: Vancouver BC , Canada
Re: RAID 5: Degradation, Failure and Migration
yes, try that one
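and once the volume is mounted and your data is safely copied off, the stale Bay 3 disk can be synced back into the array. Storage & Snapshots may offer to rebuild it on its own; if it doesn't, the manual way would be something like this (assuming /dev/sdc3 is still the Bay 3 partition on your system):
Code: Select all
mdadm /dev/md1 --add /dev/sdc3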