RAID 5: Degradation, Failure and Migration
-
- Starting out
- Posts: 11
- Joined: Wed Jun 09, 2021 10:35 pm
RAID 5: Degradation, Failure and Migration
I've gotten myself into quite a situation with my NAS (technically, two NASes) and am looking for some help.
I'll warn you: I'm no enthusiast when it comes to PCs, and I know even less about NASes. I'm pretty good at following instructions, so long as I'm told which menus to go to or what commands to copy and paste (though I have no idea where the command prompt is in QTS).
I know you guys love to see the system profiles for problems like these. I'd love to provide them; just tell me how!
My troubles unfolded in several steps:
Step 1: RAID 5 Degradation and Disk Erasure
Several months ago my TS-451A (RAID 5) notified me of a thick volume degradation. Looking at the storage profiles, the disk in Bay 3 (a 4 TB Seagate IronWolf) wasn't being detected by the NAS (under Storage > Disks/VJBOD, Bay 3 would show up empty). I figured this meant the disk was toast, but ran the Seagate diagnostic tool. It passed, yet the disk still wouldn't work in the NAS. Out of desperation I tried reformatting it (don't ask me why) to see if that would trigger a rebuild. No luck.
I then bought a replacement HDD. It also didn't work in Bay 3, which made me realize this was probably an issue with the NAS itself.
I looked over the forum and found users with backplane failures. I contacted QNAP support and they suggested this was likely my issue, too.
Step 2: RAID Migration
Since a new NAS was not much more than the cost of the repair, I decided to spend a few months saving up for a TS-451D2 and migrate the disks into the new unit. When the day came, I was backing up my data to an external drive ahead of the migration when I got an error that my RAID group was "not active". This time, Bay 4 was sporadically disconnecting from and reconnecting to the RAID, and eventually it stopped connecting altogether.
Realizing that there wasn't much I could do with a rig that was slowly losing drive bays, I decided to migrate into the TS-451D2 and hope for the best.
Step 3: My Current Situation
Presently, the RAID volume is "not active" in my TS-451D2. All drives are being detected by the NAS, with the following status for each of the bays:
• BAY 1: The drive condition is "Good" and its "Used Type" (as listed under Storage > Disks/VJBOD) is "Data" (dark blue).
• BAY 2: Same as Bay 1.
• BAY 3: This is the drive I reformatted. I performed a complete test of the disk and its condition is "Good". Its "Used Type" is "Free" (dark grey).
• BAY 4: This is the disk from the bay that just failed in the old rig. A complete test of the disk says its condition is "Good". Its "Used Type" is also "Free".
Step 4: Plan of Action?
I came across this thread from a user having similar problems. I'm thinking (read: hoping) that my RAID isn't toast, since the data on the disk in Bay 4 should still be intact. Rather, I think the issue is that when Bay 4 first failed, the disk became unassociated with the RAID and didn't get automatically reconnected / rebuilt because of the previous degradation in Bay 3. In the thread above, the user solved the problem by performing a "forced RAID mount" of the disks onto the RAID.
I'm thinking I could do something similar: force the disk in Bay 4 to be part of the RAID again. Once this happens, the RAID will be degraded (but active) and will be able to rebuild disk 3. To me, it seems worth a try at this point!
Does this plan seem reasonable? And if so, how do I go about running a “forced RAID mount”? Any help will be much appreciated.
- dolbyman
- Guru
- Posts: 35275
- Joined: Sat Feb 12, 2011 2:11 am
- Location: Vancouver BC , Canada
Re: RAID 5: Degradation, Failure and Migration
You just discovered why you need backups at ALL TIMES, not just when it's too late
login via SSH and do a
Code: Select all
md_checker
and post the results back in code tags
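If SSH isn't enabled yet, turn it on in QTS under Control Panel > Network & File Services > Telnet / SSH ("Allow SSH connection"), then connect from a terminal on your PC, something like the below, swapping in your own admin account name and the NAS's IP address (QTS only lets administrator accounts log in over SSH):
Code: Select all
ssh admin@192.168.1.100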
-
- Starting out
- Posts: 11
- Joined: Wed Jun 09, 2021 10:35 pm
Re: RAID 5: Degradation, Failure and Migration
Agreed. It's really unfortunate: I had bought an enclosure as part of this, and I could have used my 5th drive to back up my essential files the whole time. It just didn't occur to me until it was time to migrate.
Regardless, here are the results of md_checker:
Code: Select all
[~] # md_checker
Welcome to MD superblock checker (v2.0) - have a nice day~
Scanning system...
RAID metadata found!
UUID: ce7ab1be:0e80230a:780a4d6e:c8119d04
Level: raid5
Devices: 4
Name: md1
Chunk Size: 512K
md Version: 1.0
Creation Time: Jan 14 08:15:39 2018
Status: OFFLINE
===============================================================================================
Enclosure | Port | Block Dev Name | # | Status | Last Update Time | Events | Array State
===============================================================================================
NAS_HOST 1 /dev/sda3 0 Active Jun 4 23:35:15 2021 536127 AA.A
NAS_HOST 2 /dev/sdb3 1 Active Jun 4 23:35:15 2021 536127 AA.A
NAS_HOST 3 /dev/sdc3 2 Rebuild Aug 19 16:18:29 2020 72071 AAAA
NAS_HOST 4 /dev/sdd3 3 Active Jun 4 23:30:45 2021 536126 AA.A
===============================================================================================
[~] #
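If I'm reading this output right, the Events count for /dev/sdc3 (72071) is way behind the other three drives (536127), and its Last Update Time is from last August, so I'm guessing that's my reformatted Bay 3 disk, out of the array all along. While searching I also came across this command for dumping the RAID superblock of a single member disk, in case more detail on that drive would help (I haven't run it yet):
Code: Select all
mdadm --examine /dev/sdc3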
-
- Starting out
- Posts: 11
- Joined: Wed Jun 09, 2021 10:35 pm
Re: RAID 5: Degradation, Failure and Migration
So I've been doing a bit of research, and it seems like an mdadm command might fix my issue. Trouble is, I have no idea how to set this up: it seems to be specific to each RAID configuration, and this is fairly far outside my realm of expertise.
Any idea if an mdadm command is what I need, and how to set one up for my RAID?
Any help would be very much appreciated.
- dolbyman
- Guru
- Posts: 35275
- Joined: Sat Feb 12, 2011 2:11 am
- Location: Vancouver BC , Canada
Re: RAID 5: Degradation, Failure and Migration
try this
It should mount it with 3 out of 4 and sync the 4th (/dev/sdc3) back in
Code: Select all
mdadm --assemble /dev/md1 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3
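If the assemble works, you can watch the rebuild progress with
Code: Select all
cat /proc/mdstat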
-
- Starting out
- Posts: 11
- Joined: Wed Jun 09, 2021 10:35 pm
Re: RAID 5: Degradation, Failure and Migration
dolbyman wrote: ↑Fri Jun 11, 2021 2:09 am
try this
It should mount it with 3 out of 4 and sync the 4th (/dev/sdc3) back in
Code: Select all
mdadm --assemble /dev/md1 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3
Thank you so much for this! I'll run it and let you know how it goes.
-
- Starting out
- Posts: 11
- Joined: Wed Jun 09, 2021 10:35 pm
Re: RAID 5: Degradation, Failure and Migration
dolbyman wrote: ↑Fri Jun 11, 2021 2:09 am
try this
It should mount it with 3 out of 4 and sync the 4th (/dev/sdc3) back in
Code: Select all
mdadm --assemble /dev/md1 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3
I got this message when I ran the mdadm command:
Code: Select all
[~] # mdadm --assemble /dev/md1 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3
mdadm: failed to get exclusive lock on mapfile - continue anyway...
mdadm: device 6 in /dev/md1 has wrong state in superblock, but /dev/sdd3 seems ok
mdadm: /dev/md1 assembled from 3 drives - not enough to start the array while not clean - consider --force.
[~] #
- dolbyman
- Guru
- Posts: 35275
- Joined: Sat Feb 12, 2011 2:11 am
- Location: Vancouver BC , Canada
Re: RAID 5: Degradation, Failure and Migration
well .. you can force it then (a bit risky)
Code: Select all
mdadm --force --assemble /dev/md1 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3
-
- Starting out
- Posts: 11
- Joined: Wed Jun 09, 2021 10:35 pm
Re: RAID 5: Degradation, Failure and Migration
dolbyman wrote: ↑Fri Jun 11, 2021 4:15 am
well .. you can force it then (a bit risky)
Code: Select all
mdadm --force --assemble /dev/md1 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3
So I entered the command above and got this message:
Code: Select all
[~] # mdadm --force --assemble /dev/md1 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3
mdadm: --force does not set the mode, and so cannot be the first option.
[~] #
- dolbyman
- Guru
- Posts: 35275
- Joined: Sat Feb 12, 2011 2:11 am
- Location: Vancouver BC , Canada
Re: RAID 5: Degradation, Failure and Migration
try --assemble --force (swap them)
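i.e. the mode first, then the modifier:
Code: Select all
mdadm --assemble --force /dev/md1 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3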
-
- Starting out
- Posts: 11
- Joined: Wed Jun 09, 2021 10:35 pm
Re: RAID 5: Degradation, Failure and Migration
OK, here are the results:
Code: Select all
[~] # mdadm --assemble --force /dev/md1 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3
mdadm: failed to get exclusive lock on mapfile - continue anyway...
mdadm: clearing FAULTY flag for device 3 in /dev/md1 for /dev/sdd3
mdadm: Marking array /dev/md1 as 'clean'
mdadm: /dev/md1 has been started with 3 drives (out of 4).
Here are the results from another md_checker:
Code: Select all
Welcome to MD superblock checker (v2.0) - have a nice day~
Scanning system...
RAID metadata found!
UUID: ce7ab1be:0e80230a:780a4d6e:c8119d04
Level: raid5
Devices: 4
Name: md1
Chunk Size: 512K
md Version: 1.0
Creation Time: Jan 14 08:15:39 2018
Status: ONLINE (md1) [UU_U]
===============================================================================================
Enclosure | Port | Block Dev Name | # | Status | Last Update Time | Events | Array State
===============================================================================================
NAS_HOST 1 /dev/sda3 0 Active Jun 4 23:35:15 2021 536127 AA.A
NAS_HOST 2 /dev/sdb3 1 Active Jun 4 23:35:15 2021 536127 AA.A
NAS_HOST 3 /dev/sdc3 2 Rebuild Aug 19 16:18:29 2020 72071 AAAA
NAS_HOST 4 /dev/sdd3 3 Active Jun 4 23:30:45 2021 536126 AA.A
- dolbyman
- Guru
- Posts: 35275
- Joined: Sat Feb 12, 2011 2:11 am
- Location: Vancouver BC , Canada
Re: RAID 5: Degradation, Failure and Migration
ok .. now that the RAID is online try to reinit the LVM
Code: Select all
init_lvm.sh
-
- Starting out
- Posts: 11
- Joined: Wed Jun 09, 2021 10:35 pm
Re: RAID 5: Degradation, Failure and Migration
dolbyman wrote: ↑Sat Jun 12, 2021 2:45 am
ok .. now that the RAID is online try to reinit the LVM
Code: Select all
init_lvm.sh
Result:
Code: Select all
[~] # init_lvm.sh
-sh: init_lvm.sh: command not found
-
- Starting out
- Posts: 11
- Joined: Wed Jun 09, 2021 10:35 pm
Re: RAID 5: Degradation, Failure and Migration
dolbyman wrote: ↑Sat Jun 12, 2021 2:45 am
ok .. now that the RAID is online try to reinit the LVM
Code: Select all
init_lvm.sh
I've seen this command being used by other users:
Code: Select all
[~] # /etc/init.d/init_lvm.sh
- dolbyman
- Guru
- Posts: 35275
- Joined: Sat Feb 12, 2011 2:11 am
- Location: Vancouver BC , Canada
Re: RAID 5: Degradation, Failure and Migration
yes, try that one
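and once the volume is mounted and your data is safely copied off, the stale Bay 3 disk can be synced back into the array. Storage & Snapshots may offer to rebuild it on its own; if it doesn't, the manual way would be something like this (assuming /dev/sdc3 is still the Bay 3 partition on your system):
Code: Select all
mdadm /dev/md1 --add /dev/sdc3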