disk failure - spare disk does not automatically replace defective drive

Questions about SNMP, Power, System, Logs, disk, & RAID.
wikon-it
New here
Posts: 2
Joined: Tue Aug 04, 2020 2:09 pm

disk failure - spare disk does not automatically replace defective drive

Post by wikon-it » Tue Aug 04, 2020 2:15 pm

Hi community,

We have set up a TS-EC1279U-RP with 12 bays and twelve 1 TB disks. Drive 4 has failed, although its SMART status is still OK. The drive shows up in red. All the others are green except drive 12, which has been configured as a spare drive.

What I actually expected was that this drive would automatically jump in for the broken disk as soon as the system detected it. But it seems to do nothing.

Is there any manual action needed to make the spare replace the broken drive?

And can the broken drive easily be replaced online by just pulling it out and inserting a new one, without anything bad happening?
Do I have to disable the hot spare first when I replace the broken disk? I read that a hot spare possibly only becomes active when a disk is pulled out (which would not make much sense, since you would have to go physically to where the SAN is located). But if I have a replacement disk anyway, I don't need the spare disk to jump in.

P3R
Guru
Posts: 12636
Joined: Sat Dec 29, 2007 1:39 am
Location: Stockholm, Sweden (UTC+01:00)

Re: disk failure - spare disk does not automatically replace defective drive

Post by P3R » Tue Aug 04, 2020 4:36 pm

wikon-it wrote on Tue Aug 04, 2020 2:15 pm:
What I actually expected was that this drive would automatically jump in for the broken disk as soon as the system detected it. But it seems to do nothing.
That is the experience many users with online spare disks have.
Is there any manual action needed to make the spare replace the broken drive?
In theory it shouldn't be, but now that you've noticed it doesn't work as expected, it's easily fixed.
And can the broken drive easily be replaced online by just pulling it out and inserting a new one, without anything bad happening?
When you pull the disk, the spare will jump in unless there's another problem.
Do I have to disable the hot spare first when I replace the broken disk?
You have to decide what it is you want to do.
Either you pull the faulty disk and allow the spare to take its place...
OR
...you disable the spare disk and replace the faulty drive manually.
I read that a hot spare possibly only becomes active when a disk is pulled out...
Of course that's not the way it's intended, but it's often the real-life experience.
RAID has never, ever been a replacement for backups. Without backups on a different system (preferably placed at another site), you will eventually lose data!

A non-RAID configuration (including RAID 0, which isn't really RAID) with a backup on separate media protects your data far better than any RAID volume without a backup.

All data storage consists of both the primary storage and the backups. It's your money and your data, spend the storage budget wisely or pay with your data!

drdope
Getting the hang of things
Posts: 82
Joined: Tue May 24, 2016 8:02 pm

Re: disk failure - spare disk does not automatically replace defective drive

Post by drdope » Tue Aug 04, 2020 6:01 pm

I've never had a broken disk in a QNAP, but I have had a few on LSI/3ware and Areca RAID controllers and on Linux (mdadm) and BSD software RAIDs.
A hot spare would always jump in as soon as another drive failed; if it doesn't, and one has to manually unplug the faulty drive first, that's a bug in my opinion.
TS-677 - R7-1700/64GB/QM2-2P-384/2x 1TB 970 EVO (R1; System, VMs & SMB-Shares) & 2x 2TB WD Blue (R1; online-, ondevice-backup)
TS-669l, 3GB (2x 3TB WD30EFRX; Raid1; online-, offdevice-, onsite-backup)
TS-453A, 4GB (2x 3TB WD30EFRX; Raid1; online-, offdevice-, offsite-backup)
6x 2TB USB3.0 HDDs in daily rotation (offline, offdevice, offsite-backups)
"roughly 80% of storage related costs are generated by backups"

P3R
Guru
Posts: 12636
Joined: Sat Dec 29, 2007 1:39 am
Location: Stockholm, Sweden (UTC+01:00)

Re: disk failure - spare disk does not automatically replace defective drive

Post by P3R » Tue Aug 04, 2020 7:37 pm

drdope wrote on Tue Aug 04, 2020 6:01 pm:
A hot spare would always jump in as soon as another drive failed; if it doesn't, and one has to manually unplug the faulty drive first, that's a bug in my opinion.
Maybe there is a bug; I haven't used it enough to know. But I'd say it isn't that black and white.

If a disk fails completely, to the point that it disappears, the hot spare will of course take its place in a QNAP as well. One problem, though, is that the definition of when a drive has "failed" isn't all that clear. Some admins think that a single Current Pending Sector means the disk has failed, while others will run drives to the bitter end. The hot spare logic has some triggers for when the hot spare kicks in, but those can differ between RAID solutions.

The home and SMB users who have traditionally been QNAP's core customers (although QNAP has long been working its way into the enterprise segment as well) are probably much more forgiving than professional customers, so the question is what triggers QNAP uses. Maybe QNAP wants more indications of a failure (so as not to annoy its home/SMB customers) than other manufacturers do, rather than this being a clear bug?
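The point above is that "failed" is a policy decision rather than a fixed fact. The Python below is a purely hypothetical sketch: the DiskHealth and SparePolicy names and every threshold value are invented for illustration and are not QNAP's (or any vendor's) actual trigger logic. It shows how the same partially failing disk can trip a strict policy but not a lenient one, while a disk that has vanished triggers the spare under both.

```python
from dataclasses import dataclass

@dataclass
class DiskHealth:
    pending_sectors: int      # SMART "Current Pending Sector" count
    reallocated_sectors: int  # SMART "Reallocated Sector Count"
    io_errors: int            # read/write errors seen by the RAID layer
    present: bool = True      # False once the disk has dropped off the bus

@dataclass
class SparePolicy:
    # Hypothetical thresholds: any value strictly above them counts as "failed".
    max_pending: int
    max_reallocated: int
    max_io_errors: int

    def should_activate_spare(self, d: DiskHealth) -> bool:
        if not d.present:
            return True  # a vanished member is "failed" under every policy
        return (d.pending_sectors > self.max_pending
                or d.reallocated_sectors > self.max_reallocated
                or d.io_errors > self.max_io_errors)

# Illustrative values only, not published by any vendor.
strict = SparePolicy(max_pending=0, max_reallocated=0, max_io_errors=0)
lenient = SparePolicy(max_pending=50, max_reallocated=100, max_io_errors=10)

sick = DiskHealth(pending_sectors=1, reallocated_sectors=8, io_errors=2)
gone = DiskHealth(pending_sectors=0, reallocated_sectors=0, io_errors=0, present=False)

print(strict.should_activate_spare(sick), lenient.should_activate_spare(sick))  # True False
print(strict.should_activate_spare(gone), lenient.should_activate_spare(gone))  # True True
```

A lenient policy like the second one would match the reports in this thread of a disk showing red while the spare stays idle, without any bug being involved; whether QNAP actually works this way is not known from the thread.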

wikon-it
New here
Posts: 2
Joined: Tue Aug 04, 2020 2:09 pm

Re: disk failure - spare disk does not automatically replace defective drive

Post by wikon-it » Tue Aug 04, 2020 8:30 pm

OK, thanks a lot for confirming that my thoughts on this were not completely weird. I think I will go ahead and disable the hot spare function: if I have to drive to the data center where the SAN is located to pull out the defective disk so that the spare jumps in, I can just as well drive there and replace the broken disk. So in this case, hot spare disks are not really useful (maybe only if you don't have another spare disk and the SAN is standing right next to you anyway).

mwyatt@fseinc.net
New here
Posts: 2
Joined: Fri Apr 23, 2021 3:49 am

Re: disk failure - spare disk does not automatically replace defective drive

Post by mwyatt@fseinc.net » Fri Apr 23, 2021 4:17 am

I know what you mean. With my TVS-1271U-RP I've had two distinct experiences with this. One time a drive began failing SMART and eventually failed altogether. The hot spare did not kick in. Another time one of my RAID members just "went dark", essentially disappearing from the QNAP and requiring a NAS reboot. The hot spare did not kick in. In both serious events it just ignored the problem and kept doing nothing but spin.

QNAP doesn't instruct you to "pull the disk", as that defeats the purpose of a hot spare.

In fact, QNAP documentation states "A Hot Spare Disk is used as an extra protection against data loss: when a disk fails the hot spare disk will automatically replace the faulty disk. If there are no failing disks, the spare disk remains unused and does not store any data." https://www.qnap.com/en/how-to/knowledg ... spare-disk

This appears to be a glaring oversight that QNAP needs to fix in a future firmware release, although my NAS has been in service since 2018 and no firmware update has resolved it so far.

mwyatt@fseinc.net
New here
Posts: 2
Joined: Fri Apr 23, 2021 3:49 am

Re: disk failure - spare disk does not automatically replace defective drive

Post by mwyatt@fseinc.net » Fri Apr 23, 2021 11:07 pm

Well, I've just been proven wrong. I had a drive failure on this same unit, and the hot spare (disk 12) actually replaced the failed drive (drive 5). Yay! It's never done that before, so maybe it heard me ranting about it...
