853 Pro / QTS 4.2 - How do you enable BBM on a RAID 5 configuration?
From the QNAP 853 Pro User Manual:
Bad Block Management (BBM)
BBM keeps a bad block list (log) for each drive and uses it to let the system fail single blocks rather than entire drives. This is especially useful for RAID arrays: bad blocks in different places on different drives can still leave a RAID array with at least single redundancy on all stripes. With this option, the RAID array can remain functional even when such blocks are encountered while the RAID is rebuilding.
Please note that BBM is only supported in RAID 5 and RAID 6.
Bad Block Management (BBM) - How to enable?
- schumaku (Guru)
- Location: Kloten (Zurich), Switzerland
Re: Bad Block Management (BBM) - How to enable?
Any reference on how this BBM is supposed to work in a world of smart SATA and SAS disks that locally take care of sectors with read and write issues?
Every modern SATA/SAS HDD and SSD already works like that. Blocks with real, non-fixable issues are replaced from a spare block set in a reserved space - it's a perfectly transparent process. Once a SAS or SATA device starts producing bad blocks, many things have gone badly wrong long before - the disk is overdue and needs replacement anyway.
I'm aware of BBM when it comes to "dumb" block storage devices like flash - but not for modern drive architectures.
- bkurtz (New here)
Re: Bad Block Management (BBM) - How to enable?
BBM appears to be QNAP's name for mdraid's "bad block log". Without a bad block log, when any block on a device fails during RAID operations, that device is booted out of the array. If the array was already degraded with no further redundancy, this results in complete loss of data on the array without very time-intensive recovery procedures. With a bad block log enabled, mdraid keeps track of block failures, and if a block fails when the array has no redundancy, it knows that only that block is bad. This makes array rebuilds (especially on RAID 5) much safer: at that point the array is already degraded, and without the bad block log any read failure during the rebuild results in complete failure of the array; with the bad block log, you only lose the files affected by those bad blocks. Of course, you should have backups, but restoring a few files from backup is much faster than restoring the entire array.
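(For the curious: the md sysfs interface exposes the per-member bad block list behind this log. The device names below are just examples - substitute your own array and members.)
Code: Select all
# bad blocks mdraid has recorded for one member (start sector and length)
cat /sys/block/md0/md/dev-sda3/bad_blocks
# bad blocks seen but not yet safely recorded in the on-disk log
cat /sys/block/md0/md/dev-sda3/unacknowledged_bad_blocks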
schumaku, you ask how this interacts with modern disks that take care of sectors with read/write issues themselves. Typically, if these devices experience an error during a WRITE (including when verifying that the data was properly written), they will immediately remap that sector to the spare set. Similarly, I believe they often remap sectors that have previously experienced a read error the next time they are written (though it's possible they don't if the subsequent write/verify succeeds). However, at the time of a read error, there's no way to get the data for that sector. The bad block log allows mdraid to act like a normal disk and just return an "I can't read that" error to the OS (or, if you have redundancy, it grabs the data from another disk and rewrites it to the failing disk, causing the sector to be remapped), instead of failing the device out of the array entirely.
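As an aside, if smartctl happens to be present on your firmware (not guaranteed on every QTS build, so treat this as a sketch), you can watch the drive's own remapping counters while this goes on:
Code: Select all
# pending sectors are read errors the drive hasn't been able to remap yet;
# reallocated sectors have already been moved to the spare area
smartctl -A /dev/sda | grep -Ei "reallocated_sector|current_pending"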
The bad block log is a relatively recent feature (maybe 2014 in the mainline kernel? I forget) and wasn't available in QNAP's original builds. It's now supported at the kernel/mdraid level in all QNAP firmwares, I think. However, you generally create a RAID array with `mdadm`, and the version of mdadm that QNAP ships in a number of the firmware builds doesn't support the bad block log. If your NAS ships with a supported version of mdadm, I think the default is to create all arrays with a bad block log, though many of ours here have arrays that were set up before this feature was added, and I haven't seen any indication of them being automatically upgraded.
I don't know offhand of any way to check for bad block log presence from the QNAP web interface. However, if you are willing to log in via ssh, you can do the following:
1. Figure out the name of your data RAID device by running
Code: Select all
cat /proc/mdstat
Typically, it will be `/dev/md0`, but your configuration may vary.
2. Run
Code: Select all
cat /sys/block/md0/md/qnap_bbm
This will return 1 if the bad block log is enabled, and 0 otherwise.
If your system has an mdadm that supports the bad block log, you can also use
Code: Select all
mdadm --examine /dev/sda3
(or substitute another relevant data partition in place of sda3) and its output will include a line like "Bad Block Log : 512 entries available at offset -8 sectors". On my systems, some have version (`mdadm --version`) 2.6.x, which doesn't support it, while some have version 3.3, which does. I'm not sure exactly where the cutoff was, but if you scp the NAS's mdadm binary to another unix machine with `strings` installed and run
Code: Select all
strings /tmp/mdadm | grep bbl
you'll get some output if your mdadm supports the bad block log.
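To check every member partition in one go, something like this should work (assuming your data partitions are the sd?3 ones used elsewhere in this thread):
Code: Select all
for d in /dev/sd[abcdef]3; do
    echo "== $d"
    mdadm --examine "$d" | grep -i "bad block"
done
If a member prints no "Bad Block Log" line, that's a hint the array (or that copy of mdadm) predates the feature.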
If you want to enable the bad block log on a RAID device for which it's not currently enabled, the procedure (again via an ssh login) is roughly the following:
0. This requires a copy of mdadm that supports the bad block log. I used one that I ended up building from source with QDK, but you might also be able to copy one from a newer firmware. I did this a long time ago, so I don't remember the exact steps.
1. Stop services using the data device. Sometimes it can be tricky to get them all; usually at least the file servers, and my NASes usually also have mysqld and mytranscodesvr running. Sometimes I stop everything and still have to try half a dozen times (often rebooting in between) to get things to work. Starting some operation from the GUI that requires services to be stopped might get the GUI to do the hard work for you here, but it might also leave you in a state that doesn't work for other reasons.
2. Unmount the filesystem:
Code: Select all
umount /dev/md0
You may have to return to step 1 to find more services you missed. Obviously use *your* data device if it's not `/dev/md0` (that applies throughout, so I'm not going to keep repeating it).
2a. Some newer QNAP setups use lvm to do fancy thin volume management on top of mdraid; if your system does this, you'll need to stop that too, which is something vaguely like
Code: Select all
dmsetup ls
dmsetup remove cachedev1
vgchange -an vg1
though the exact commands may vary.
3. Stop the RAID:
Code: Select all
mdadm --stop /dev/md0
4. Re-start the RAID and add bad block logs to all devices:
Code: Select all
/mnt/HDA_ROOT/mdadm --assemble --update=bbl /dev/md0 /dev/sd[abcdef]3
Note that I've stored my custom-built, upgraded mdadm at /mnt/HDA_ROOT; if your mdadm supports this out of the box, you can just use `mdadm` instead of the full path. Also note that here I'm assembling `/dev/md0` from 6 partitions (`/dev/sda3`, etc.), which may not exactly match your setup.
5. The easiest way to get things back into normal operating state (services running and such) is probably to reboot. The bad block log should stick around, though I think I've seen that it doesn't always get added to new disks when you swap them out, so you may have to repeat this after you add disks.
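Once you're back up, it's worth re-running the checks from earlier in this post to confirm the log actually took; a rough sketch (same caveats about device names):
Code: Select all
# QTS's own flag, per the check earlier in the thread (1 = bad block log present)
cat /sys/block/md0/md/qnap_bbm
# with an mdadm that knows about the BBL (the stock one may print odd output, see below)
/mnt/HDA_ROOT/mdadm --examine /dev/sda3 | grep -i "bad block"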
Last edited by bkurtz on Sat Jun 03, 2017 2:46 am, edited 1 time in total.
- schumaku (Guru)
- Location: Kloten (Zurich), Switzerland
Re: Bad Block Management (BBM) - How to enable?
Worth adding -> What is BBM (Bad Block Management)?
The applied models may have changed or been extended in the meantime.
Applied Models:
Enterprise: TDS-X89, TES-X85, TVS-X80, TS-X80, SS-X79
SMB: TVS-X82, TVS-X71, TVS-X73, TVS-X63, TS-X63, TS-X53, TS-X53A, TS-X51, TS-X51A
Answer:
BBM keeps a log of bad blocks for each drive, allowing the system to track individual bad blocks directly instead of failing the whole disk. BBM is particularly useful for RAID groups, as it allows them to retain redundancy across all stripes despite bad blocks. With this feature, a RAID can remain functional even while it is rebuilding.
* Please note that BBM is only supported on RAID 5 or RAID 6 groups that were created with firmware 4.2.0 (or newer). RAID groups created with an older firmware will not support BBM even if the firmware is updated.
Release date: 2016-10-14
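If you want to see where your own box stands from an ssh shell, a rough check is something like the following (the config file path is the usual QTS location, but treat it as an assumption):
Code: Select all
# firmware version currently installed (QTS keeps it in uLinux.conf)
grep -i "^version" /etc/config/uLinux.conf
# BBM flag on the data array, as discussed earlier in the thread (1 = enabled)
cat /sys/block/md0/md/qnap_bbm
Note the KB condition is about the firmware the RAID group was created with, not the firmware currently running, so the version check only tells you part of the story.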
- bkurtz (New here)
Re: Bad Block Management (BBM) - How to enable?
Also worth noting: on models that don't ship a newer mdadm (the distinction looks to me like HOME vs SMB models; I guess QNAP doesn't officially support BBM on those), mdadm may show scary-ish output when you use `mdadm --examine` - mine says "378 failed", which is obviously false. `cat /proc/mdstat` shows that everything is fine, as does a newer mdadm.
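So if the old bundled mdadm prints something alarming, cross-check with the kernel's own view before panicking; a rough sketch:
Code: Select all
# the kernel's view of the arrays - if this looks clean, it almost certainly is
cat /proc/mdstat
# more detail on a specific array (state, failed devices, rebuild progress)
mdadm --detail /dev/md0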