Constant Disk Activity - Probably Explained

Discussion about hard drive spin down (standby) feature of NAS.
ChuckDavis666
Starting out
Posts: 40
Joined: Mon Jan 16, 2017 11:33 pm

Constant Disk Activity - Probably Explained

Post by ChuckDavis666 »

The good news is that I have a defective hard drive. The better news is that while investigating a particularly noisy TS-451A I learned what, for some systems at least, is probably causing the reports of constant disk activity on the QNAP forum (and forums for other brands of NAS also).

Most people describe the constant disk activity as a constant pattern of "clunks and thunks." In my case the NAS actually "chatters" constantly. Initially I chalked this up to HGST drives being known to be noisy, but I eventually determined that one drive is defective. Ironically the "chatter" helped lead me to identify the cause of the constant disk activity. Pulling the bad drive "quiets" the NAS to just "clunks and thunks."

Through testing I identified three factors that will keep the drives from spinning down. (See viewtopic.php?f=55&t=131538) While doing this testing, however, I noticed something unusual: the disk drives would spin down even though the disks were "chattering" away with constant activity. Apparently whatever was causing the constant disk activity was not visible to QTS. IOPS in Control Panel | System Status | Disk Performance was also zero. This raised the suspicion that the cause of the constant disk activity was external to QTS and might be intrinsic to the drives themselves.

While researching this possibility I came across a feature of newer consumer-grade disk drives called "Background Medium Scan" ("BMS" or sometimes "BGMS"). As described in an online article:

** The BMS process works at idle time, when the disk received no commands, a common setting is to perform the BMS after about 500ms of not getting any command. The disk may be able to perform this work in a loaded system but it will take much longer, if the disk senses it really needs to perform background tasks it may reduce the required idle time to gain a higher chance of doing it, in that case it may use idle times of around 100ms and then go off to do its background tasks. For the BMS the disk goes sequentially over the media, reading the data but with a slightly reduced ECC tolerance so that it can find locations that are having problems but are still readable. If it hits a problem spot it has two options, if it can recover the data with the full ECC it will do so and attempt a rewrite to the same spot, it will also verify that it can re-read the data and if not it will declare the sector bad and perform a reallocation. If it cannot read the data at all it has no real recourse, it will mark the sector as needing reallocation and if the next access to this location will be a write rather than a read the sector will be reallocated.

Another article points out the benefits of BMS/BGMS:

** First, BGMS will fix bad blocks on-the-fly as they are discovered by the firmware. The disk drive will use idle time to perform multiple re-reads to correct the data. As the bad blocks are discovered BEFORE the O/S actually needs the data on those blocks, then no programs have to suspend processing while bad blocks are repaired. If your host is streaming movies into hotel rooms, then user's won't suffer through the experience of a movie stopping for 5-30 seconds while the host and/or RAID subsystem go through the data recovery/remapping process.

** If you are using software RAID, then BGMS can somewhat replace data consistency checks, and provide somewhat self-healing storage farms.

** By exploiting the power of BGMS, you could effectively scan and repair any size storage farm 24x7 without the inherent overhead when the host tries to scan & repair bad blocks via brute-force techniques.

To recap: BMS is basically a continuous surface error scan built into the firmware of the drive. The surface scan (re)starts after several hundred milliseconds of inactivity and then stops whenever QTS does a read or write. It was originally patented by Seagate for enterprise-class drives but is appearing in other brands/models of drives in the consumer space (e.g. HGST 5 TB and 6 TB drives and some WD drives).
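
As a rough way to picture the idle trigger described above, here is a toy shell sketch. It is not drive firmware and nothing in it comes from HGST or Seagate: the function name `scan_ms` is made up, the 500 ms threshold is simply the "common setting" from the quoted article, and the gaps between host commands are invented numbers. It adds up how much of each quiet gap would be left for scanning once the drive has waited out the threshold.

```shell
#!/bin/sh
# scan_ms IDLE_THRESHOLD_MS GAP_MS...
# Toy model only (hypothetical helper, not a real tool): each GAP_MS is a
# quiet period between two host commands. The drive waits IDLE_THRESHOLD_MS
# before it starts scanning, so only the remainder of a gap counts as scan time.
scan_ms() {
    threshold=$1
    shift
    total=0
    for gap in "$@"; do
        # Gaps shorter than the threshold never start a scan.
        if [ "$gap" -gt "$threshold" ]; then
            total=$(( total + gap - threshold ))
        fi
    done
    echo "$total"
}
```

For example, `scan_ms 500 200 800 5000` prints 4800: the 200 ms gap is too short to start a scan, while the other two gaps leave 300 ms and 4500 ms. On a NAS that is mostly idle, nearly all of the time ends up available to the scan, which fits the constant activity described here.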

I communicated with HGST/WD and received the following information:

** Confirmation that my HGST 5 TB drives are doing BMS

** In addition to doing surface scanning during periods of inactivity, the drives are also doing Preventative Wear Leveling (PWL) that keeps the head from staying in one place too long. (I suspect that this might be the source of the "thunks and clunks.")

** They don't know the total amount of time it might take to complete a Background Medium Scan on a 5 TB drive during periods of inactivity, but acknowledged it could take "quite a while."

** When a drive does finish a complete Background Medium Scan the drive then waits 168 hours before starting another scan. (Since the drives in a NAS start and stop the Background Medium Scan independently based on each drive's inactivity, I imagine that in a NAS with two or more drives it would be very likely that the 168 hour timeout periods would not overlap and that at least one drive would be performing a scan at any time.)

** The Background Medium Scan restarts where it left off if it gets interrupted by a power cycle.
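
The overlap guess in the 168-hour item can be put into rough numbers. This is a back-of-the-envelope sketch, assuming each drive simply alternates between a scan of some wall-clock length and the 168-hour wait, and that the drives behave independently of each other. The scan length is exactly the figure HGST could not supply, so it is left as a parameter to play with; `p_any_scanning` is a made-up name.

```shell
#!/bin/sh
# p_any_scanning SCAN_HOURS DRIVES
# Prints the long-run fraction of time during which at least one drive is
# mid-scan. Assumptions (not vendor data): each drive repeats SCAN_HOURS of
# scanning followed by a 168-hour wait, and the drives run independently.
p_any_scanning() {
    awk -v s="$1" -v n="$2" 'BEGIN {
        p = s / (s + 168)              # share of one cycle spent scanning
        printf "%.2f\n", 1 - (1 - p) ^ n
    }'
}
```

`p_any_scanning 168 2` prints 0.75: a scan that takes as long as the wait keeps each drive busy half the time, and with two drives there is a one-in-four chance that both are resting. The shorter the scan is relative to 168 hours, the less certain it becomes that some drive is always scanning, so the guess above depends heavily on how long a full pass really takes.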

Note that constant disk activity and disk drives not spinning down are separate issues. QTS cannot see the BMS disk activity and will spin down the drives if QTS is able to remain inactive long enough to trigger the inactivity timer. BMS does not keep the drives from spinning down.

My speculation is that if you put drives that are more or less constantly performing a Background Medium Scan (and Preventative Wear Leveling) in a flimsy, resonant plastic NAS chassis, you can end up with a real noise machine.

The good news is that the disk activity for the Background Medium Scan is accounted for in the MTBF of the drives. The bad news is that those of us that have drives with BMS/PWL and that wanted to put our NAS in an office, living room or sleeping space will probably have to move them elsewhere.

The Background Medium Scan can be monitored and controlled with SMARTmon.

http://www.santools.com/smart/unix/manu ... nction.htm

Unfortunately there is no longer a version of SMARTmon in the QNAP AppCenter. There is a package of utilities that apparently includes SMARTmon functionality, but it is far beyond my *NIX skills.

viewtopic.php?f=320&t=100843&hilit=Entware+x86

Turning off Background Medium Scan would confirm whether or not BMS is the source of the constant disk activity and noise. If desired I think that BMS could be left turned off, solving this issue, since it is somewhat redundant to RAID. I would be very interested to know the results if somebody could get SMARTmon running on a "noisy" system.
P3R
Guru
Posts: 13192
Joined: Sat Dec 29, 2007 1:39 am
Location: Stockholm, Sweden (UTC+01:00)

Re: Constant Disk Activity - Probably Explained

Post by P3R »

ChuckDavis666 wrote: If desired I think that BMS could be left turned off, solving this issue, since it is somewhat redundant to RAID.
I see BMS much more as complementary to RAID. In what way do you mean it is redundant?
RAID has never ever been a replacement for backups. Without backups on a different system (preferably placed at another site), you will eventually lose data!

A non-RAID configuration (including RAID 0, which isn't really RAID) with a backup on separate media protects your data far better than any RAID volume without backup.

All data storage consists of both the primary storage and the backups. It's your money and your data, spend the storage budget wisely or pay with your data!
schumaku
Guru
Posts: 43578
Joined: Mon Jan 21, 2008 4:41 pm
Location: Kloten (Zurich), Switzerland -- Skype: schumaku

Re: Constant Disk Activity - Probably Explained

Post by schumaku »

It's a valid discovery of course - great find, good documentation!

BMS might be considered somewhat redundant to the (generally strongly recommended) regular scheduling of SMART short and full tests. It's about the integrity and reliability of each HDD, and the HDDs are the core of RAID and NAS operation overall.
ChuckDavis666
Starting out
Posts: 40
Joined: Mon Jan 16, 2017 11:33 pm

Re: Constant Disk Activity - Probably Explained

Post by ChuckDavis666 »

P3R - I agree that BMS is complementary/additive to RAID. It adds a firmware-based preemptive layer of error detection/correction before RAID even becomes aware of a problem. Perhaps "somewhat redundant" wasn't the best wording. If BMS can be turned off, and if a quiet NAS is more important than this extra protection, and if occasional system pauses for RAID repair are acceptable then the NAS will still have "plain" RAID protection.
firemonkey
First post
Posts: 1
Joined: Sat Aug 19, 2017 9:59 pm

Re: Constant Disk Activity - Probably Explained

Post by firemonkey »

Thanks so much for looking into this. It's such a prominent noise you would think this would be more often mentioned in instruction manuals etc.

I have found the behaviour you describe in the HGST 10TB HE10 Helium. I have five of them, so I knew it was unlikely to be a fault. I contacted support but they have yet to get back to me. Your post allowed me to search the OEM manual for this.

Oddly, the SAS version of the drive calls it Background Medium Scan: https://www.hgst.com/sites/default/file ... c_r1.8.pdf
The SATA version seems to call it Off-line Read Scanning. https://www.hgst.com/sites/default/file ... c_r1.6.pdf

The manual for the SATA drive in particular seems to make very little mention of it. I guess they are roughly the same feature. It's nice to know that the drive is doing something genuinely useful, given the amount of noise it is making. I'll post any useful info that might come back from G-Technology (HGST) support, though I doubt they will offer anything.
jameshenderson
Getting the hang of things
Posts: 62
Joined: Sat Jul 09, 2011 9:48 am
Location: Oxford, England

Re: Constant Disk Activity - Probably Explained

Post by jameshenderson »

Hi - this is fantastic information!

I have a (new) TS-453Bmini with 4x4TB HGST drives. It is driving me to distraction with its loud constant disk activity (my old TS-410 is very quiet in comparison - it has 4x2TB Seagate Barracuda drives which are very quiet and does not have constant disk access). I can hear 2 types of noise - a higher pitched "traditional" disk access noise (like on my old TS-410) and a much louder lower-pitched noise that doesn't seem right to me.

I am using a Mac - can someone share how to temporarily switch off (and on) the BMS in order to confirm that this is what is happening? ...I took a look at the instructions but am not educated enough to interpret them.

thanks,
James.
TS-453Bmini + 4x 4TB Western Digital Reds (RAID5) - Plex Media Server
jameshenderson
Getting the hang of things
Posts: 62
Joined: Sat Jul 09, 2011 9:48 am
Location: Oxford, England

Re: Constant Disk Activity - Probably Explained

Post by jameshenderson »

ChuckDavis666 wrote:The good news is that I have a defective hard drive. The better news is that while investigating a particularly noisy TS-451A...
What was the cause of the defect? I ask because I have both the constant disk access (which you have explained - thanks!) and a loud low-pitch rumbling disk-access noise + fan always on.

It's a QNAP TS-453Bmini with 4x4 HGST 4TB drives which are ~48-40 degrees (which may explain the fan) but surely the fan is not designed to be on all the time?

I would have sent you a PM so not to confuse this thread, but I cannot find a button to do that. I also thought about starting a new thread, but no assurances you'd see it.

thanks,
James.
P3R
Guru
Posts: 13192
Joined: Sat Dec 29, 2007 1:39 am
Location: Stockholm, Sweden (UTC+01:00)

Re: Constant Disk Activity - Probably Explained

Post by P3R »

jameshenderson wrote: It's a QNAP TS-453Bmini with 4x4 HGST 4TB drives which are ~48-40 degrees (which may explain the fan) but surely the fan is not designed to be on all the time?
I don't know about the TS-453B mini specifically but on most, if not all, other Qnaps the fan is on all the time. In a very small chassis like that with hot 7200 rpm disks I would definitely expect the fan to be working.
jameshenderson wrote: I would have sent you a PM so not to confuse this thread, but I cannot find a button to do that.
Unfortunately that has been disabled. Please complain to Qnap. Everyone wants it back.
jameshenderson
Getting the hang of things
Posts: 62
Joined: Sat Jul 09, 2011 9:48 am
Location: Oxford, England

Re: Constant Disk Activity - Probably Explained

Post by jameshenderson »

P3R wrote:I don't know about the TS-453B mini specifically but on most, if not all, other Qnaps the fan is on all the time. In a very small chassis like that with hot 7200 rpm disks I would definitely expect the fan to be working.
cool - no pun intended :-)
P3R wrote:
jameshenderson wrote:I would have sent you a PM so not to confuse this thread, but I cannot find a button to do that.
Unfortunately that has been disabled. Please complain to Qnap. Everyone wants it back.
will do - thanks.
Trexx
Ask me anything
Posts: 5388
Joined: Sat Oct 01, 2011 7:50 am
Location: Minnesota

Re: Constant Disk Activity - Probably Explained

Post by Trexx »

Entware-3x does have a smartmontools OPKG available. I wonder if you would be able to leverage that to disable the BMS if desired.
Paul

Model: TS-877-1600 FW: 4.5.3.x
QTS (SSD): [RAID-1] 2 x 1TB WD Blue m.2's
Data (HDD): [RAID-5] 6 x 3TB HGST DeskStar
VMs (SSD): [RAID-1] 2 x1TB SK Hynix Gold
Ext. (HDD): TR-004 [RAID-5] 4 x 4TB HGST Ultrastar
RAM: Kingston HyperX Fury 64GB DDR4-2666
UPS: CP AVR1350

Model: TVS-673 32GB & TS-228a Offline
-----------------------------------------------------------------------------------------------------------------------------------------
2018 Plex NAS Compatibility Guide | QNAP Plex FAQ | Moogle's QNAP Faq
prrovoss
New here
Posts: 2
Joined: Wed Nov 08, 2017 8:06 pm

Re: Constant Disk Activity - Probably Explained

Post by prrovoss »

I've installed smartmontools via Qnapware; the help looks like this:

Code: Select all

[~] # smartctl -h
smartctl 6.4 2015-06-04 r4109 [x86_64-linux-4.2.8] (localbuild)
Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org

Usage: smartctl [options] device

============================================ SHOW INFORMATION OPTIONS =====

  -h, --help, --usage
         Display this help and exit

  -V, --version, --copyright, --license
         Print license, copyright, and version information and exit

  -i, --info
         Show identity information for device

  --identify[=[w][nvb]]
         Show words and bits from IDENTIFY DEVICE data                (ATA)

  -g NAME, --get=NAME
        Get device setting: all, aam, apm, lookahead, security, wcache, rcache, wcreorder

  -a, --all
         Show all SMART information for device

  -x, --xall
         Show all information for device

  --scan
         Scan for devices

  --scan-open
         Scan for devices and try to open each device

================================== SMARTCTL RUN-TIME BEHAVIOR OPTIONS =====

  -q TYPE, --quietmode=TYPE                                           (ATA)
         Set smartctl quiet mode to one of: errorsonly, silent, noserial

  -d TYPE, --device=TYPE
         Specify device type to one of: ata, scsi, sat[,auto][,N][+TYPE], usbcypress[,X], usbjmicron[,p][,x][,N], usbsunplus, marvell, areca,N/E, 3ware,N, hpt,L/M/N, megaraid,N, aacraid,H,L,ID, cciss,N, auto, test

  -T TYPE, --tolerance=TYPE                                           (ATA)
         Tolerance: normal, conservative, permissive, verypermissive

  -b TYPE, --badsum=TYPE                                              (ATA)
         Set action on bad checksum to one of: warn, exit, ignore

  -r TYPE, --report=TYPE
         Report transactions (see man page)

  -n MODE, --nocheck=MODE                                             (ATA)
         No check if: never, sleep, standby, idle (see man page)

============================== DEVICE FEATURE ENABLE/DISABLE COMMANDS =====

  -s VALUE, --smart=VALUE
        Enable/disable SMART on device (on/off)

  -o VALUE, --offlineauto=VALUE                                       (ATA)
        Enable/disable automatic offline testing on device (on/off)

  -S VALUE, --saveauto=VALUE                                          (ATA)
        Enable/disable Attribute autosave on device (on/off)

  -s NAME[,VALUE], --set=NAME[,VALUE]
        Enable/disable/change device setting: aam,[N|off], apm,[N|off],
        lookahead,[on|off], security-freeze, standby,[N|off|now],
        wcache,[on|off], rcache,[on|off], wcreorder,[on|off]

======================================= READ AND DISPLAY DATA OPTIONS =====

  -H, --health
        Show device SMART health status

  -c, --capabilities                                                  (ATA)
        Show device SMART capabilities

  -A, --attributes
        Show device SMART vendor-specific Attributes and values

  -f FORMAT, --format=FORMAT                                          (ATA)
        Set output format for attributes: old, brief, hex[,id|val]

  -l TYPE, --log=TYPE
        Show device log. TYPE: error, selftest, selective, directory[,g|s],
                               xerror[,N][,error], xselftest[,N][,selftest],
                               background, sasphy[,reset], sataphy[,reset],
                               scttemp[sts,hist], scttempint,N[,p],
                               scterc[,N,M], devstat[,N], ssd,
                               gplog,N[,RANGE], smartlog,N[,RANGE]

  -v N,OPTION , --vendorattribute=N,OPTION                            (ATA)
        Set display OPTION for vendor Attribute N (see man page)

  -F TYPE, --firmwarebug=TYPE                                         (ATA)
        Use firmware bug workaround:
        none, nologdir, samsung, samsung2, samsung3, xerrorlba, swapid

  -P TYPE, --presets=TYPE                                             (ATA)
        Drive-specific presets: use, ignore, show, showall

  -B [+]FILE, --drivedb=[+]FILE                                       (ATA)
        Read and replace [add] drive database from FILE
        [default is +/Apps/opt/etc/smart_drivedb.h
         and then    /Apps/opt/share/smartmontools/drivedb.h]

============================================ DEVICE SELF-TEST OPTIONS =====

  -t TEST, --test=TEST
        Run test. TEST: offline, short, long, conveyance, force, vendor,N,
                        select,M-N, pending,N, afterselect,[on|off]

  -C, --captive
        Do test in captive mode (along with -t)

  -X, --abort
        Abort any non-captive test on device

=================================================== SMARTCTL EXAMPLES =====

  smartctl --all /dev/sda                    (Prints all SMART information)

  smartctl --smart=on --offlineauto=on --saveauto=on /dev/sda
                                              (Enables SMART on first disk)

  smartctl --test=long /dev/sda          (Executes extended disk self-test)

  smartctl --attributes --log=selftest --quietmode=errorsonly /dev/sda
                                      (Prints Self-Test & Attribute errors)
  smartctl --all --device=3ware,2 /dev/sda
  smartctl --all --device=3ware,2 /dev/twe0
  smartctl --all --device=3ware,2 /dev/twa0
  smartctl --all --device=3ware,2 /dev/twl0
          (Prints all SMART info for 3rd ATA disk on 3ware RAID controller)
  smartctl --all --device=hpt,1/1/3 /dev/sda
          (Prints all SMART info for the SATA disk attached to the 3rd PMPort
           of the 1st channel on the 1st HighPoint RAID controller)
  smartctl --all --device=areca,3/1 /dev/sg2
          (Prints all SMART info for 3rd ATA disk of the 1st enclosure
           on Areca RAID controller)
I can't seem to find anything that would disable BGMS.
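
Going only by the help text above, three options look like candidates worth trying: `-c` (SMART capabilities, ATA), `-l background` (in smartmontools this is the SCSI/SAS background scan results log), and `-o off` (automatic offline testing, ATA). Whether any of them reports or stops the scan on the HGST SATA drives discussed here is an untested assumption. The function names, the `RUN` switch, and the device name are placeholders for illustration. By default the sketch only prints the commands instead of running them.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed. Set RUN=1 to run them.
# Function names and the RUN switch are illustrative, not part of smartmontools.
try() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Look for signs of a firmware scan on device $1.
bms_probe() {
    try smartctl -c "$1"               # ATA: SMART capabilities, incl. offline data collection
    try smartctl -l background "$1"    # SCSI/SAS: background scan results log, if supported
}

# Untested guess: turn automatic offline testing off (ATA only).
# It may have no effect on the scan discussed in this thread.
bms_try_disable() {
    try smartctl -o off "$1"
}
```

`bms_probe /dev/sda` prints the two probe commands for review (`/dev/sda` is only an example device name). With `RUN=1` set they are executed, which needs root on the NAS. If `-o off` turns out to change anything, `smartctl -o on` on the same device reverses it.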
ChuckDavis666
Starting out
Posts: 40
Joined: Mon Jan 16, 2017 11:33 pm

Re: Constant Disk Activity - Probably Explained

Post by ChuckDavis666 »

Trexx
Ask me anything
Posts: 5388
Joined: Sat Oct 01, 2011 7:50 am
Location: Minnesota

Re: Constant Disk Activity - Probably Explained

Post by Trexx »

That appears to be a commercial product, so not sure that will come to QNAP anytime soon.
ChuckDavis666
Starting out
Posts: 40
Joined: Mon Jan 16, 2017 11:33 pm

Re: Constant Disk Activity - Probably Explained

Post by ChuckDavis666 »

Thought maybe the command line arguments would be same or similar.
prrovoss
New here
Posts: 2
Joined: Wed Nov 08, 2017 8:06 pm

Re: Constant Disk Activity - Probably Explained

Post by prrovoss »

ChuckDavis666 wrote:Thought maybe the command line arguments would be same or similar.
They are not working; I tried that.