Disk error after Data Scrubbing

MikeSpragg
Starting out
Posts: 18
Joined: Sat Oct 12, 2013 5:35 pm

Re: [WARNING!!] New feature in QTS (Data Scrubbing) could be destructive!

Post by MikeSpragg »

I can verify this as well; I just ran a search for it. I bought two brand new QNAP TVS-EC880s. Both were fitted with 4x4TB WD Reds in RAID 6 (R6). I've had no end of issues with these units - each week I'd get at least one drive fail (Current_Pending_Sector increasing in SMART). What was unusual was the regularity: roughly one drive a week (which I'd return and the supplier would replace). I was thinking "bad batch". However, yesterday **THREE** drives went bad at exactly the same time (same error on each), all during RAID data scrubbing. I noticed it happened at the end of the weekly scrub that QNAP now runs as of QTS v4.3.3. Searching for QNAP+RAID+DATA+SCRUBBING brought me to this post!

Type Date Time Users Source IP Computer name Content
Warning 2017/06/12 06:21:10 System 127.0.0.1 localhost Host: Disk 2 Read I/O error, UNRECOVERED READ ERROR sense_key=0x3, asc=0x11, ascq=0x4, CDB=88 00 00 00 00 01 d0 b1 33 d8 00 00 05 40 00 00 .
Warning 2017/06/12 06:21:11 System 127.0.0.1 localhost Host: Disk 2 medium error. Please run a bad block scan on this drive or replace it if the error persists.

The above is what you get in the QNAP log, along with a recommendation to run a bad block scan (which doesn't reveal anything). I'm more inclined to think this is a QNAP data scrub issue than a WD Red issue, as I've got other units that have been running for years with WD Reds and no faults.
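
For anyone who wants to read the sense data in that log entry, here is a minimal Python sketch. It assumes the CDB is a standard SCSI READ(16) command (opcode 0x88, 8-byte starting LBA, 4-byte transfer length) and uses the standard meanings of sense key 0x3 and ASC/ASCQ 0x11/0x04. The function and table names are made up for illustration; nothing here is a QNAP tool.

```python
def decode_read16_cdb(cdb_hex: str) -> dict:
    """Decode a SCSI READ(16) command descriptor block given as spaced hex bytes."""
    cdb = bytes.fromhex(cdb_hex)  # whitespace between bytes is ignored
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a 16-byte READ(16) CDB")
    return {
        "lba": int.from_bytes(cdb[2:10], "big"),      # bytes 2..9: starting block
        "blocks": int.from_bytes(cdb[10:14], "big"),  # bytes 10..13: transfer length
    }


# Only the codes that appear in the log line are listed here.
SENSE_KEYS = {0x3: "MEDIUM ERROR"}
ASC_ASCQ = {(0x11, 0x04): "UNRECOVERED READ ERROR - AUTO REALLOCATE FAILED"}

sense_key, asc, ascq = 0x3, 0x11, 0x04
info = decode_read16_cdb("88 00 00 00 00 01 d0 b1 33 d8 00 00 05 40 00 00")

print(f"{SENSE_KEYS[sense_key]}: {ASC_ASCQ[(asc, ascq)]}")
print(f"READ(16) of {info['blocks']} blocks starting at LBA {info['lba']:#x}")
```

Read this way, the entry says the drive itself reported that it could not read (or automatically reallocate) something within a 1344-block read starting at LBA 0x1d0b133d8.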

The R6 volume remains intact but the drives are now buggered !

PS You can turn off this feature by running Disk Manager, select the Cog/Edit (Global Setting) on the top of the window and turn off data scrubbing in the bottom option.
P3R
Guru
Posts: 13190
Joined: Sat Dec 29, 2007 1:39 am
Location: Stockholm, Sweden (UTC+01:00)

Re: [WARNING!!] New feature in QTS (Data Scrubbing) could be destructive!

Post by P3R »

MikeSpragg wrote:...each week I'd get at least 1 drive fail (Current_Pending_Sector is increasing in SMART).
Since that information comes from the disk itself it's the questionable batch of disks that's your problem, not the scrubbing. The scrubbing only revealed the problems, instead of you finding that out later when doing an actual RAID rebuild and already being one disk down when starting.
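
To track counters like Current_Pending_Sector over time rather than eyeballing them, something along these lines could work. It is only a sketch: it assumes the usual ten-column attribute table printed by `smartctl -A`, and the two snapshots below are invented sample text for illustration, not output from the drives in this thread.

```python
WATCHED_IDS = {5, 197, 198}  # reallocated, pending, offline-uncorrectable sectors


def parse_smart_raw_values(smartctl_output: str) -> dict:
    """Return {attribute_name: raw_value} for the watched SMART attributes."""
    values = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit():
            if int(fields[0]) in WATCHED_IDS and fields[9].isdigit():
                values[fields[1]] = int(fields[9])
    return values


def counters_that_grew(previous: dict, current: dict) -> dict:
    """Return {name: (old, new)} for every watched counter that increased."""
    return {
        name: (previous.get(name, 0), value)
        for name, value in current.items()
        if value > previous.get(name, 0)
    }


# Invented snapshots, one week apart (illustrative only).
LAST_WEEK = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
"""
THIS_WEEK = LAST_WEEK.replace("-       0\n197", "-       0\n197").rsplit("0", 1)[0] + "8\n"

grew = counters_that_grew(
    parse_smart_raw_values(LAST_WEEK), parse_smart_raw_values(THIS_WEEK)
)
print(grew)
```

The point of the comparison is the one made above: these counters are maintained by the disk firmware, so a rising value is the disk reporting on itself, whatever workload happened to trigger the reads.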

Did you buy all disks from the same source at the same time?
RAID has never ever been a replacement for backups. Without backups on a different system (preferably placed at another site), you will eventually lose data!

A non-RAID configuration (including RAID 0, which isn't really RAID) with a backup on separate media protects your data far better than any RAID volume without backup.

All data storage consists of both the primary storage and the backups. It's your money and your data, spend the storage budget wisely or pay with your data!
MikeSpragg
Starting out
Posts: 18
Joined: Sat Oct 12, 2013 5:35 pm

Re: [WARNING!!] New feature in QTS (Data Scrubbing) could be destructive!

Post by MikeSpragg »

Yes, I did originally. However, I have since replaced them with some spare drives I had (Hitachi) and the same problem occurred. The vendor also sent me replacements for the ones that failed, with manufacture dates of Oct '16 and Jan '17, and they failed too.

PS: I have to chuckle at your signature. The 1st QNAP is a primary volume (used by VMware for disk storage) and the 2nd QNAP is the slave, with the intent of backing up master to slave using Naviko. Funny how life throws these curve balls from time to time!
sunnyl
Starting out
Posts: 28
Joined: Sat Oct 22, 2016 11:07 am

Re: [WARNING!!] New feature in QTS (Data Scrubbing) could be destructive!

Post by sunnyl »

Are there any logs of the scrubbing? I noticed a lot of activity on my NAS yesterday (Sunday night), logged in today to find the scrubbing job had run, and then found this thread. (The NAS is still fully functional as far as I can tell.)

What I didn't really understand, though, is that the scrubbing job started at 00:00:00, but a resync job also started at 00:00:00 and finished 5 hours later. My understanding of RAID is that a resync is only done when data needs to be shifted, for example when a disk is replaced. Does that mean the scrubbing job found something wrong?
dolbyman
Guru
Posts: 35020
Joined: Sat Feb 12, 2011 2:11 am
Location: Vancouver BC , Canada

Re: [WARNING!!] New feature in QTS (Data Scrubbing) could be destructive!

Post by dolbyman »

sunnyl wrote:Are there any logs of the scrubbing? Noticed a lot of activity on my NAS yesterday (Sunday night) and logged in today to find the scrubbing job had run, then found this thread (NAS is still fully functional though as far as I can tell)

What I didn't really understand though was that the scrubbing job started at 00:00:00, but a resync job started 00:00:00 and finished 5 hours later. My understanding of RAID is that a resync is only done when data needs to be shifted, for example when a disk is replaced. Does that mean the scrubbing job found something was wrong?
Check the SMART status of your disks. Any reallocated sectors, etc.?
oyvindo
Experience counts
Posts: 1399
Joined: Tue May 19, 2009 2:08 am
Location: Norway, Oslo

Re: [WARNING!!] New feature in QTS (Data Scrubbing) could be destructive!

Post by oyvindo »

My understanding is that the resync job is actually the core part of the Data Scrubbing. I'm also looking for a detailed log of the scrubbing, but I haven't found one yet.
Anyway, I ran a full test (7.5 hours on each drive) followed by a full bad block scan (5 hours on each drive), and guess what? NOTHING FOUND! NO ERRORS!
I have also checked the S.M.A.R.T. status and there are no reallocated sectors on any of the drives!
So what this Data Scrubbing thing found that caused it to take my RAID offline is now becoming a mystery....
P3R
Guru
Posts: 13190
Joined: Sat Dec 29, 2007 1:39 am
Location: Stockholm, Sweden (UTC+01:00)

Re: [WARNING!!] New feature in QTS (Data Scrubbing) could be destructive!

Post by P3R »

oyvindo wrote:My understanding is that the resync job is actually the core part of the Data Scrubbing.
As far as I know, resync and repair (data scrubbing) are different but very similar commands, and it seems both are reported as resync in the logs.
oyvindo wrote:So what this Data Scrubbing thing has found that caused it to take my RAID offline - is now becoming a mystery....
You said it yourself when you opened the thread: Disk Access History (I/O) Abnormal. To me that sounds like the NAS had problems communicating with the disks. The problem could be with either the disks or the NAS.

As I said previously in the thread, there are other disk problems than failing sectors.

The data scrubbing revealed or accelerated the problem and QTS took the RAID offline when it lost too many disks.
OneCD
Guru
Posts: 12037
Joined: Sun Aug 21, 2016 10:48 am
Location: "... there, behind that sofa!"

Re: [WARNING!!] New feature in QTS (Data Scrubbing) could be destructive!

Post by OneCD »

sunnyl wrote:What I didn't really understand though was that the scrubbing job started at 00:00:00, but a resync job started 00:00:00 and finished 5 hours later. My understanding of RAID is that a resync is only done when data needs to be shifted, for example when a disk is replaced.
RAID scrubbing = RAID resync.

When you replace a drive = RAID rebuild.
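
If you have shell access, the distinction can be checked directly, assuming the array is a Linux md (software RAID) device underneath. That is an assumption about QTS, not something confirmed in this thread. On Linux md, the current background operation is exposed in the `sync_action` file under the array's sysfs directory, and a scrub's running tally in `mismatch_cnt`. A rough Python sketch, with the helper name and the device path in the usage note being hypothetical:

```python
from pathlib import Path

# Values Linux md reports in sync_action, with a plain-language reading.
SYNC_ACTIONS = {
    "idle": "no background operation running",
    "check": "scrub: read every stripe and count mismatches without correcting them",
    "repair": "scrub: read every stripe and rewrite inconsistent redundancy",
    "resync": "resynchronise redundancy, e.g. after an unclean shutdown",
    "recover": "rebuild onto a replacement or spare disk",
}


def describe_md_state(md_dir):
    """Read sync_action and mismatch_cnt from an md sysfs directory.

    md_dir is the array's 'md' directory, e.g. /sys/block/<device>/md.
    Returns (action, meaning, mismatch_count).
    """
    md_dir = Path(md_dir)
    action = (md_dir / "sync_action").read_text().strip()
    mismatches = int((md_dir / "mismatch_cnt").read_text().strip())
    meaning = SYNC_ACTIONS.get(action, "other state (e.g. reshape or frozen)")
    return action, meaning, mismatches
```

Usage would be something like `describe_md_state("/sys/block/md1/md")`, where `md1` is only a placeholder for whatever your data array is called. A scrub should show up as `check` or `repair`, and a rebuild after a disk swap as `recover`, whatever label the NAS log puts on it.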

oyvindo
Experience counts
Posts: 1399
Joined: Tue May 19, 2009 2:08 am
Location: Norway, Oslo

Re: [WARNING!!] New feature in QTS (Data Scrubbing) could be destructive!

Post by oyvindo »

I believe @OneCD is right here...
And I also believe that Abnormal Access History is in no way related to the SATA interface or any other HW matter, but rather means undefined bit (read) errors....
(But I admit I may be wrong) :roll:
oyvindo
Experience counts
Posts: 1399
Joined: Tue May 19, 2009 2:08 am
Location: Norway, Oslo

Re: [WARNING!!] New feature in QTS (Data Scrubbing) could be destructive!

Post by oyvindo »

skypx wrote:I really thought you were using desktop drives since you didn't specify in your signature.
Even if I did, @P3R has a point, or ..... ?
QNAP_Daniel
Starting out
Posts: 49
Joined: Fri Sep 14, 2012 4:27 pm

Re: [WARNING!!] New feature in QTS (Data Scrubbing) could be destructive!

Post by QNAP_Daniel »

Hi all,

I think there is some misunderstanding and misplaced blame here.

The RAID scrubbing will resynchronise the data and parity blocks across the entire RAID array. This involves reading all blocks from every RAID stripe, then repairing any blocks which were unreadable or inconsistent. In this sense, it is effectively similar to doing a bad blocks scan on all disks in the array. Therefore, if any of the disks do have bad blocks, they are guaranteed to show up during the data scrubbing.

This is the reason why a data scrub may cause disk errors to show up even if there were no such errors previously. In fact the same phenomenon often happens during RAID 5 rebuilds after a disk failure.
For example, the disk may have bad blocks in an area of the RAID which is unused or infrequently accessed. During normal operation there is no error from the disk, but during a data scrub there will be an I/O error if the bad block is unrecoverable. It is preferable for this to happen during a data scrub rather than during a degraded RAID rebuild. If it happens during a rebuild then there is a chance of data loss or data corruption (since there is no redundancy) - during a scrub there is no such risk.

In this way, the data scrub can alert you to bad disks which might otherwise cause data loss during a RAID rebuild, as well as repairing silent corruption of the RAID data and parity blocks due to faulty sectors. It is rather like a "dry-run" RAID rebuild.
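
To make the idea concrete, here is a toy Python model of scrubbing a single stripe. It is a sketch under simplifying assumptions, not how QTS or any real RAID layer is implemented: it uses single XOR parity (RAID 5 style, whereas RAID 6 adds a second parity block), keeps the parity in the last position instead of rotating it, and represents an unreadable block as `None`. All names are invented for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def scrub_stripe(stripe):
    """Scrub one stripe: data blocks followed by one XOR parity block.

    A block that could not be read is given as None.
    Returns (stripe_after_scrub, status).
    """
    unreadable = [i for i, block in enumerate(stripe) if block is None]
    if len(unreadable) > 1:
        # Single parity cannot reconstruct two missing blocks.
        return stripe, "failed"
    fixed = list(stripe)
    if len(unreadable) == 1:
        # Rebuild the missing block from all the surviving ones.
        survivors = [block for block in stripe if block is not None]
        fixed[unreadable[0]] = xor_blocks(survivors)
        return fixed, "rebuilt"
    if any(xor_blocks(stripe)):
        # Everything was readable but data and parity disagree:
        # recompute the parity from the data blocks.
        fixed[-1] = xor_blocks(stripe[:-1])
        return fixed, "parity rewritten"
    return fixed, "clean"


data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_blocks(data)

print(scrub_stripe(data + [parity])[1])
print(scrub_stripe([data[0], None, data[2], parity])[1])
print(scrub_stripe(data + [b"\x00\x00"])[1])
```

The "rebuilt" case is the one discussed above: the scrub hits an unreadable block while every other block in the stripe is still available, so the data can be reconstructed. During a degraded rebuild the same bad block would land in the "failed" branch instead.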
ensignvorik
Easy as a breeze
Posts: 364
Joined: Sat Jul 14, 2012 8:24 pm

Re: [WARNING!!] New feature in QTS (Data Scrubbing) could be destructive!

Post by ensignvorik »

Good reply.

Does anyone else think doing this weekly is a little excessive, though? (Maybe not in a commercial environment.)

Is there any way to schedule it ourselves?
Unless I'm being blind, I can't find the setting to change what kind of QNAP I have on my profile. I now own a TS-253A
Moogle Stiltzkin
Guru
Posts: 11448
Joined: Thu Dec 04, 2008 12:21 am
Location: Around the world....

Re: [WARNING!!] New feature in QTS (Data Scrubbing) could be destructive!

Post by Moogle Stiltzkin »

QNAP_Daniel wrote:This is the reason why a data scrub may cause disk errors to show up even if there were no such errors previously. In fact the same phenomenon often happens during RAID 5 rebuilds after a disk failure.

For example, the disk may have bad blocks in an area of the RAID which is unused or infrequently accessed. During normal operation there is no error from the disk, but during a data scrub there will be an I/O error if the bad block is unrecoverable. It is preferable for this to happen during a data scrub rather than during a degraded RAID rebuild. If it happens during a rebuild then there is a chance of data loss or data corruption (since there is no redundancy) - during a scrub there is no such risk.

In this way, the data scrub can alert you to bad disks which might otherwise cause data loss during a RAID rebuild, as well as repairing silent corruption of the RAID data and parity blocks due to faulty sectors. It is rather like a "dry-run" RAID rebuild.
Thank you for the explanation, that pretty much sums it up :}

*raid scrub scheduled enabled :mrgreen:
NAS
[Main Server] QNAP TS-877 (QTS) w. 4tb [ 3x HGST Deskstar NAS & 1x WD RED NAS ] EXT4 Raid5 & 2 x m.2 SATA Samsung 850 Evo raid1 +16gb ddr4 Crucial+ QWA-AC2600 wireless+QXP PCIE
[Backup] QNAP TS-653A (Truenas Core) w. 4x 2TB Samsung F3 (HD203WI) RaidZ1 ZFS + 8gb ddr3 Crucial
[^] QNAP TL-D400S 2x 4TB WD Red Nas (WD40EFRX) 2x 4TB Seagate Ironwolf, Raid5
[^] QNAP TS-509 Pro w. 4x 1TB WD RE3 (WD1002FBYS) EXT4 Raid5
[^] QNAP TS-253D (Truenas Scale)
[Mobile NAS] TBS-453DX w. 2x Crucial MX500 500gb EXT4 raid1

Network
Qotom Pfsense|100mbps FTTH | Win11, Ryzen 5600X Desktop (1x2tb Crucial P50 Plus M.2 SSD, 1x 8tb seagate Ironwolf,1x 4tb HGST Ultrastar 7K4000)


Resources
[Review] Moogle's QNAP experience
[Review] Moogle's TS-877 review
https://www.patreon.com/mooglestiltzkin
QNAP_Daniel
Starting out
Posts: 49
Joined: Fri Sep 14, 2012 4:27 pm

Re: [WARNING!!] New feature in QTS (Data Scrubbing) could be destructive!

Post by QNAP_Daniel »

ensignvorik wrote:Good reply.

Does anyone else think doing this weekly is a little excessive, though? (Maybe not in a commercial environment.)

Is there any way to schedule it ourselves?
Hi there,

Yes, once a week may be a bit overzealous. I think the default may be changed to monthly in a future firmware release.

You can change the schedule yourself or disable it in the Storage Manager global settings (open Storage Manager, then click the settings gear icon in the top-right of the window).

Thanks!
Don
Guru
Posts: 12289
Joined: Thu Jan 03, 2008 4:56 am
Location: Long Island, New York

Re: [WARNING!!] New feature in QTS (Data Scrubbing) could be destructive!

Post by Don »

Global settings in storage manager.
Use the forum search feature before posting.

Use RAID and external backups. RAID will protect you from disk failure, keep your system running, and data accessible while the disk is replaced, and the RAID rebuilt. Backups will allow you to recover data that is lost or corrupted, or from system failure. One does not replace the other.

NAS: TVS-882BR | F/W: 5.0.1.2346 | 40GB | 2 x 1TB M.2 SATA RAID 1 (System/VMs) | 3 x 1TB M.2 NVMe QM2-4P-384A RAID 5 (cache) | 5 x 14TB Exos HDD RAID 6 (Data) | 1 x Blu-ray
NAS: TVS-h674 | F/W: 5.0.1.2376 | 16GB | 3 x 18TB RAID 5
Apps: DNSMasq, PLEX, iDrive, QVPN, QLMS, MP3fs, HBS3, Entware, DLstation, VS, +