TVS-871: Migrating from 6x6 TB RAID-6 to 4x14TB RAID-6

Discussion on setting up QNAP NAS products.
burgerkind
New here
Posts: 8
Joined: Sat Jun 13, 2020 7:30 pm

TVS-871: Migrating from 6x6 TB RAID-6 to 4x14TB RAID-6

Post by burgerkind »

Hello everyone,

Up until yesterday I was using a TVS-871 NAS with 6 disks (6 x WD RED 6 TB in RAID 6 = 24 TB) and 2 caching SSDs (500 GB each).
One of the disks was showing SMART errors and another one had some read errors.

I decided to replace 4 of them with WD RED 14 TB disks.
So my final setup should be:
Main data: 4 x 14 TB (RAID6 = 28 TB)
Archive data on old HDDs: 2 x 6 TB (RAID1 = 6 TB)
2 Caching SSDs (not important)

Now I have a small problem migrating my data.
1.1.) I replaced the first 6 TB HDD (the one with SMART errors) yesterday and the resync finished after 13-14 hours.
1.2.) Then I replaced the 2nd 6 TB HDD (the one with read errors) today. The resync is still running (5%) and will also need about 13-14 hours.
Maybe step 1.2 was a mistake: I can swap 4 of the 6 disks from 6 TB to 14 TB one by one, but I found no way to migrate a 6-disk RAID 6 down to a RAID 6 with only 4 (larger) disks :(
Migrating to 6 x 14 TB is too expensive and not an option for me at the moment :(

The only option I see would be:
2.1.) Removing one drive from the RAID 6 (still usable with 5 of 6 drives, but degraded)
2.2.) Adding a second storage pool (3 x 14 TB RAID 5) using the freed slot of the degraded RAID 6 and the slots of the two SSDs
2.3.) Copying all data from the old pool (RAID 6, 24 TB, degraded) to the new pool (RAID 5, 28 TB)
2.4.) Deleting the old (RAID 6) storage pool, removing its disks and adding another pool as a 2-disk RAID 1
2.5.) Migrating the RAID 5 to RAID 6 after adding the 4th 14 TB HDD

Some remaining questions for you:
1.) How do I stop the resync onto my second disk (step 1.2)? It makes no sense to finish the RAID resync and then remove the disk to use it in another storage pool. (My own guess is sketched below.)
2.) How do I add a new disk without QTS automatically resyncing the degraded RAID?
3.) Is there any better way?
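
What I would try from the shell for question 1 (assuming QTS exposes the standard md sysfs interface; I have not dared to run it yet):
cat /proc/mdstat                               # check which md device is rebuilding
echo frozen > /sys/block/md1/md/sync_action    # pause the rebuild
echo idle > /sys/block/md1/md/sync_action      # or abort it (the kernel/QTS may restart it later)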

Thank you very much for your help!
PS: I am using firmware 4.4.2.1320 Build 20200529 and I am an experienced Linux user. Using the shell shouldn't be a problem ;)
dolbyman
Guru
Posts: 35232
Joined: Sat Feb 12, 2011 2:11 am
Location: Vancouver BC , Canada

Re: TVS-871: Migrating from 6x6 TB RAID-6 to 4x14TB RAID-6

Post by dolbyman »

The best way would be to start from scratch and then restore from backups (always have external backups).

Removing your system volume will cause a headache anyway (loss of apps etc.), so a clean restart is best.
jacobite1
Easy as a breeze
Posts: 389
Joined: Fri Aug 07, 2015 7:02 pm
Location: London, England

Re: TVS-871: Migrating from 6x6 TB RAID-6 to 4x14TB RAID-6

Post by jacobite1 »

You cannot shrink an array in QTS - if you want to reduce the number of spindles (i.e. HDDs) you need to back up, start from scratch and restore. It's a huge pain, I know, but it's what's necessary here.
TVS-872XT-i5-16GB with 6*ST12000VNZ008 in RAID 6.
Backed up to a stack of a half dozen 'cold' external 12TB and 8TB HDDs - please back up your data, RAID is not the same as a backup!

Formerly TVS-463 with 4*WD60EFRX in RAID5, planning to reuse as an additional backup destination in the new year.
All protected by an APC SMT750VA UPS - protect your NAS from bad power!
burgerkind
New here
Posts: 8
Joined: Sat Jun 13, 2020 7:30 pm

Re: TVS-871: Migrating from 6x6 TB RAID-6 to 4x14TB RAID-6

Post by burgerkind »

Yes, the system volume will be another barrier :(
The usability could be better ;)

Backups are available, but setting up the user permissions and all the other settings again will be a pain. In addition, it takes a whole day to restore.
Also, restoring the SMB domain controller was not reliable in the past.

I already tried removing the syncing disk ("mdadm --manage /dev/md1 --fail /dev/sdh3" && "mdadm --manage /dev/md1 --remove /dev/sdh3") and manually migrating the RAID 6 to RAID 5 ("mdadm --grow /dev/md1 --level=raid5 --raid-devices=5 --backup-file=/mnt/HDA_ROOT/RAIDBackup/mdadmbackupfile").
I cannot recommend that, because the reshape is really slow (< 1 MB/s) and would take over 60 days (90,000 to 100,000 minutes) to finish.
Another problem: after setting the device to "fail", QNAP shows I/O errors on that disk (viewtopic.php?p=755862) and refuses to do anything with that disk without running a bad-blocks scan (which also takes days or weeks) :(
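
In hindsight the reshape speed might have been tunable (assuming QTS exposes the standard md tunables; I did not verify this):
cat /proc/mdstat                                   # shows the current reshape speed
echo 50000 > /proc/sys/dev/raid/speed_limit_min    # raise the resync/reshape floor (KB/s)
echo 32768 > /sys/block/md1/md/stripe_cache_size   # a larger stripe cache usually speeds up reshapes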

Now:
1.) I removed all disks from the old RAID (4 x 6 TB + 1 x 14 TB = complete RAID, usable but in "reshape" state)
2.) I overwrote more than 1 TB of the (only reported as) "erroneous" disk, because the first 1 TB held the (thin) system volume
3.) I'll start again from scratch with 3 x 14 TB RAID 5, restore a configuration backup, hope that the domain controller backup also works, and then
4.) try to add the old RAID set (read-only would be enough) to copy the data over (rough sketch below)

After that I'll migrate the RAID 5 to RAID 6.

Hope that works in a reasonable time :/
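
For step 4 my rough idea is something like this (untested; the device names are just guesses for how the old members will show up on the fresh install):
mdadm --assemble --readonly /dev/md2 /dev/sd[defgh]3   # assemble the old array read-only
lvscan                                                 # the old data volume should show up again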
jacobite1
Easy as a breeze
Posts: 389
Joined: Fri Aug 07, 2015 7:02 pm
Location: London, England

Re: TVS-871: Migrating from 6x6 TB RAID-6 to 4x14TB RAID-6

Post by jacobite1 »

Bear in mind that migrating from RAID 5 to RAID 6 is really slow. Doing it with 6 TB disks took me 8 days; someone on the Reddit sub took over a month with 12 TB disks.

If you want to end up with RAID 6 and can start with 4 disks, that's absolutely what I'd recommend.

The slightly annoying (or good, depending on your viewpoint) thing is that QNAP is still using a fairly old mdadm version - as far as I can tell, new arrays are built with 1.2. mdadm versions 3 and above _do_ support reshaping an array to fewer devices - i.e. shrinking an array.
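
From memory, on a stock Linux box with a recent mdadm the shrink looks roughly like this (do NOT try this on QTS; sizes and device names are placeholders, and the filesystem/LVM layer on top has to be shrunk first):
# 1.) shrink whatever sits on top (filesystem, LVM) to below the new array size
# 2.) reduce the usable size of the array
mdadm --grow /dev/md0 --array-size=<new-size-in-KiB>
# 3.) reshape to fewer member devices (needs a backup file and runs for days)
mdadm --grow /dev/md0 --raid-devices=4 --backup-file=/root/md0-shrink.backup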

Perhaps you could put a feature request in with QNAP as a ticket? They may honour it at some point!
TVS-872XT-i5-16GB with 6*ST12000VNZ008 in RAID 6.
Backed up to a stack of a half dozen 'cold' external 12TB and 8TB HDDs - please back up your data, RAID is not the same as a backup!

Formerly TVS-463 with 4*WD60EFRX in RAID5, planning to reuse as an additional backup destination in the new year.
All protected by an APC SMT750VA UPS - protect your NAS from bad power!
burgerkind
New here
Posts: 8
Joined: Sat Jun 13, 2020 7:30 pm

Re: TVS-871: Migrating from 6x6 TB RAID-6 to 4x14TB RAID-6

Post by burgerkind »

It's been a long journey since my last post ;)
Steps 1 to 3 worked fine. I now have a freshly installed QNAP NAS using 3 x 14 TB.
@jacobite1: your advice is good. Migrating RAID 5 to RAID 6 may take a long time (maybe even a month), but I have no choice because I only have 8 slots and already used one of the 14 TB disks to replace the broken 6 TB disk.
As long as I can keep using the system during the migration, that's an acceptable tradeoff.

At the moment I am working on step 4, but I have already made some really big mistakes :(
Mistake 1: I only stopped the reshape (the migration from RAID 6 to RAID 5) and did not revert it ("mdadm --assemble --update=revert-reshape --backup-file=...").
The new installation did not find a RAID set in reshape state. Also, the backup file specified when growing is needed to assemble the RAID set.
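
For the record, the full invocation would have looked roughly like this (the device names and the backup-file path are placeholders):
mdadm --stop /dev/md1
mdadm --assemble /dev/md1 --update=revert-reshape --backup-file=<mdadm-backup-file> /dev/sd[cdefgh]3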

Mistake 2: I used "/mnt/HDA_ROOT/RAIDBackup/" as the location for the mdadm backup file when growing (but not finishing) the array.
That was a bad choice. After adding the 5 old disks to the newly installed system, the device md9 (where the backup file was placed) got overwritten (without any confirmation!) by the new installation, and the backup file was lost.

Mistake 3: I suspected that this file could get lost and saved a copy of it on my laptop too.
But I did not realize that this file keeps changing while the reshape is in progress, so my copy from the beginning of the reshape may be worthless.

On the new system I was able to assemble the old RAID 6 using the backed-up "mdadmbackupfile" from my laptop. The reshape resumed, but I was not able to access my files.
pvscan did not find any volume group on that md device, and QNAP was not able to find anything on the disks either: "[Storage & Snapshots] Failed to scan free disks for existing storage. One or more RAID group member disks are missing."

After reverting the reshape (it took about 2-3 hours) I was able to find my old volume group (now "vg2") and the logical volumes on it:
--- Logical volume ---
LV Path /dev/vg2/lv5
LV Name lv5
VG Name vg2
LV UUID oqJ6qP-pjvC-JnGT-190F-OG01-2R5i-vYCa7l
LV Write Access read only
LV Creation host, time NASED2DDD, 2015-03-08 09:49:06 +0100
LV Pool name tp2
LV Status available
# open 1
LV Size 19.00 TiB
Mapped size 100.00%
Mapped sectors 40802189312
Current LE 4980736
Segments 1
Allocation inherit
Read ahead sectors 8192
Block device 253:22
Even the QNAP web interface was able to find those volumes (after rebooting the device).
But now the pool looks a bit strange (see the attached screenshot of Storage & Snapshots).
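
For anyone following along, the standard LVM2 commands to rescan and activate look roughly like this (I am not sure how much of this QNAP's own tooling already does by itself on reboot):
pvscan               # the PV on the md device shows up again after the revert
vgscan
vgchange -ay vg2     # activate the old volume group
lvdisplay vg2        # lists lv5 as shown above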

Mounting /dev/mapper/cachedev5 (and also /dev/vg2/lv5) using "mount" from SSH was not possible (no superblock found).
dumpe2fs raised hope:
dumpe2fs 1.43.9 (8-Feb-2018)
Filesystem volume name: DataVol1
Last mounted on: /share/CACHEDEV1_DATA
Filesystem UUID: ba35370c-6ce6-4af1-bcff-9eb4d31cf522
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: ext_attr filetype meta_bg extent 64bit flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags: signed_directory_hash
Default mount options: (none)
Filesystem state: not clean with errors
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 318767104
Block count: 5100273664
Reserved block count: 131072
Free blocks: 1887673545
Free inodes: 315511981
First block: 0
Block size: 4096
Fragment size: 4096
Group descriptor size: 64
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 2048
Inode blocks per group: 128
RAID stride: 16
RAID stripe width: 64
First meta block group: 2048
Flex block group size: 16
Filesystem created: Sun Mar 8 09:51:03 2015
Last mount time: Wed Oct 30 19:32:02 2019
Last write time: Sun Jun 14 21:34:03 2020
Mount count: 1
Maximum mount count: -1
Last checked: Wed Oct 30 19:19:05 2019
Check interval: 0 (<none>)
Lifetime writes: 15 TB
Reserved blocks uid: 0 (user admin)
Reserved blocks gid: 0 (group administrators)
First inode: 11
Inode size: 256
Required extra isize: 28
Desired extra isize: 28
Default directory hash: half_md4
Directory Hash Seed: d2a64263-5bc4-4097-b005-21e8adc46c60
Directory Hash Rev: 0
Directory Magic Number: 0x514E4150
Journal backup: inode blocks
I assume that I've destroyed some data on my RAID (maybe the first 20-30 GiB? Maybe more).


Mistake 4: Using the old e2fsck build shipped with QTS from the SSH shell. Isn't there a newer fsck.ext4?
I am not convinced that this old build is suitable for fixing ext4 filesystems.
Maybe I destroyed some additional data by using the wrong tool.

My old Debian debootstrap on that NAS is not available anymore.
Debootstrapping again is not easy because the Optware package is not available anymore.
I managed it by copying over a debootstrapped Debian 10 from another server.
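
The chroot setup itself is nothing special; roughly (the path is just a placeholder for wherever the Debian tree ends up):
mount --bind /dev /share/CACHEDEV1_DATA/debian10/dev
mount --bind /proc /share/CACHEDEV1_DATA/debian10/proc
mount --bind /sys /share/CACHEDEV1_DATA/debian10/sys
chroot /share/CACHEDEV1_DATA/debian10 /bin/bash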

Running this Debian 10 in a chroot environment allows me to use the newer fsck.ext4 tool on the damaged md device.
To get the possible locations of the superblock backups, I did a dry run of creating a new filesystem on the device (mkfs.ext4 with the "-n" parameter):
mkfs.ext4 -n /dev/mapper/cachedev5
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
102400000, 214990848, 512000000, 550731776, 644972544, 1934917632,
2560000000, 3855122432
Mistake 5:
Running fsck.ext4 with the "-b 32768" parameter worked and I have already fixed a lot of errors.
Unfortunately I was not able to use an undo file (I can't remember the error message).
Most of the errors look like this:
Inode 523910 block 513 conflicts with critical metadata, skipping block checks.
Inode 523910 block 1435 conflicts with critical metadata, skipping block checks.
Inode 523910 block 513 conflicts with critical metadata, skipping block checks.
Inode 523910 block 1281 conflicts with critical metadata, skipping block checks.
Inode 523910 block 2807 conflicts with critical metadata, skipping block checks.
Inode 523910 block 513 conflicts with critical metadata, skipping block checks.
Inode 523910 block 1429 conflicts with critical metadata, skipping block checks.
Inode 523910 block 1 conflicts with critical metadata, skipping block checks.
Inode 523910 block 1 conflicts with critical metadata, skipping block checks.
But I already knew that the first XXX GB were destroyed. Why did I use such an early superblock? Why not one of the later ones?!?
I don't know :/

fsck is still running.
I hope to get some of the old archive data (not included in the backup) back, but it isn't the end of the world if I lose it :/
Last edited by burgerkind on Tue Jun 16, 2020 1:35 am, edited 1 time in total.
burgerkind
New here
Posts: 8
Joined: Sat Jun 13, 2020 7:30 pm

Re: TVS-871: Migrating from 6x6 TB RAID-6 to 4x14TB RAID-6

Post by burgerkind »

fsck still has not finished and is hanging because of low memory.
fsck is consuming about 14.5 GB RAM (and ~24 GB of swap space) :/
# free -m
             total       used       free     shared    buffers     cached
Mem:         15905      15656        249         39          2         97
-/+ buffers/cache:      15556        349
Swap:        23967       8272      15694

# top
Mem: 16039092K used, 248144K free, 40512K shrd, 3008K buff, 115616K cached
CPU:  4.6% usr  1.8% sys  0.0% nic  65.5% idle  27.9% io  0.0% irq  0.0% sirq
Load average: 17.51 12.80 13.87 3/1528 11041
  PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
12504  1076 admin    D    14513m 90.7   2  0.2 fsck.ext4 -p -b 32768 /dev/mapper/cachedev5
I assume it will never finish.
A note to myself - Mistake 6: do not create filesystems bigger than 8-10 TB anymore, because fsck is not able to run on them on systems with "low" (16 GB!) RAM :(
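
Something I only found afterwards: e2fsck can apparently be told to keep its internal tables on disk instead of in RAM via /etc/e2fsck.conf. I have not tried this on QTS, and the scratch directory below is just a placeholder:
mkdir -p /share/CACHEDEV1_DATA/e2fsck_scratch
cat >> /etc/e2fsck.conf <<EOF
[scratch_files]
directory = /share/CACHEDEV1_DATA/e2fsck_scratch
EOF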

I think I should stop here and not waste more time.
If I am in the mood I can give it another try later with external USB/Thunderbolt cases (I hope I'll find 5 of them) on my laptop (128 GB RAM).
Anyway, I ordered a 5th and 6th 14 TB disk today (delivery on Thursday) to reinstall from scratch again with the recommended 4 disks (or maybe 5 :D) for RAID 6.
Next week I will send some feature requests to QNAP ;)
burgerkind
New here
Posts: 8
Joined: Sat Jun 13, 2020 7:30 pm

Re: TVS-871: Migrating from 6x6 TB RAID-6 to 4x14TB RAID-6

Post by burgerkind »

I bought a TL-D800C expansion unit to put the old disks into and connected it to my laptop.
Ubuntu assembles the RAID, but it does not recognize the LVM "thick volume".
I can't activate the volume group to run fsck.ext4:
WARNING: Unrecognised segment type thick
QNAP seems to use a patched LVM that is not compatible with the stock one.
If only I had known that beforehand, I would not have bought such a system at all.
Maybe another thing for a feature request.

I also tried to compile LVM2 from the GPL sources (https://sourceforge.net/projects/qosgpl ... S%204.4.1/) but had no luck.
The kernel would be even more difficult, so I was searching for another solution.
If I am not able to build a bootable system running a QNAP-compatible kernel with QNAP-compatible LVM tools, then I do not trust this NAS anymore. Maybe I'll sell it on eBay and use my disks in a system from a competitor, or maybe install a custom OS (does FreeBSD run on that hardware?).

At the moment I am running my TVS-871 with 5 (a little upgrade ;) ) empty 14 TB disks in RAID 6. The RAID initialization will be finished in 1-2 days.
Then I connected the TL-D800C to this QNAP, activated the volume groups (that is possible there) and started dumping (dd'ing) the 19 TB filesystem to an image file on the new RAID 6 disks.
root@NAS:/share/HDDImage# dd if=/dev/vg2/lv6 bs=4096 conv=notrunc,noerror | pv -tpreb -s 20890720927744 | dd of=/share/HDDImage/image.ext4
4.11TiB 6:12:29 [ 201MiB/s] [==========================================================> ] 21% ETA 22:28:48
I hope I can repair this dump with fsck.ext4 over NFS or SMB from my laptop, or on another system with more than 16 GB RAM.
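
The rough plan once the dump has finished (fsck.ext4 works directly on an image file, and the repaired image can then be loop-mounted read-only; the paths are placeholders):
# on the laptop, with the NAS share mounted under /mnt/nas
fsck.ext4 -f -y /mnt/nas/HDDImage/image.ext4
mkdir -p /mnt/recovered
mount -o loop,ro /mnt/nas/HDDImage/image.ext4 /mnt/recovered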