Replication throughput issues

JRaymond
Starting out
Posts: 25
Joined: Mon Oct 27, 2014 4:11 am

Replication throughput issues

Post by JRaymond »

We are deploying 380 TS-673s, each with 36GB of RAM and 6x 4TB Seagate IronWolf drives.

I have the QNAP expansion card with 1x 10Gb fiber SFP module going to a Cisco 2960-X gigabit switch with a 15Gb backplane. All cables to the systems we are building are CAT6.

When I set off a NAS-to-NAS replication on a few systems (3-4) I'm getting 140-160 MB/s throughput. When I get 10 systems (or more) replicating, it falls to about 30 MB/s per system. I have a sync going now and my NETWORK throughput from the SFP module is showing 300-370 MB/s...
[Attached screenshots: SFP Output.jpg, Resource Overview.jpg]
So what is choking this throughput down...and how can I fix it to output faster? I'm moving about 4.5TB to the systems from my master. I would expect that with 10GbE from the master NAS it should be able to handle 150 MB/s for 20 systems at a time.
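For reference, a single 10GbE link tops out around 1.1-1.2 GB/s of real-world payload, so 20 clients at a sustained 150 MB/s each (about 3 GB/s aggregate) would need more than one link even in the best case; the 300-370 MB/s shown above is still well below that ceiling. A minimal shell sketch for watching what the SFP port is actually pushing, sampled from the standard Linux interface counters (the interface name eth4 is an assumption; check "ip link" or "ifconfig" on the NAS for the 10GbE port):

# Print the outgoing throughput of the 10GbE interface once per second.
IF=eth4                                     # assumed name of the SFP/10GbE port
PREV=$(cat /sys/class/net/$IF/statistics/tx_bytes)
while sleep 1; do
    CUR=$(cat /sys/class/net/$IF/statistics/tx_bytes)
    echo "$(( (CUR - PREV) / 1048576 )) MB/s"   # bytes per second -> MB/s
    PREV=$CUR
done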
P3R
Guru
Posts: 13190
Joined: Sat Dec 29, 2007 1:39 am
Location: Stockholm, Sweden (UTC+01:00)

Re: Replication throughput issues

Post by P3R »

JRaymond wrote: Sat Aug 01, 2020 11:21 pm So what is choking this throughput down...
Probably a storage I/O bottleneck due to the load changing from sequential to random access as you add more clients.

Use the Resource monitor to figure out what the bottleneck is. Get a baseline with only one client and then add clients to see what metric bottlenecks first.
...and how can I fix it to output faster?
If you run RAID 5/6 on the master, RAID 10 should at least handle the random load better.

If possible use SSDs on the master and/or use more master units.
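A rough way to take that baseline over SSH, if the GUI Resource Monitor is too coarse, is to diff the "time spent doing I/O" counter (field 13) in /proc/diskstats over a fixed window; values sitting near 100% as more clients are added point at the disks rather than the network. This is a generic Linux sketch, not a QNAP tool, and the sd[a-z] pattern is an assumption about how the data disks are named:

# Approximate per-disk utilization (%) over a 10-second window.
snap() { awk '$3 ~ /^sd[a-z]$/ { print $3, $13 }' /proc/diskstats | sort; }
snap > /tmp/ds1; sleep 10; snap > /tmp/ds2
# Field 13 is milliseconds spent doing I/O, so (delta ms / 10000 ms) * 100 = delta / 100.
paste /tmp/ds1 /tmp/ds2 | awk '{ printf "%-4s %3.0f%% busy\n", $1, ($4 - $2) / 100 }'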
RAID has never been a replacement for backups. Without backups on a different system (preferably placed at another site), you will eventually lose data!

A non-RAID configuration (including RAID 0, which isn't really RAID) with a backup on separate media protects your data far better than any RAID volume without a backup.

All data storage consists of both the primary storage and the backups. It's your money and your data, spend the storage budget wisely or pay with your data!
JRaymond
Starting out
Posts: 25
Joined: Mon Oct 27, 2014 4:11 am

Re: Replication throughput issues

Post by JRaymond »

Do you think adding a 2TB M.2 drive will help? Relatively cheap to add in...
P3R
Guru
Posts: 13190
Joined: Sat Dec 29, 2007 1:39 am
Location: Stockholm, Sweden (UTC+01:00)

Re: Replication throughput issues

Post by P3R »

Maybe, if you run the clients pretty much synchronized, but I don't have enough real-life experience with SSD caching in a high-volume application like yours, so I don't know. I'm sorry. :cry:

When I mentioned SSDs I meant all-SSD storage on the master, specifically for this task. With 380 NASes to do, I figured that you're in a very large organization with a decent budget and that you could perhaps, at least temporarily, have access to a number of very large SSDs?
JRaymond
Starting out
Posts: 25
Joined: Mon Oct 27, 2014 4:11 am

Re: Replication throughput issues

Post by JRaymond »

We "can" make it happen if needed...

I don't need that much space for the master right now...a total of 6TB in RAID6 would get me through the deployment phase, and I can put the Seagates back in when it's over so I have the same environment as the dealerships.

I've got a second SFP module coming so I can trunk my input into the switch for 20Gb of connectivity. We will use it to distribute 6,500 HP systems to the workshops later this year...using the NAS as the repository for the workshop diagnostic software that gets copied out to the systems. I've got the 2TB M.2 drive coming as well...it should all be here by end of day Tuesday, so I can do another run of machines and see how the throughput differs.

I've also got my network guy looking at the switch to see if there is anything we can change in the config to make that work better as well. We just set up and did the first run...so a bit of equipment tweaking before pushing forward is never unexpected.
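One caveat before counting on the trunk: link aggregation hashes traffic per flow, so a single replication stream still rides only one 10Gb link, and the extra headroom only shows up when many clients pull in parallel. Assuming QTS builds the trunk on the standard Linux bonding driver (it does for most trunking modes), its state can be checked over SSH; bond0 is an assumption, so list the directory first to see what was actually created:

ls /proc/net/bonding/
# Confirm both member ports are up and which mode/hash the trunk is using.
grep -E 'Bonding Mode|Slave Interface|MII Status|Speed' /proc/net/bonding/bond0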
P3R
Guru
Posts: 13190
Joined: Sat Dec 29, 2007 1:39 am
Location: Stockholm, Sweden (UTC+01:00)

Re: Replication throughput issues

Post by P3R »

My guess is that it's the storage I/O that chokes, but as I said, investigate where the bottleneck is before throwing hardware at the problem. The network seems to be the least of your problems right now, so I see no need for trunking until you have made other improvements.

I have usually worked with only 25-75 units at a time but I understand the kind of work you have ahead of you. Good luck with it!
JRaymond
Starting out
Posts: 25
Joined: Mon Oct 27, 2014 4:11 am

Re: Replication throughput issues

Post by JRaymond »

P3R wrote: Sun Aug 02, 2020 6:18 am My guess is that it's the storage I/O that chokes, but as I said, investigate where the bottleneck is before throwing hardware at the problem. The network seems to be the least of your problems right now, so I see no need for trunking until you have made other improvements.

I have usually worked with only 25-75 units at a time but I understand the kind of work you have ahead of you. Good luck with it!
Thanks...I'll need the luck. ;-)

Hardware isn't a big deal...a 2TB M.2 SSD is only $200...and it certainly won't hurt the performance of the system down the road. I've read mixed opinions on SSD caching...but it certainly can't hurt my chances at this point.

I've got a replication cycle going today...so I can't do any testing until that finishes tonight or early morning. Once they are done and I can add a new batch I plan to investigate where the bottleneck appears to be since I have a day off on Monday.

With COVID...I'm stuck at home anyway...might as well work on it and find a solution. :wink:
JRaymond
Starting out
Posts: 25
Joined: Mon Oct 27, 2014 4:11 am

Re: Replication throughput issues

Post by JRaymond »

I opened a ticket with QNAP on this and found the problem. They had me SSH into the machine and run tests of the RAID and file system throughput. Results are as follows:
[~] # qcli_storage -T
fio test command for physical disk: /sbin/fio --filename=test_device --direct=1 --rw=read --bs=1M --runtime=15 --name=test-read --ioengine=libaio --iodepth=32 &>/tmp/qcli_storage.log
fio test command for RAID: /sbin/fio --filename=test_device --direct=0 --rw=read --bs=1M --runtime=15 --name=test-read --ioengine=libaio --iodepth=32 &>/tmp/qcli_storage.log
Start testing!
Performance test is finished 100.000%...
Enclosure Port Sys_Name Throughput RAID RAID_Type RAID_Throughput Pool
NAS_HOST 3 /dev/sde 188.46 MB/s /dev/md1 RAID 6 413.48 MB/s 1
NAS_HOST 4 /dev/sdf 184.75 MB/s /dev/md1 RAID 6 413.48 MB/s 1
NAS_HOST 5 /dev/sdb 180.82 MB/s /dev/md1 RAID 6 413.48 MB/s 1
NAS_HOST 6 /dev/sda 181.22 MB/s /dev/md1 RAID 6 413.48 MB/s 1
NAS_HOST 7 /dev/sdd 176.73 MB/s /dev/md1 RAID 6 413.48 MB/s 1
NAS_HOST 8 /dev/sdc 183.93 MB/s /dev/md1 RAID 6 413.48 MB/s 1
[~] # qcli_storage -t
fio test command for LV layer: /sbin/fio --filename=test_device --direct=0 --rw=read --bs=1M --runtime=15 --name=test-read --ioengine=libaio --iodepth=32 &>/tmp/qcli_storage.log
fio test command for File system: /sbin/fio --filename=test_device/qcli_storage --direct=0 --rw=read --bs=1M --runtime=15 --name=test-read --ioengine=libaio --iodepth=32 --size=128m &>/tmp/qcli_storage.log
Start testing!
Performance test is finished 100.000%...
VolID VolName Pool Mapping_Name Throughput Mount_Path FS_Throughput
1 DataVol1 1 /dev/mapper/cachedev1 327.40 MB/s /share/CACHEDEV1_DATA 143.50 MB/s

My drives and the RAID are saturated...

Their recommendation is, since I have the M.2 SSDs coming tomorrow, that I use them as cache, which should dramatically improve the throughput.

If it doesn't get us where we need to be (<12 hours replication for 20 systems) then we can try Q-Tier. The final solution if that all fails is to use all SSDs for the deployment. I can get by with 5x 1TB drives in RAID 5 for 4TB of storage with a fault tolerance of one drive...

I'll keep plugging away at it...
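For a clean before/after comparison once the M.2 cache (and later the SSDs) are in, the filesystem-layer test can be repeated by hand with the same fio parameters that qcli_storage reports above. The target path comes from the Mount_Path column; the test file name and the dated log file are just illustrative conventions:

# Re-run the filesystem read test that qcli_storage ran (same flags as above).
cd /share/CACHEDEV1_DATA
/sbin/fio --filename=./fio_testfile --direct=0 --rw=read --bs=1M --runtime=15 \
    --name=test-read --ioengine=libaio --iodepth=32 --size=128m \
    | tee /tmp/fio_fs_$(date +%Y%m%d).log
rm -f ./fio_testfile                        # clean up the 128MB test file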
P3R
Guru
Posts: 13190
Joined: Sat Dec 29, 2007 1:39 am
Location: Stockholm, Sweden (UTC+01:00)

Re: Replication throughput issues

Post by P3R »

JRaymond wrote: Tue Aug 04, 2020 7:59 am My drives and the RAID are saturated...
Not surprising. :wink:
If it doesn't get us where we need to be (<12 hours replication for 20 systems) then we can try Q-Tier.
That recommendation I don't understand at all. If the cache, which at least in theory should be able to adapt and keep the data currently being requested on SSD, doesn't help, how could Qtier? Qtier is pretty much static, so of course it will help for the stuff that happens to fit on SSD, but then it will be back to what you have now. Since you're down to 30 MB/s with 10 systems, imagine the brick wall it will run into with 20...
JRaymond
Starting out
Posts: 25
Joined: Mon Oct 27, 2014 4:11 am

Re: Replication throughput issues

Post by JRaymond »

P3R wrote: Tue Aug 04, 2020 4:20 pm
JRaymond wrote: Tue Aug 04, 2020 7:59 am My drives and the RAID are saturated...
Not surprising. :wink:
If it doesn't get us where we need to be (<12 hours replication for 20 systems) then we can try Q-Tier.
That recommendation I don't understand at all. If the cache, which at least in theory should be able to adapt and keep the data currently being requested on SSD, doesn't help, how could Qtier? Qtier is pretty much static, so of course it will help for the stuff that happens to fit on SSD, but then it will be back to what you have now. Since you're down to 30 MB/s with 10 systems, imagine the brick wall it will run into with 20...
Yeah...we had a discussion this morning...

Ordering 4x 1TB SSDs for the system, adding a second NAS, and dividing the line in half so each "master" is feeding 10 systems at a time. I have "spare" systems to use as hot-spare replacements to ship out...so we'll see how this goes over the next few days.
P3R
Guru
Posts: 13190
Joined: Sat Dec 29, 2007 1:39 am
Location: Stockholm, Sweden (UTC+01:00)

Re: Replication throughput issues

Post by P3R »

Please keep the forum updated with your experiences as they can be very valuable to other customers doing large volume deployments in the future.
JRaymond
Starting out
Posts: 25
Joined: Mon Oct 27, 2014 4:11 am

Re: Replication throughput issues

Post by JRaymond »

P3R wrote: Wed Aug 05, 2020 12:37 am Please keep the forum updated with your experiences as they can be very valuable to other customers doing large volume deployments in the future.
Not a problem...

We decided to go with 5x 1TB drives for a 4TB RAID in case we needed more space before we were finished.

The last build cycle took 24 hours for 10 systems...no changes other than some additional files being added to the directory. Still under 3TB of data to transfer...it's a PITA.

Had to order some 2.5"-to-3.5" caddies to be able to use the SSDs. Everything is scheduled for delivery tomorrow...

I think it's gonna be busy... :lol:
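As a rough sanity check on that cycle, assuming each of the 10 units pulled close to the full ~3TB over the ~24 hours (both are the rounded figures quoted above):

# Effective throughput implied by "10 systems, ~3TB each, ~24 hours".
awk 'BEGIN {
    total_mb = 10 * 3 * 1024 * 1024        # 10 systems x ~3 TB each, in MB
    secs     = 24 * 3600                   # ~24-hour cycle
    printf "aggregate %.0f MB/s, per system %.0f MB/s\n", total_mb / secs, total_mb / (secs * 10)
}'

That works out to roughly 360 MB/s aggregate and ~36 MB/s per system, which lines up with the ~30 MB/s per system seen earlier, so the spinning-disk array rather than the 10GbE link still looks like the limiter.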
P3R
Guru
Posts: 13190
Joined: Sat Dec 29, 2007 1:39 am
Location: Stockholm, Sweden (UTC+01:00)

Re: Replication throughput issues

Post by P3R »

JRaymond wrote: Wed Aug 05, 2020 9:54 am Had to order some 2.5"-to-3.5" caddies to be able to use the SSDs.
2.5" SSDs should fit nicely in the existing drive trays using the screws (usually the black ones) specifically supplied for 2.5" storage. Check page 3 in the TS-X73 Quick Installation Guide.
JRaymond
Starting out
Posts: 25
Joined: Mon Oct 27, 2014 4:11 am

Re: Replication throughput issues

Post by JRaymond »

P3R wrote: Wed Aug 05, 2020 3:55 pm
JRaymond wrote: Wed Aug 05, 2020 9:54 am Had to order some 2.5"-to-3.5" caddies to be able to use the SSDs.
2.5" SSDs should fit nicely in the existing drive trays using the screws (usually the black ones) specifically supplied for 2.5" storage. Check page 3 in the TS-X73 Quick Installation Guide.
Not on the ones we have...no screws for them at all, or holes to accommodate the 2.5" drives.

They were $10/ea. and cheap in the grand scheme of things. Basically they have the outer shape of a 3.5" drive with the connectors in the correct place. The 2.5" drive slips in and screws down, and then it sits inside the caddy just like a regular 3.5" drive does.
P3R
Guru
Posts: 13190
Joined: Sat Dec 29, 2007 1:39 am
Location: Stockholm, Sweden (UTC+01:00)

Re: Replication throughput issues

Post by P3R »

JRaymond wrote: Wed Aug 05, 2020 8:29 pm Not on the ones we have...no screws for them at all, or holes to accommodate the 2.5" drives.
Maybe yours was a special build because of the high-volume order, where they didn't include the 2.5" screws?

This is the standard delivery that should come with every TS-673.

It's very odd that the trays wouldn't even have screw holes though. :-0 I can't see how it would be anything but more expensive to make such special trays. As far as I know, almost every desktop QNAP (except for some odd models) for many years has supported 2.5" disks in the 3.5" drive trays, and the TS-673 specifications say 2.5" SSDs should work in the 3.5" bays.