high datastore latency - VMware ESXi 6.0

QNAP NAS solution for server virtualization and clustering/HA/FT
baileyt
New here
Posts: 3
Joined: Fri Jul 28, 2017 7:24 pm

high datastore latency - VMware ESXi 6.0

Post by baileyt » Fri Jul 28, 2017 8:30 pm

I'm testing out an ES1640dc v2, and after I vMotion some virtual machines to the QNAP array they experience fairly high read and write latency spikes (in some cases over 200 ms). The RAID pool is running ten 7.2k RPM SATA drives with two 400 GB SSDs for cache. No matter the configuration (RAID5/6/10, dedupe on/off, compression on/off), I cannot get these spikes to go away. The data traffic goes over dual 10 Gb ports in LACP, so bandwidth definitely shouldn't be the issue. The VAAI plug-in is installed on the hosts, so there shouldn't be any issues there. Does anyone have any other suggestions as to what's going on here? When we vMotion the VMs back to the old storage the latency issues go away, so it's definitely something related to the QNAP array.

justsomeguy
First post
Posts: 1
Joined: Fri Jul 28, 2017 10:24 pm

Re: high datastore latency - VMware ESXi 6.0

Post by justsomeguy » Fri Jul 28, 2017 10:35 pm

I have seen this exact problem with a TVS-871U-RP when I was running iSCSI over dual 10GbE NICs. I eventually narrowed it down to the cache functionality itself being broken, so I gave up on trying to use an SSD cache in front of spinning disks on QNAP.

The original setup I was supporting where I saw this used four 6TB WD drives in RAID10, originally with two QNAP-branded 128GB SSDs in the mSATA cache slots. I thought the latency spikes were due to cache misses, so I replaced those with two Samsung 1TB SSDs in the same slots. The latency spikes remained (and in fact were just incredibly bad sometimes; spikes of over 2000ms would appear in vCenter's graphs). Since at this point I suspected that there was a problem with how caching was implemented, I put four 1TB Samsung SSDs in RAID5 into empty bays and let the cache run in front of them. The latency spikes remained. So I disabled the cache, and the latency spikes went away. Under the exact same user load, the same RAID10 of four 6TB WD drives never spikes anywhere near as high as it does with the cache. I moved production onto the RAID5 of SSDs, and use the spinning rust for backups now.

I have not tried to use the caching since, but if you want to see how broken this is, the latency spikes will appear when the cache is doing NOTHING.

Create the cache using the two mSATA ports and enable it in the storage control panel, but exclude all of your LUNs from what it tries to cache. Then use the performance monitor to watch the latency. You'll see that there are random latency spikes on the cache SSDs even when they are not doing anything. QNAP's caching functionality appears to be completely broken.
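If you write down or export the latency readings while running the idle-cache test above, a short script can pick out the spikes for you. This is only a rough sketch: the function name `find_spikes`, the `factor` and `floor_ms` cutoffs, and the sample numbers are all made up for illustration and are not taken from QNAP's performance monitor.

```python
from statistics import median


def find_spikes(samples_ms, factor=10.0, floor_ms=50.0):
    """Return (index, value) pairs for samples far above the typical latency.

    A sample counts as a spike when it exceeds both `floor_ms` and
    `factor` times the median of all samples. Both cutoffs are arbitrary
    assumptions and should be tuned to the workload.
    """
    if not samples_ms:
        return []
    threshold = max(floor_ms, factor * median(samples_ms))
    return [(i, v) for i, v in enumerate(samples_ms) if v > threshold]


# Made-up numbers for illustration only, not measurements from this thread.
example = [2.1, 1.8, 2.4, 210.0, 2.0, 1.9, 2050.0, 2.2]
for index, value in find_spikes(example):
    print(f"sample {index}: {value} ms")
```

Comparing against the median rather than the mean keeps a few huge spikes from dragging the baseline up and hiding themselves.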

Toxic17
Ask me anything
Posts: 5197
Joined: Tue Jan 25, 2011 11:41 pm
Location: Planet Earth

Re: high datastore latency - VMware ESXi 6.0

Post by Toxic17 » Sat Jul 29, 2017 1:01 am

Is it possible to set up Qtier if caching is not working well?

https://youtu.be/OpT4YsUXHc0
Regards Simon


baileyt
New here
Posts: 3
Joined: Fri Jul 28, 2017 7:24 pm

Re: high datastore latency - VMware ESXi 6.0

Post by baileyt » Sat Jul 29, 2017 3:26 am

BTW, one thing I didn't mention is we're using NFS.

I've disabled SSD caching on that pool yet I still see the latency, so I'm pulling everything off, rebuilding the pool without caching, and seeing what it looks like. My experiences, though, are very similar to yours in that I have very little workload on the QNAP yet get these rather large, random latency spikes.

Unfortunately the ES line doesn't support the Qtier feature.

baileyt
New here
Posts: 3
Joined: Fri Jul 28, 2017 7:24 pm

Re: high datastore latency - VMware ESXi 6.0

Post by baileyt » Tue Aug 01, 2017 2:16 am

So I've taken the SSDs out of the cache and set them up as their own pool. On the ESXi host I have the QNAP SSD pool (RAID1), the QNAP SATA pool (RAID10) and our current Tintri T540 array that I'm looking to replace. I have one VM sitting on the SSD pool and still get latency spikes. We're not seeing these spikes from the Tintri datastore (nor have we ever), and when pinging the QNAP data port from the ESXi host the response times are good and I never see these latency spikes show up, so it looks to be something within the array itself. On the SSD pool I do have thin provisioning and compression enabled (no dedupe), but I can't believe I'm seeing these spikes with only one VM and an all-flash pool.

We're running Seagate ST4000NM0023 (spinning) and Seagate ST400FM0233 (SSD) drives, and I see there are newer firmware releases for each, so that'll be my next step. If that doesn't work I'm just going to give up on QNAP.
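Since ping looks clean but the datastore doesn't, one way to get a second opinion is to time small synchronous writes against the same NFS export from a separate test client. The sketch below assumes a Linux machine that has the export mounted (it is not meant to run on the ESXi host itself), and the function name `measure_fsync_latency` and its defaults are hypothetical. It only shows the idea; it won't reproduce what vCenter measures.

```python
import os
import tempfile
import time


def measure_fsync_latency(directory, count=20, size=4096):
    """Time `count` small synchronous writes in `directory`; return latencies in ms."""
    payload = b"\0" * size
    latencies_ms = []
    # Scratch file inside the directory under test; removed afterwards.
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(count):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # ask the OS to push the write through to the backing storage
            latencies_ms.append((time.perf_counter() - start) * 1000.0)
    finally:
        os.close(fd)
        os.remove(path)
    return latencies_ms


if __name__ == "__main__":
    # Assumption: swap the temp dir for the mount point of the export being tested.
    results = measure_fsync_latency(tempfile.gettempdir())
    print(f"max {max(results):.2f} ms, min {min(results):.2f} ms")
```

If the spikes show up here too while ping stays flat, that would point the same way as the observation above: at the array rather than the network path.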
