ES1686DC Poor Performance / Crashing VMs

Stevecc
First post
Posts: 1
Joined: Wed Sep 23, 2020 6:26 am

ES1686DC Poor Performance / Crashing VMs

Post by Stevecc »

For our ESXI environment, we purchased an ES1686DC to replace our old dying SAN.

This purchase happened in May. We jumped in hoping the machine would be as high-performing and painless as described, and as trouble-free as the smaller QNAP devices we already run.

In the first few weeks we loaded it full of 14TB SAS Exos drives, 4TB of SAS enterprise SSDs, and 4TB of internal enterprise NVMe SSDs, got it configured and connected over 10gig to our three older ESXi 6.5 hosts and three newer ESXi 7.0 hosts, set up a large dedup-enabled LUN, and began transferring data from our old SAN. We are running four Mellanox ConnectX-3 cards in the QNAP (two per controller, 2 SFP ports per controller) to try to take advantage of its iSER capabilities. We later learned that iSER is not playing well with QES, and QNAP still has an open ticket on this many months later. We rolled back to standard iSCSI IP adapter mode with direct SFP connections to our newer ESXi 7.0 hosts; the older 6.5 hosts are routed through 10gig HP enterprise switches. Current firmware: 2.1.1.0402 Build 20200728

Up and running:
At first, we were moving data really fast. The old SAN's IO was topped out, and the QNAP was writing data without a fuss. Transfers were taking place within vSphere via iSCSI LUN, all over 10gig. We transferred many of our VMs over and got them up and running live. We then began transferring our much larger VMs (10+TB datastore files) and suddenly saw a drop in write performance, from 700-800MB/s down to 5-12MB/s. At that point, the smaller VMs already running on the ES1686DC began to crash because of extreme IO latency. The issue persisted even after canceling the large datastore transfers. It was over 14 hours before the unit began to behave normally again, and it wasn't back to full write speed and acceptable latency until after I ran a pool scrub. We have been able to trigger this behavior over and over again in our testing.

We've put in tickets with QNAP and they have been working with us, very, very slowly (it's been months...). They have admitted they've found some sort of bug/issue with the ZFS configuration, but it's not clear to us what needs to be changed. The last firmware update did not resolve the issues, and the firmware update I was handed last Friday failed to install entirely, with no clear reason why. We're waiting for a further response on what they need to fix in firmware on their end to resolve this.

This has been a very poor experience considering this is a very expensive enterprise-grade device promising very good performance and reliability. :(
Rol
New here
Posts: 2
Joined: Mon Sep 12, 2011 4:12 pm

Re: ES1686DC Poor Performance / Crashing VMs

Post by Rol »

Meanwhile, is there any progress?
finzl
New here
Posts: 5
Joined: Thu Jan 26, 2012 8:08 pm

Re: ES1686DC Poor Performance / Crashing VMs

Post by finzl »

I also had lots of issues with our TES-3085, and I ended up switching to QTS instead.
QTS seems to be much more reliable than the so-called "enterprise" operating system.

Since their new QuTS hero OS is obviously the way forward for them, this product seems to have simply been abandoned; there has been no improvement in any QES version lately.
RAID performance on all-flash drives is also not really what we expected. This might be the last QNAP product we buy.
Storever
Starting out
Posts: 15
Joined: Fri Mar 15, 2019 9:22 pm

Re: ES1686DC Poor Performance / Crashing VMs

Post by Storever »

We have recently run into exactly this issue on an ES1686DC running firmware version 20210608. NAS storage seems more or less fine, but performance on the iSCSI block storage used for VMware suddenly tanked, causing major production outages. We had difficulty even performing the Storage vMotions required to get the VMs back onto our EOL (but more reliable) storage.

After weeks of begging support to examine the logs that we took during the incident, I was informed yesterday: "They determine IO timeout was related to pool scrubbing. It appears you have the pool scrub configured to run daily. They suggested this is to frequent and suggest you reduce the scrubbing frequency." If daily scrubbing is too frequent, why is it possible to configure daily (or hourly) scrubbing? I just checked on a unit that didn't have pool scrubbing enabled yet, and daily is the default setting when you turn it on!

Also, if we can expect IO timeouts during pool scrubbing, does that mean we can expect monthly production outages as VMs fail to access their VMFS datastores?
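For what it's worth, on stock ZFS systems (FreeBSD/OpenZFS) the scrub schedule is usually nothing more than a cron entry, and monthly is a common convention. QES hides this behind its GUI, so the exact paths and the pool name below (`pool1`) are assumptions for illustration, not QES specifics:

```shell
# Hypothetical crontab entry (generic ZFS, NOT QES-specific):
# scrub pool1 monthly at 02:00 on the 1st, outside business hours,
# instead of daily.
0 2 1 * * root /sbin/zpool scrub pool1

# Start or check a scrub by hand:
#   zpool scrub pool1     # kick off a scrub of pool1
#   zpool status pool1    # shows scrub progress and estimated completion
```

If QES really can't service VM IO during a scrub, spacing scrubs out only reduces how often the outage happens; it doesn't fix the underlying latency problem.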

I have a lot more to say about this product, but I'm going to keep this particular post to this particular issue.

Return to “QES Operating System (QNAP Enterprise Storage OS)”