ES1640DC v2 Failover Speed

Posted: Wed Dec 04, 2019 7:48 am
by smccloud
So I'm running a ES1640DC v2 as storage for a vSphere cluster and while I'm sure there are faster options out there its fast enough for us in most respects. However, when I pitched it to management the premise was that due to its dual controllers we'd be able to seamlessly fail over to the inactive controller with for maintenance and in an emergency. In practice however, a fail over of the controllers for any reason takes our entire production environment down for 15 minutes or so as the hosts don't recognize the NFS exports once the fail over is complete. Is this a known issue or do I just have something configured wrong?

It is set up with 16 3TB drives in RAID 10 and currently a single export for NFS, as well as some iSCSI exports for a planned SQL Active/Passive HA migration.
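
For reference, this is roughly how the export state can be checked from an ESXi host's shell after a takeover, to see whether the hosts have picked the datastore back up:

  # NFS 3 mounts; the output has an "Accessible" column per datastore
  esxcli storage nfs list
  # NFS 4.1 mounts, if the datastore was added as NFS 4.1
  esxcli storage nfs41 list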

We are still in better shape than our old setup with local storage only, but I feel like I'm missing something. Does anyone have any suggestions, or is it just the way it is?

Re: ES1640DC v2 Failover Speed

Posted: Wed Dec 04, 2019 8:58 am
by zmaho
How did you connect your setup to vSphere? Is iSCSI on multipath?

Could someone correct me if I'm wrong, but NFS cannot do multipath, can it?

If you have two controllers and they both have an IP address, let's say controller A has 10.10.10.50/24 and controller B has 10.10.20.50/24, and you have two IPs set on your NICs in vSphere, like 10.10.10.10 and 10.10.20.10, and you set up multipathing in iSCSI, then one path should stay operational if the other goes down, right?

Or did I misunderstand the DC series from QNAP? They cannot switch from one controller to the other in zero time; it is not fault tolerance like in vSphere :")
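
On the ESXi side it would be roughly something like this (the adapter name vmhba64 is a placeholder for your software iSCSI HBA; the portal IPs are from my example above):

  # point dynamic discovery at one portal on each controller
  esxcli iscsi adapter discovery sendtarget add -A vmhba64 -a 10.10.10.50:3260
  esxcli iscsi adapter discovery sendtarget add -A vmhba64 -a 10.10.20.50:3260
  # rescan so the new paths appear
  esxcli storage core adapter rescan -A vmhba64
  # there should now be more than one path per device
  esxcli storage core path list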

Re: ES1640DC v2 Failover Speed

Posted: Wed Dec 04, 2019 9:11 pm
by smccloud
zmaho wrote: Wed Dec 04, 2019 8:58 am How did you connect your setup to vSphere? Is iSCSI on multipath? …
NFSv4 supports multipath, and it is working to the currently active controller.
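
On ESXi that specifically means NFS 4.1, where the datastore is mounted against multiple server addresses up front; the addresses, share path and datastore name below are just placeholders:

  # mount an NFS 4.1 datastore with two server IPs for multipathing
  esxcli storage nfs41 add -H 10.10.10.50,10.10.10.51 -s /share/VMs -v DatastoreName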

Per the product page:
The ES1640dc v2 is powered by two Intel® Xeon® E5-2400 v2 processors and features dual active-active controller architecture, ensuring businesses with nearly zero downtime high availability as the standby controller can quickly take over if one controller breaks down. The ES1640dc v2 connects to the JBOD enclosure (EJ1600 v2) via the dual path mini-SAS design to sustain continuous operations even if an external JBOD cable is disconnected. Designed around redundancy, the ES1640dc v2 is the best realization of reliable enterprise storage for uninterrupted mission-critical enterprise tasks and productivity.
I don't consider 15 minutes to be quick or nearly zero downtime. As it stands, if I need to upgrade the firmware on it, I need to do it off-hours (and really on a weekend) after shutting down all the production VMs.

Do I expect every VM to keep running without issues during a controller failover? No, but I expect only minor hiccups, as that is what the product literature shows.

If I need to reconfigure our setup to use iSCSI instead of NFS, I will. It's not what I'd like to do, but if that is the final solution, so be it.

Re: ES1640DC v2 Failover Speed

Posted: Wed Dec 04, 2019 10:13 pm
by smccloud
Ok, I just created an iSCSI target & LUN, configured vSphere to connect to all 4 IPs configured on the storage networks of our ES1640dc v2, and it appears to be using both controllers for traffic. Although NFSv4 supports multipath, it appears the ES1640dc v2 doesn't support it in the same way that it does multipath on iSCSI. I'll still have to try a failover outside of normal working hours, but for now I have some VM migrations to get done (and of course our license doesn't support live migration of running VMs unless it's a combined host & storage migration).
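
For anyone following along, the sessions and paths can also be checked from the ESXi shell; the adapter name vmhba64 below is a placeholder for whatever the software iSCSI HBA is on your host:

  # one iSCSI session per discovered target portal IP should show up
  esxcli iscsi session list -A vmhba64
  # per-device SATP (should be VMW_SATP_ALUA) and path selection policy
  esxcli storage nmp device list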

Re: ES1640DC v2 Failover Speed

Posted: Wed Dec 04, 2019 11:04 pm
by storageman
smccloud wrote: Wed Dec 04, 2019 10:13 pm Ok, I just created an iSCSI target & LUN, configured vSphere to connect to all 4 IPs configured on the storage networks of our ES1640dc v2, and it appears to be using both controllers for traffic. …
The ES1640DC is active/active ALUA, meaning a given pool only runs on one controller at a time (bear with me).
Multipathing only works within one controller, not across both controllers (assuming you connect more than one NIC on that single controller).
The reason you connect IPs from the other controller is to ensure failover.
If you want to use both controllers full time, you have to assign a separate storage pool to each controller.

Very few companies offer active/active symmetrical NAS (meaning IO multipathing down both controllers).
Whereas in the SAN world, active/active symmetrical is quite common.

How have you proved IO is running down both controllers?
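
One way to check from the ESXi side is the ALUA path state, rather than raw port traffic (the device identifier below is a placeholder):

  # paths through the controller that owns the pool report "Group State: active";
  # paths through the partner controller report "active unoptimized"
  esxcli storage nmp path list -d naa.xxxxxxxxxxxxxxxx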

Re: ES1640DC v2 Failover Speed

Posted: Wed Dec 04, 2019 11:20 pm
by smccloud
storageman wrote: Wed Dec 04, 2019 11:04 pm … How have you proved IO is running down both controllers?
Opening System Status -> Resource Monitor -> Network Usage, I can see traffic on the following interfaces: Ethernet 1 (SCA), Ethernet 3 (SCA), Ethernet 1 (SCB) & Ethernet 3 (SCB). Also, logging into the GUI for my switches (MikroTik CRS309-1G-8S+IN), I can see traffic on the ports for both controllers. Although one controller is preferred over the other, they are both active.
Switch 1.png
Switch 2.png

Re: ES1640DC v2 Failover Speed

Posted: Wed Dec 04, 2019 11:32 pm
by storageman
Yes, they may be active, but that doesn't mean much; SCB is only active for failover, not for read/write traffic.
Which ports should I look at for SCB?
If you are achieving this, I need to speak to QNAP, because this would be news to me!

Re: ES1640DC v2 Failover Speed

Posted: Wed Dec 04, 2019 11:39 pm
by smccloud
They aren't named well in the switch (I should change that), but NASA is SCA and NASB is SCB. It appears to be sending more traffic to SCB even though SCA is the active controller.

Re: ES1640DC v2 Failover Speed

Posted: Wed Dec 04, 2019 11:44 pm
by storageman
And have you only one pool?

Re: ES1640DC v2 Failover Speed

Posted: Wed Dec 04, 2019 11:47 pm
by smccloud
storageman wrote: Wed Dec 04, 2019 11:44 pm And have you only one pool?
Yep, only one pool. Since we're using spinning rust that was the decision made to get the most IOPS possible (also why we did RAID 10).

Re: ES1640DC v2 Failover Speed

Posted: Wed Dec 04, 2019 11:56 pm
by storageman
Hit it with a lot of traffic and report back.
If it were multipathing correctly, the traffic should be fairly equal across both controllers' ports, and it isn't.
I think one side will only be "are you there" traffic.
What does the storage page say for the controller assignment, e.g. "pool1 SCA"?
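
A rough way to compare, rather than the switch graphs, is the per-path I/O counters on one of the hosts; run this before and after pushing some traffic and see whether the read/write counts grow on paths to both controllers:

  # per-path command and block counters for every path on the host
  esxcli storage core path stats get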

Re: ES1640DC v2 Failover Speed

Posted: Thu Dec 05, 2019 12:01 am
by smccloud
Pool tank is managed by SCA ("tank" is a holdover from when I used FreeNAS & NAS4Free). Nothing on SCB.

So far our two main servers are connected via iSCSI and I'm starting to migrate VMs. The third server isn't playing nice, but it still only has a single DAC in place, so it's not a huge deal (internal storage for now).

Re: ES1640DC v2 Failover Speed

Posted: Thu Dec 05, 2019 12:05 am
by storageman
Then your naming must be wrong, surely? Most traffic should be on SCA.
If you've got it multipathing across both controllers, I'll eat my (virtual) hat.

The technical differences between the approaches are below:
ActiveActive ALUA.jpg
Qnap ES1640DC/1686 are ALUA.

Re: ES1640DC v2 Failover Speed

Posted: Thu Dec 05, 2019 12:07 am
by smccloud
It's possible. But the native interface shows the same thing.

Re: ES1640DC v2 Failover Speed

Posted: Thu Dec 05, 2019 12:48 am
by smccloud
Well, I just figured out what I had wrong with our third server. It's connected to an IP on SCB, not SCA, but it's still working fine. From what I've read it shouldn't be working, but it is.
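
If anyone wants to check the same thing, the path listing on that host should show whether its I/O is entering through SCB as an ALUA non-optimized path (device identifier is a placeholder):

  # "Group State: active unoptimized" would mean the I/O is being serviced via the non-owning controller
  esxcli storage nmp path list -d naa.xxxxxxxxxxxxxxxx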