Constant iSCSI disconnects caused by RAID scrubbing
Posted: Mon Sep 28, 2020 4:09 am
The title sums it up; here are the details.
1. TS-451+ (two of them), each set up with a 5TB iSCSI target on interface #2, 1Gb connection.
2. Latest firmware, and also the previous one, since I tried rolling back at one point. Firmware 4.4.3.1421.
3. 16GB RAM; each 451 has four HDDs. One has four 10TB drives, the other has four 4TB drives, both in RAID5.
4. Three ESXi 6.7 servers making iSCSI connections to each TS-451+ and using datastores on each QNAP.
5. Dedicated iSCSI network, with the two QNAPs and three ESXi connections and nothing else. I bought managed switches for both interfaces so I could check for weird traffic on the subnet, but if anything there was less traffic than I expected. Nothing looked abnormal.
6. Symptom: In a word, dog slow. Anything on those ESXi servers that needed to read data from disk was very, very slow. For instance, I have a web site on a Linux VM, and it was at least 10x slower serving up images once you got past the front page. vCenter (virtual appliance here) was very slow to do anything, almost unusable.
7. Event log on each ESXi server was filled with several hundred (over 300) iSCSI disconnect/reconnect messages. Last night they started at 4:45am, and ended at about 9:31am.
8. RAID scrubbing was scheduled to run last night (each Sunday) and start at 4am.
9. I noticed RAID scrubbing still had about an hour to go this morning around 8:30. By around 9:30 the system was fast again.
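To put a number on how closely the disconnects line up with the scrub (items 7 to 9), here is a rough Python sketch. The 9:30 scrub end time is my estimate (about an hour left at 8:30), and the helper names `to_dt` and `window_overlap` are made up for illustration, so treat this as a sanity check rather than proof.

```python
from datetime import datetime, timedelta

def to_dt(hhmm):
    # Parse a same-day "HH:MM" time; the date part is irrelevant here.
    return datetime.strptime(hhmm, "%H:%M")

def window_overlap(a_start, a_end, b_start, b_end):
    # Length of time that two windows have in common (zero if none).
    latest_start = max(a_start, b_start)
    earliest_end = min(a_end, b_end)
    return max(earliest_end - latest_start, timedelta(0))

# Times taken from the list above.
scrub_start = to_dt("04:00")   # scheduled scrub start
scrub_end = to_dt("09:30")     # ASSUMPTION: estimated from "about an hour to go" at 8:30
disc_start = to_dt("04:45")    # first iSCSI disconnect in the ESXi event log
disc_end = to_dt("09:31")      # last iSCSI disconnect

overlap = window_overlap(scrub_start, scrub_end, disc_start, disc_end)
coverage = overlap / (disc_end - disc_start)

print(f"Disconnect window inside scrub window: {coverage:.1%}")
```

With those times, nearly the whole disconnect period falls inside the estimated scrub window, which is what makes me point the finger at scrubbing.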
My conclusion is that RAID scrubbing is causing the system to drop iSCSI connections for as long as it runs. This may be OK for a home user who can schedule some downtime while things happen, but it's not something you'd accept in a business environment. I'm pretty sure it caused some corruption in several VMs last weekend, when I had the QNAPs loaded down with a few more apps running at the same time.
What can I do to lessen the impact of RAID scrubbing WHILE IT RUNS?