Performance is excellent, but every so often (about once every two hours) the device becomes completely unresponsive: iSCSI traffic stalls and the NAS can no longer be reached over SSH (logging in is not possible) or the web interface. After three to four minutes the device becomes available again. As soon as I can log in via SSH again, it shows a 5-minute load average of 15, while the normal load average hovers around 3. So it was definitely busy with something during the stall.
Today I set up some terminals tailing /var/log/*log and running top, hoping to see which process causes this behaviour. I started an I/O load test from a VM and, once it stalled, watched the logs and top output, but there was nothing there. I did notice dmesg being flooded with this repeated message:
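Since the NAS cannot be reached during a stall, one option would be to record the load locally and read the file afterwards. On Linux the load average counts tasks in uninterruptible sleep (state D, usually waiting on I/O) as well as runnable ones, so a high load can come from blocked tasks rather than CPU use. Below is a minimal sketch of such a logger. It assumes a Python interpreter is available on the NAS, which I have not confirmed for this firmware, and the log path and interval are placeholders.

```python
import glob
import time


def parse_loadavg(text):
    """Return (1min, 5min, 15min) from the contents of /proc/loadavg."""
    fields = text.split()
    return tuple(float(x) for x in fields[:3])


def parse_stat(text):
    """Return (pid, command, state) from the contents of /proc/<pid>/stat."""
    # The command is wrapped in parentheses and may itself contain spaces
    # or parentheses, so split on the first '(' and the last ')'.
    left = text.index("(")
    right = text.rindex(")")
    pid = int(text[:left])
    comm = text[left + 1:right]
    state = text[right + 1:].split()[0]
    return pid, comm, state


def blocked_processes():
    """List (pid, command) for processes in uninterruptible sleep ('D')."""
    blocked = []
    for path in glob.glob("/proc/[0-9]*/stat"):
        try:
            with open(path) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError):
            continue  # process exited while we were reading it
        if state == "D":
            blocked.append((pid, comm))
    return blocked


def monitor(logfile="/tmp/stall-monitor.log", interval=5):
    """Append one line per interval; logfile and interval are placeholders."""
    while True:
        with open("/proc/loadavg") as f:
            load = parse_loadavg(f.read())
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        line = "%s load=%s blocked=%s\n" % (stamp, load, blocked_processes())
        with open(logfile, "a") as out:
            out.write(line)
        time.sleep(interval)
```

Calling `monitor()` from a session that survives the SSH disconnect would leave a timeline to inspect once the device responds again. If the stall also blocks writes to the log's filesystem, there will simply be a gap in the timestamps, which is informative in its own right.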
[ 6709.871826] dm-kcopyd track job out of array
[ 6709.871827] dm-kcopyd track job out of array
[ 6709.871828] dm-kcopyd track job out of array
[ 6709.871829] dm-kcopyd track job out of array
The dm-kcopyd messages are still scrolling by as I write this, 15 minutes after the incident. After a reboot they're gone.
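To put a number on the flood, the dmesg output could be summarised per message instead of watched scrolling by. The sketch below is only an illustration: it assumes the `[seconds.microseconds] message` line format shown above, and the function name `summarize` is made up for this example.

```python
import re

# Matches lines like "[ 6709.871826] dm-kcopyd track job out of array"
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def summarize(lines):
    """Return {message: (count, first_timestamp, last_timestamp)}."""
    summary = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines without a kernel timestamp
        ts, msg = float(m.group(1)), m.group(2)
        count, first, last = summary.get(msg, (0, ts, ts))
        summary[msg] = (count + 1, min(first, ts), max(last, ts))
    return summary


# The four lines quoted in this post
sample = [
    "[ 6709.871826] dm-kcopyd track job out of array",
    "[ 6709.871827] dm-kcopyd track job out of array",
    "[ 6709.871828] dm-kcopyd track job out of array",
    "[ 6709.871829] dm-kcopyd track job out of array",
]
result = summarize(sample)
```

For the four quoted lines this gives a count of 4 with the first and last timestamps only 3 microseconds apart. Feeding it the saved output of `dmesg` (for example by passing an open file to `summarize`) would show how many of these messages pile up and over what time span.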
Additional details:
- Model name: TS-1263U-RP
- Firmware version and Build Number: 4.2.1 build 20160221
- Operating System (OS): ESXi 6.0U2
- Services enabled: iSCSI
- External devices: none
- NAS connection speed / MTU: 10GbE / 9000