Performance is excellent, but every so often (once every two hours) the device becomes completely unresponsive: iSCSI traffic stalls and the NAS can no longer be reached over SSH (logging in is not possible) or the web interface. After three to four minutes the device becomes available again. As soon as I'm able to log in via SSH again, it shows a 5-minute load average of 15, while the normal load average hovers around 3. So it was definitely busy with something during the stall.
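A load average read after the fact only hints at what happened, so one option is to record it continuously and inspect the samples around the stall. Below is a minimal sketch of that idea, assuming Python 3 is available on the NAS (not confirmed for this firmware) and that /proc/loadavg is readable as on any Linux system. The names parse_loadavg and log_load and the output path are hypothetical.

```python
import time


def parse_loadavg(text):
    """Parse the contents of /proc/loadavg into (1min, 5min, 15min) floats."""
    fields = text.split()
    return float(fields[0]), float(fields[1]), float(fields[2])


def log_load(out_path, interval=5.0, samples=None, src="/proc/loadavg"):
    """Append timestamped load averages to out_path every `interval` seconds.

    With samples=None the loop runs until interrupted. Each line is flushed
    immediately so the record survives even if the box hangs afterwards.
    """
    taken = 0
    with open(out_path, "a") as out:
        while samples is None or taken < samples:
            with open(src) as f:
                one, five, fifteen = parse_loadavg(f.read())
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            out.write("{} {:.2f} {:.2f} {:.2f}\n".format(stamp, one, five, fifteen))
            out.flush()
            taken += 1
            if samples is None or taken < samples:
                time.sleep(interval)
```

Calling log_load("/tmp/loadavg.log") in a background session would produce one line every five seconds. If the sampler itself freezes during a stall, the gap between consecutive timestamps would at least show when the stall began and ended.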
Today I set up some terminals tailing /var/log/*log and running top, hoping to see which process causes this behaviour. I set up an I/O load test from a VM and once that stalled I started watching the logs and top output, but there was nothing there. I did notice dmesg being flooded with this repeated message:
[ 6709.871826] dm-kcopyd track job out of array
[ 6709.871827] dm-kcopyd track job out of array
[ 6709.871828] dm-kcopyd track job out of array
[ 6709.871829] dm-kcopyd track job out of array
Literally thousands of messages per second, as you can tell from the timestamps. This seems to be related to a kernel function used by dm-mirror. Nothing showed up in the logs during the downtime, and afterwards everything continued as normal. Attached is the output of tail -f /var/log/*log from the point the array became responsive again.
The dm-kcopyd messages are still scrolling by as I write this, 15 minutes after the incident. After a reboot they're gone.
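The message rate can be estimated directly from the dmesg timestamps instead of by eye. The following is a rough sketch, assuming the output has been saved to a text file and that every line uses the "[ seconds.micros] message" layout shown above. The function name message_rate is hypothetical.

```python
import re

# Matches lines such as "[ 6709.871826] dm-kcopyd track job out of array"
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def message_rate(lines, needle):
    """Return (count, messages_per_second) for dmesg lines containing needle.

    The rate is (count - 1) intervals divided by the time between the first
    and last matching line; it is None when it cannot be computed.
    """
    stamps = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and needle in match.group(2):
            stamps.append(float(match.group(1)))
    if len(stamps) < 2:
        return len(stamps), None
    span = max(stamps) - min(stamps)
    if span <= 0:
        return len(stamps), None
    return len(stamps), (len(stamps) - 1) / span
```

For example, message_rate(open("dmesg.txt"), "dm-kcopyd track job out of array") on a longer capture would put a number on the flood. Comparing captures taken during and after a stall might also show whether the rate changes with the incident.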
- Model name: TS-1263-U RP
- Firmware version and Build Number: 4.2.1 build 20160221
- Operating System (OS): ESXi 6.0U2
- Services enabled: iSCSI
- External devices: none
- NAS connection speed/ MTU: 10GbE / 9000