TS-673A keeps failing

Questions about SNMP, Power, System, Logs, disk, & RAID.
noodles_gb
Getting the hang of things
Posts: 59
Joined: Sun Apr 10, 2016 6:00 pm

TS-673A keeps failing

Post by noodles_gb »

Firmware: QuTS Hero h4.5.3.1698

I have 3x ZFS storage pools:
1 - 2x Crucial MX100 512GB SATA SSD for OS/apps RAID1
2 - 4x Seagate IronWolf 10TB SATA HDDs
3 - 2x PCIe NVMe (Samsung PM983 Enterprise) in RAID1 on a QNAP PCIe card, with 2x NVMe drives on the motherboard as read cache.
Connected via X520 10 Gbps PCIe NIC

I use the NAS as iSCSI storage from pools 2 & 3, and whenever the pools get busy the whole NAS stops responding on the network and new SSH sessions fail to connect. The only workaround is to remove power from the NAS.
It is hosting several VMs over iSCSI, so it is catastrophic when this happens.

All HDDs/SSDs showing healthy.

If I'm already logged in via web or SSH, the session remains open, but the NAS won't accept new connections. Via the web UI I can see all pools and drives are healthy. However, I can't do much: I can't install apps, move/copy files, or access files over the network.

This is clearly a massive issue for me on what is a very expensive doorstop at the moment.

Any ideas where to start looking, which log files etc? I've raised a support ticket but I don't think I'll get an answer fast.
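Not QNAP-specific advice, but since an already-open SSH session survives the hang, a few generic Linux/ZFS checks run from that session might narrow things down. This is just a sketch; the commands are standard Linux/OpenZFS and I haven't confirmed every one behaves identically on QuTS Hero's cut-down userland:

```shell
# From an SSH session that is still open when the hang occurs:

# Look for hung-task call traces or I/O errors in the kernel ring buffer
dmesg | grep -iE "hung task|call trace|i/o error" | tail -20

# Check pool health and any ongoing scrub/resilver activity
zpool status -v

# List processes stuck in uninterruptible I/O wait (state "D"),
# which is the classic signature of a storage-layer stall
ps ax -o pid,stat,cmd | awk '$2 ~ /^D/'
```

If `dmesg` shows hung-task call traces that coincide with the lockups, that output is exactly the sort of thing worth attaching to the support ticket.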
noodles_gb
Getting the hang of things
Posts: 59
Joined: Sun Apr 10, 2016 6:00 pm

Re: TS-673A keeps failing

Post by noodles_gb »

I've disabled dedupe and compression on the iSCSI LUNs, which so far (touch wood) seems to have helped. I'm still seeing pretty poor performance compared to my old TVS-671 running EXT4, however.
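For reference, a generic ZFS sketch of the equivalent checks/changes from the shell (the dataset name is a placeholder, not my actual LUN, and I did this through the QuTS Hero GUI rather than these commands):

```shell
# Confirm the current settings on the LUN's backing dataset
# ("zpool2/mylun" is a placeholder dataset name)
zfs get dedup,compression zpool2/mylun

# Disable both. Note this affects newly written blocks only --
# blocks already deduped/compressed on disk stay as they are
zfs set dedup=off zpool2/mylun
zfs set compression=off zpool2/mylun
```

That "new writes only" behaviour may explain why improvements show up gradually rather than immediately after the change.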
noodles_gb
Getting the hang of things
Posts: 59
Joined: Sun Apr 10, 2016 6:00 pm

Re: TS-673A keeps failing

Post by noodles_gb »

The disk performance on my LUNs is terrible. :S

Code:

[~] # qcli_storage -t force=1
fio test command for File system: /sbin/fio --filename=test_device/qcli_storage --direct=0 --rw=read --bs=1M --runtime=15 --name=test-read --ioengine=libaio --iodepth=32 --size=128m &>/tmp/qcli_storage.log
Start testing!
Performance test is finished 100.000%...
VolID   SharedFolderName    Pool     Mapping_Name            Mount_Path                    FS_Throughput
2       Public              1        zpool1/zfs2             /share/ZFS2_DATA              1.16 GB/s 
3       homes               1        zpool1/zfs18            /share/ZFS18_DATA             1.28 GB/s 
4       Media               2        zpool2/zfs19            /share/ZFS19_DATA             3.68 GB/s 
5       DML                 2        zpool2/zfs20            /share/ZFS20_DATA             3.21 GB/s 
6       Quickbox            2        zpool2/zfs21            /share/ZFS21_DATA             3.91 GB/s 
7       Backups             2        zpool2/zfs22            /share/ZFS22_DATA             3.68 GB/s 
8       R5                  2        zpool2/zfs274           --                            132.67 MB/s
9       NVMe_R1             3        zpool3/zfs275           --                            544.96 MB/s
The two LUNs are R5 and NVMe_R1.
R5 is 4x 10 TB drives (no cache) in RAID 5.
NVMe_R1 is two PCIe NVMe drives in RAID 1.

Performance from the disks themselves, however, is fine (as is performance on Media, DML and Quickbox, which are all shared folders).

Code:

[~] # qcli_storage -T
fio test command for physical disk: /sbin/fio --filename=test_device --direct=1 --rw=read --bs=1M --runtime=15 --name=test-read --ioengine=libaio --iodepth=32 &>/tmp/qcli_storage.log
Start testing!
Performance test is finished 100.000%...
Enclosure  Port  Sys_Name          Throughput    Type      Size      Alias             Signature   Partitions  Model  
NAS_HOST   1     /dev/nvme2n1      578.78 MB/s   SSD:free  953.87 GB M.2 SSD 1         QNAP        6           NVMe Sabrent        
NAS_HOST   2     /dev/nvme3n1      579.12 MB/s   SSD:free  953.87 GB M.2 SSD 2         QNAP        6           NVMe Sabrent        
NAS_HOST   3     /dev/sdd          496.32 MB/s   SSD:data  476.94 GB Disk 1            QNAP FLEX   5           Crucial CT512MX100SSD1
NAS_HOST   4     /dev/sdc          465.97 MB/s   SSD:data  476.94 GB Disk 2            QNAP FLEX   5           Crucial CT512MX100SSD1
NAS_HOST   5     /dev/sdb          185.23 MB/s   HDD:data  9.10 TB   Disk 3            QNAP FLEX   5           Seagate ST10000VN0004-1ZD101
NAS_HOST   6     /dev/sda          181.05 MB/s   HDD:data  9.10 TB   Disk 4            QNAP FLEX   5           Seagate ST10000VN0008-2PJ103
NAS_HOST   7     /dev/sde          176.90 MB/s   HDD:data  9.10 TB   Disk 5            QNAP FLEX   5           Seagate ST10000VN0008-2JJ101
NAS_HOST   8     /dev/sdf          197.69 MB/s   HDD:data  9.10 TB   Disk 6            QNAP FLEX   5           Seagate ST10000VN0008-2JJ101
NAS_HOST   P2-1  /dev/nvme1n1      2.81 GB/s     SSD:data  894.25 GB PCIe 2 M.2 SSD 1  QNAP FLEX   5           SAMSUNG MZ1LB960HAJQ-00007
NAS_HOST   P2-2  /dev/nvme0n1      2.81 GB/s     SSD:data  894.25 GB PCIe 2 M.2 SSD 2  QNAP FLEX   5           SAMSUNG MZ1LB960HAJQ-00007
Why are the iSCSI LUNs so terrible?
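One generic ZFS angle (an assumption on my part, not a confirmed diagnosis): the LUNs are backed by zvols rather than file systems, and zvol throughput is very sensitive to volblocksize and sync-write behaviour. Something like the following would show how the two LUN datasets are configured; the property names are standard OpenZFS, though I haven't verified QuTS Hero exposes them all:

```shell
# Inspect block-level properties of the two LUN-backing datasets
# (dataset names taken from the qcli_storage output above)
zfs get volblocksize,sync,compression,dedup,logbias zpool2/zfs274 zpool3/zfs275
```

A small volblocksize (historically 8K by default on many ZFS releases) combined with synchronous writes from a hypervisor can plausibly produce a large gap between raw-disk and LUN throughput like the one shown above.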
kobe
Starting out
Posts: 33
Joined: Wed Jan 27, 2010 8:46 pm

Re: TS-673A keeps failing

Post by kobe »

Interesting. I contacted QNAP about what seems to be a similar issue. I've got a TVS-663.

In my case the QNAP stops working and crashes with an unhealthy reboot whenever I connect to the iSCSI storage from another box (Intel NUC running Fedora Linux), leaving one or sometimes more file systems dirty. This happens with 4.5.4.1741. I reverted back to firmware 4.5.4.1723, but now I have the CHAP initiation failure happening, as discussed here viewtopic.php?f=25&t=161990 and here viewtopic.php?f=180&t=161992. But I can live without CHAP for a while. Might even revert back a second time.

My theory: QNAP messed up the iSCSI CHAP login, tried to fix it, and made it worse. QNAP suggested I try the original RAM, as I run non-QNAP RAM, but that RAM worked for over 4 years without any problems, so I doubt it'll make any difference.
noodles_gb
Getting the hang of things
Posts: 59
Joined: Sun Apr 10, 2016 6:00 pm

Re: TS-673A keeps failing

Post by noodles_gb »

Thanks. I don't use CHAP as it's a home environment. I have upgraded my RAM as well, using Crucial sticks; if QNAP tell me it's bad RAM I'll laugh and send the thing back to the retailer as not fit for purpose.

I can almost guarantee that if I got a generic PC and installed TrueNAS on it I wouldn't have any issues.

QNAP have 'escalated' my ticket and I'm waiting to hear back.
kobe
Starting out
Posts: 33
Joined: Wed Jan 27, 2010 8:46 pm

Re: TS-673A keeps failing

Post by kobe »

QNAP told me that the "new" kernel was more strict/picky about RAM, but would not tell me what supposedly changed between the last two firmware releases (I specifically asked, as I doubt the kernel changed that much).

In any case, they refuse to forward my request unless I swap my current RAM back to the original RAM. But that is too much hassle. I've decided to stay put and will decide next year on a new NAS ... definitely not QNAP any more ... probably build my own.
noodles_gb
Getting the hang of things
Posts: 59
Joined: Sun Apr 10, 2016 6:00 pm

Re: TS-673A keeps failing

Post by noodles_gb »

QNAP support have identified call trace errors in the logs and say the issue will be fixed in a future firmware update.
