A little update:
QNAP support told me they have escalated my issue and will get back in touch with me once they hear back from the technical staff.
What really makes me scratch my head quite hard is the behaviour of the storage (6x HDDs) configured as RAID5.
At some point the data transfer and the cache growth just stop and the file share becomes unresponsive: I can't browse any files with Explorer and can't access the share even after restarting Explorer. The QNAP itself, however, doesn't crash.
I can still access the files using File Station, but trying to delete them just hangs the application. All I can do is reboot the QNAP.
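For anyone trying to watch this happen live, the cache growth described above can be observed from an SSH session while a large copy runs. This is just a diagnostic sketch (not something from QNAP support); the sample count is arbitrary:

```shell
# Sample free memory and pagecache size once per second.
# During a large sequential write, Cached climbs while MemFree
# shrinks; on the affected box the share hangs once free RAM bottoms out.
for i in 1 2 3 4 5; do
    grep -E '^(MemFree|Cached):' /proc/meminfo
    echo "---"
    sleep 1
done
```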
Crash when writing more than 200GB in one batch
- mr-auh
- Starting out
- Posts: 17
- Joined: Thu Nov 29, 2018 5:44 pm
Re: Crash when writing more than 200GB in one batch
You do not have the required permissions to view the files attached to this post.
- storageman
- Ask me anything
- Posts: 5507
- Joined: Thu Sep 22, 2011 10:57 pm
Re: Crash when writing more than 200GB in one batch
This is off when it works without crashing, right? Or did your SSH do something more.
Strange we don't hear other people with this issue on that model.
- mr-auh
Re: Crash when writing more than 200GB in one batch
storageman wrote: ↑Fri Dec 07, 2018 5:07 pm
This is off when it works without crashing, right? Or did your SSH do something more.
Strange we don't hear other people with this issue on that model.

This setting has been off for all my tests. Given the behaviour of the problem, it looks like a software bug to me. However, I would imagine that a RAID5 across all 6 HDDs in this box is quite a common setup, so others should be seeing this problem as well. Keen to see what QNAP support has to say.

temp.png
- storageman
Re: Crash when writing more than 200GB in one batch
How about creating a volume on those SSDs and doing a 200GB+ copy?
Personally, I think those Reds are part of the problem.
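A 200GB+ sequential write can also be generated directly on the NAS over SSH, which takes the network client out of the equation. A sketch only: the target path is a placeholder, and the default size is deliberately tiny so a dry run is harmless; on the NAS you would use something like SIZE_MB=204800 for a real ~200 GiB test.

```shell
#!/bin/sh
# Sequentially write SIZE_MB megabytes of zeros to TARGET.
# Example for the real test: TARGET=/share/Public/bigfile SIZE_MB=204800
TARGET=${TARGET:-/tmp/bigwrite.bin}
SIZE_MB=${SIZE_MB:-64}

dd if=/dev/zero of="$TARGET" bs=1M count="$SIZE_MB"
ls -l "$TARGET"
```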
- mr-auh
Re: Crash when writing more than 200GB in one batch
storageman wrote: ↑Fri Dec 07, 2018 6:26 pm
How about creating a volume on those SSDs and doing a 200GB+ copy?
Personally, I think those Reds are part of the problem.

I tried the same thing using the NVMe SSDs in RAID1, but hit the same problem. We also have another QNAP with 12 WD Reds in a RAID6 for the same purpose, and that one works without any issues.
- mr-auh
Re: Crash when writing more than 200GB in one batch
TL;DR at the bottom.

Another update:
Three different QNAP engineers have meanwhile tried their luck in a week-long remote session, but nothing really came of it. Yesterday they contacted me again and asked for a phone call at 10am. Sadly, no one ever called, and I got no reply to my mail asking what is going on.

Yesterday evening I invested another two hours building on the drop_caches workaround to see if I could automate it. First of all, you don't have to use

Code: Select all
echo 3 > /proc/sys/vm/drop_caches

as the option;

Code: Select all
echo 1 > /proc/sys/vm/drop_caches

works just as well (it only frees the pagecache). Some websites suggest running a cronjob every five minutes, but I didn't like that solution at all: most of the time you don't need it, and sometimes the five-minute window is still too long, resulting in a crash.

Looking into the virtual memory subsystem, I found the following setting:

Code: Select all
vm.min_free_kbytes = 131072

which pretty much explains the behaviour I observed. Writing at about 700 MB/s, the pagecache just jumps over this limit right into non-existent RAM addresses.

TL;DR:
Increasing the min_free_kbytes value did the trick for me, without cronjobs or other dirty hacks. I added the following line to my autorun.sh:

Code: Select all
sysctl -w vm.min_free_kbytes=33554432

Surely a lower value would also work, but for now I think keeping 32GB free should be just enough. It works with the system cache either enabled or disabled.

2019-02-21_08_02_30.png

Going to run some longer tests now to see if the system stays stable for productive use.
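For completeness, the workaround above could live in autorun.sh with a small sanity check. A minimal sketch, assuming the 32 GiB figure from the post (33554432 KiB) and that sysctl is on the PATH; the echo lines are just illustrative logging:

```shell
#!/bin/sh
# Reserve 32 GiB, expressed in KiB: 32 * 1024 * 1024 = 33554432.
# This makes the kernel start reclaiming pagecache well before RAM runs out.
RESERVE_KB=$((32 * 1024 * 1024))
echo "target min_free_kbytes: $RESERVE_KB"

# The current kernel value is readable without root:
echo "current min_free_kbytes: $(cat /proc/sys/vm/min_free_kbytes)"

# Applying the new value needs root; uncomment on the NAS:
# sysctl -w vm.min_free_kbytes="$RESERVE_KB"
```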
- storageman
Re: Crash when writing more than 200GB in one batch
mr-auh wrote: ↑Thu Feb 21, 2019 3:41 pm
TL;DR at the bottom. [...]

Nobody should need to do any of this. QNAP needs to fix either the hardware or the software issue here.