I have been struggling with this issue for a few months now (well, it's only in the past few days that I really started looking into it). The craziest thing to me was that even with a read-only cache, I would see writes drop to 50 MB/s (when writes to the HDD RAID would be around 350 MB/s without an SSD cache). I have had to re-think several of my assumptions.
This was especially annoying because I picked the device I did specifically because of the 10 GbE support and four 2.5" slots for SSDs, so that I could enjoy quick transfers. I'll go into details below. (Running a TVS-951x with 5 x 8 TB Seagate Ironwolf drives in RAID6, encrypted, and various configurations of Samsung SSDs (mix of 830, 840 EVO, 860 EVO))
Assumption #1: An SSD cache is always worth it. SSDs are much quicker than any HDD.
The truth: It really isn't the case once you work with a lot of data and the SSD is "full" (and given that the cache layer is actually not very smart).
Assumption #2: Surely a read-only cache cannot affect write speeds?
The truth: Because of the way dm-cache works (this is just my hypothesis; I did not verify it, but I have fair reason to believe this is why), all writes are also stored to the cache so that they can provide an immediate benefit to any subsequent reads. If writes to the cache are slow for one reason or another, then writes to the RAID will be slow too, even with a "read-only" SSD cache.
Assumption #3: The CPU in my NAS supports AES-NI instructions, surely there shouldn't be any speed impact from using encryption.
The truth: Well, there is, at least with the dog of a CPU that they put in the TVS-951x (a dual-core 1.8 GHz Celeron 3865U, Kaby Lake architecture).
Let's get into it.
Any modern SSD has what's called a Flash Translation Layer (FTL) between the controller chip and the NAND flash chips. Flash can't be rewritten in place: data is written in pages, but a page can only be written to again after the whole erase block it belongs to (a group of pages) has been erased. This means the controller has to do extra work to keep track of which OS-visible ("LBA") sector was written to which physical page, and to periodically move valid data around and erase blocks as demand for writes forces it to free up space. This is called garbage collection (GC). There are some more tricks that SSD makers use, such as treating part of the TLC flash as an SLC cache to speed up small writes (up to a few, or a few tens of, gigabytes).
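To make that a bit more concrete, here's a toy model of the effect (purely illustrative, with made-up numbers; a real controller is vastly more complicated): writes go fast while they still fit in the pseudo-SLC region, and collapse to GC-limited speed once it's used up.

```python
# Toy model of why sequential writes collapse on a "full" SSD.
# All numbers are made up for illustration; real controllers are far more complex.

FAST_MBPS = 500      # assumed write speed while the pseudo-SLC region has room
GC_MBPS = 50         # assumed speed once every write has to wait for garbage collection
SLC_CACHE_GB = 6     # assumed size of the fast region

def simulate_sequential_write(total_gb, chunk_gb=1):
    """Yield (gigabytes written so far, effective MB/s for that chunk)."""
    slc_free = SLC_CACHE_GB
    written = 0
    while written < total_gb:
        if slc_free >= chunk_gb:
            speed = FAST_MBPS    # the chunk still fits in the fast region
            slc_free -= chunk_gb
        else:
            speed = GC_MBPS      # the controller has to erase blocks before it can accept the data
        written += chunk_gb
        yield written, speed

for gb, mbps in simulate_sequential_write(total_gb=20):
    print(f"after {gb:3d} GB written: ~{mbps} MB/s")
```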
Almost as soon as I started using my 128 GB Samsung 830 SSD as the cache, I would see writes drop to 50 MB/s after a measly few gigabytes of sequential data (yes, I chose to use the SSD cache for all I/O, because I wanted to eventually take advantage of quick writes of even big files to the NAS). I always kind of knew that writes to a "full" SSD can be slow while it performs garbage collection, but I thought that overprovisioning would help with this. Well, it really doesn't. Garbage collection is something an SSD has to do anyway, and if the GC process itself is slow (for example, if erasing a block takes a long time), it doesn't matter how much you overprovision the SSD; it's going to be slow regardless.
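If you want to see the choke point on your own hardware, a rough sketch like the one below is enough: write a big file in fixed-size chunks and log each chunk's throughput. The target path is just a placeholder for a share on the NAS, and the fsync is there so the page cache doesn't hide the real device speed.

```python
import os
import time

TARGET = "/share/CACHEDEV1_DATA/writetest.bin"  # placeholder path, adjust for your volume
CHUNK_MB = 256
TOTAL_GB = 20

chunk = os.urandom(CHUNK_MB * 1024 * 1024)      # incompressible data, like real media files

with open(TARGET, "wb") as f:
    written_mb = 0
    while written_mb < TOTAL_GB * 1024:
        t0 = time.monotonic()
        f.write(chunk)
        f.flush()
        os.fsync(f.fileno())                    # push it to the device, not just the page cache
        dt = time.monotonic() - t0
        written_mb += CHUNK_MB
        print(f"{written_mb / 1024:6.1f} GB written, {CHUNK_MB / dt:7.1f} MB/s")
```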
For some sort of evidence, refer to this graph from Anandtech's review of the 830 SSD: https://images.anandtech.com/reviews/st ... 60mins.png. You can clearly see how, in a "full" state, write speeds drop to a little over 50 MB/s after a few gigabytes or so. Keep in mind that Samsung is actually one of the better SSD makers in this regard (the 830 is already over 5 years old, but the problem manifests to some degree even with newer/bigger drives); if you use cheaper SSDs with crappy controllers, you're going to see the same problem, if not worse.
Yes, I tried secure erase; yes, I tried overprovisioning (up to 20%); nothing helped.
I also tried using 4 x 250 GB SSDs (one 840 EVO, three 860 EVO) in various configurations (raid5 and raid10 as a read+write cache, and raid0 as a read-only cache). I could get the "choke" write speeds up to 250 MB/s (depending on the RAID setup for the SSDs, it would choke after somewhere between 10 and 50 gigabytes written), but that still wasn't as good as writing directly to the HDDs without an SSD cache. The cache layer definitely "kind of" worked as long as it wasn't hitting this fundamental wall with SSDs: after a secure erase of the SSDs and setting them up as a raid0 read-only cache, the first read of a big file would be around 350 MB/s (limited by the HDDs plus encryption, I guess), and the second read would hit almost 800 MB/s over the network.
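For what it's worth, the cold-read/warm-read comparison is easy to reproduce locally with something like this (the file path is made up; use a file larger than the NAS's RAM, or drop the page cache between runs, otherwise you're mostly measuring RAM rather than the SSD cache):

```python
import time

PATH = "/share/CACHEDEV1_DATA/bigfile.mkv"  # placeholder: some large file on the cached volume
CHUNK = 8 * 1024 * 1024

def read_speed_mbps(path):
    """Read the whole file sequentially and return MB/s."""
    total = 0
    t0 = time.monotonic()
    with open(path, "rb", buffering=0) as f:
        while True:
            buf = f.read(CHUNK)
            if not buf:
                break
            total += len(buf)
    return total / (time.monotonic() - t0) / 1e6

print(f"first read : {read_speed_mbps(PATH):7.1f} MB/s")  # cold: served by the HDDs (and promoted to the cache)
print(f"second read: {read_speed_mbps(PATH):7.1f} MB/s")  # warm: hopefully served by the SSD cache
```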
So let's forget about writes. I do do a lot of reads as well. I would have loved to just use the SSDs as a very quick read-only cache, but it turns out that because of the way the cache layer works, it very much affects write speeds even in a read-only configuration.
I must stress that I only deduce this empirically; I have not verified that this is how dm-cache actually works or if it's configurable.
It appears that dm-cache operates under the very naive assumption that the cache layer is always faster than the backing disk(s). Because of this, writes to the disk are also routed to the cache device(s), so that they can provide an immediate benefit to any reads of the file that was just written to. Now, because of what I described above, as soon as you've written 5 or 10 or 50 gigabytes to an SSD, it will start blocking because of the GC operation. The end result is that even with a read-only cache, writes will choke when the SSDs choke.
You'd think that dm-cache would have the smarts to bypass the cache layer as soon as it detects that the cache layer is slower than the backing disks (regardless of if it's a read or write operation), but that doesn't appear to be the case. If it did, using a cache would never be slower than reading/writing directly to the disk.
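To spell out the difference between what I suspect is happening and what I wish would happen, here it is as a sketch. This is only my mental model, not dm-cache's actual code; `cache` and `backing` stand for hypothetical device objects.

```python
# My mental model of the write path, NOT dm-cache's real implementation.

def write_as_i_suspect_it_works(block, data, cache, backing):
    backing.write(block, data)
    cache.write(block, data)      # also populate the cache so the next read is fast;
                                  # if the SSD is stuck doing GC, this is the call that stalls everything

def write_as_i_wish_it_worked(block, data, cache, backing):
    backing.write(block, data)
    if cache.recent_write_latency() < backing.recent_write_latency():
        cache.write(block, data)  # only populate the cache while it is actually the faster device
    else:
        cache.invalidate(block)   # otherwise skip it, just make sure no stale copy can be served
```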
The conclusion is: an SSD cache is just not worth it. It doesn't make a difference whether it's read-only or read/write.
It might be worth it if you really only use it for truly random I/O (if you run VMs off of it, say), or if the SSDs are truly fast (for example PCIe NVMe SSDs, which I can't test because the TVS-951x I have doesn't seem to have a PCIe slot) or never suffer from performance degradation (like maybe the Optane devices?). But for the general case, with SATA SSDs: Just Don't Bother.
Now to my other annoyance: writes to an encrypted RAID seem to be limited by the CPU. This is something I did not expect. Modern CPUs can use the AES-NI accelerated instructions and achieve multiple gigabytes per second of encryption or decryption. There's of course some overhead arising from the fact that an encryption mode such as CBC (in the case of QNAP) or XTS has to be used on top of that, but it shouldn't be too bad.
Well, it is. I don't know if there's some inefficiency in how dm-crypt works or if it's because the CPU also has to do raid6 parity calculations (when writing) at the same time, but my CPU is pegged at 99% (per the dashboard) when writing to my NAS.
This is something I should have noticed, though: https://www.qnap.com/en/product/tvs-951x. The part there in orange quite clearly shows that with an encrypted volume you can't expect much more than around 350 MB/s for writes. Without encryption you should get at least around 700 MB/s (assuming the HDDs aren't the bottleneck), and I did in fact try 4 x 250 GB SSDs in a raid0 (without encryption) and got almost 800 MB/s in writes. But of course that isn't usable, because with raid0 you don't get any of the safety that you want.
Am I dissatisfied? Not really. I should have checked the specifications more carefully, or chosen not to use encryption (I have stored the encryption key on the NAS so that it unlocks automatically, which basically defeats the purpose of the encryption anyway! ... and even if I deleted that, I would still have the password and the key file stored on my PC, which isn't encrypted). On the other hand, for me the NAS is just for storing files, and 350 MB/s isn't bad at all compared to the bunch of spinning disks I previously had in my PC (which would only do 100-150 MB/s)! If I were to upgrade my NAS, I would look for a device with a beefier CPU, but for now this'll do, I guess.
So I guess my takeaways are: Manage your expectations.
Do not assume that something is true just because you think it is. If something is measurably worse, then it really is worse, and there's probably a reason why. If you care enough, you can figure out that reason.
You live and you learn.
Addendum 9/17: It appears that the slow writes to an encrypted volume are due to the fact that QNAP uses aes-256-cbc for encrypted volumes. Using aes-256-xts would be a much better choice: an openssl speed benchmark on my device indicates ~450 MB/s for aes-256-cbc encryption and ~1800 MB/s for aes-256-xts encryption. (The gap makes sense: CBC encryption is inherently serial, since each block is chained to the previous ciphertext block, so the CPU can't keep its AES-NI units pipelined, whereas XTS blocks are independent and can be processed in parallel. The remaining difference between the ~450 MB/s CBC benchmark and the 350 MB/s write speed limit I am hitting is reasonable given parity calculation and other overhead.) I'll make a separate post about this and see if I can find some way to force the use of XTS instead.
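If you want to reproduce the comparison without openssl, roughly the same thing can be done with Python's cryptography package. This is just my quick-and-dirty equivalent; the absolute numbers won't match openssl speed, but the CBC-vs-XTS ratio should be in the same ballpark.

```python
import os
import time
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def throughput_mbps(cipher, data, seconds=2.0):
    """Encrypt `data` over and over for roughly `seconds` and return MB/s."""
    enc = cipher.encryptor()
    done = 0
    t0 = time.monotonic()
    while time.monotonic() - t0 < seconds:
        enc.update(data)
        done += len(data)
    return done / (time.monotonic() - t0) / 1e6

data = os.urandom(1024 * 1024)  # 1 MiB buffer, roughly like a big sequential write

cbc = Cipher(algorithms.AES(os.urandom(32)), modes.CBC(os.urandom(16)))
xts = Cipher(algorithms.AES(os.urandom(64)), modes.XTS(os.urandom(16)))  # XTS takes a double-length key

print(f"aes-256-cbc: {throughput_mbps(cbc, data):8.1f} MB/s")  # serial: each block chains to the previous one
print(f"aes-256-xts: {throughput_mbps(xts, data):8.1f} MB/s")  # blocks are independent, so AES-NI stays busy
```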