Re: NVMe SSD and discard mount option
On Tue, 09 May 2017, David Guyot wrote:
> I recently loaned a server with NVMe SSD and saw, during my research on
> the relevance of the discard mount option for them, that its use is
> discouraged for NVMe SSDs. Why? Does NVMe SSDs not need trimming? Is it
> integrated in the NVMe driver for Linux?
Linux does better with filesystem-level TRIM every so often (how often
depends really on your write load and level of overprovisioning) as a
That said: NVMe usually dislikes frequent use of TRIM because it
typically will play badly with the many-queue scheduler inside the
device: the device will typically have to issue a device-wide write
barrier internally, which has to drain (and freeze) every [write?] queue
up to the barrier point before the barrier can be cleared. The blocks
are then marked as free for future garbage collection, and all queues
unfrozen, thus resuming operation. This hurts multi-stream streaming
performance quite a lot...
Even if it had to freeze just one queue, it would still hurt when
compared to an fstrim every hour/day/week/month.
Non-NVMe devices have far less command-path parallelism, so the
device-wide write barrier should typically hurt less (in relative terms)
than it would on an NVMe device.
Besides, I/O latency becomes *utterly unpredictable* when online discard
is active, which can cause all sorts of stuttering on the default I/O
scheduler. Latency will get unpredictable during fstrim as well, but
you can schedule the fstrim to a time of your choice, instead of every
time the filesystem frees a data or metadata block...
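
To find out whether online discard is in effect at all, you can look at
the mount options. The following is only a minimal sketch: the function
name is made up for illustration, and it assumes the usual /proc/mounts
layout (device, mount point, filesystem type, comma-separated options).
It also counts "discard=..." variants as online discard, which is an
assumption you may want to revisit for your filesystem.

```python
def mounts_with_online_discard(mounts_text):
    """Return (device, mountpoint) pairs whose options include discard.

    mounts_text is the content of /proc/mounts (or text in that format).
    Hypothetical helper, not part of any existing tool.
    """
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mountpoint, _fstype, options = fields[:4]
        opts = options.split(",")
        # "nodiscard" does not match; "discard" and "discard=<mode>" do.
        if any(o == "discard" or o.startswith("discard=") for o in opts):
            hits.append((device, mountpoint))
    return hits
```

On a Linux box you would feed it the text read from /proc/mounts; any
filesystem it reports is a candidate for dropping the discard option in
favour of a scheduled fstrim.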
As for flash wear, on a modern SSD (NVMe or otherwise), to keep it at
the bare minimum it should be enough to overprovision it properly and
issue an fstrim (on average) after writing about 50% of the size of
the overprovisioned area. That might even give you less flash wear than
online-discard over time...
 50% is just a hunch. You could test that, but please keep in mind
that it will be device-firmware dependent. I bet there are a few
devices that are going to prioritize copying data around to free
almost-empty erase blocks for erasure, no matter how many fully-erased
blocks are already available...
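
The "fstrim after writing about 50% of the overprovisioned area" rule of
thumb above could be sketched roughly as follows. This is an illustration
under assumptions, not a tested tool: the function names and the way the
last-trim counter is stored are hypothetical, and the write counter is
taken from the seventh field of /sys/block/<dev>/stat (sectors written,
in 512-byte units), which resets on reboot. The 0.5 default is the same
hunch as in the text, not a measured value.

```python
SECTOR_BYTES = 512  # /sys/block/<dev>/stat counts sectors in 512-byte units


def bytes_written(stat_line):
    """Bytes written so far, parsed from one /sys/block/<dev>/stat line."""
    fields = stat_line.split()
    return int(fields[6]) * SECTOR_BYTES  # 7th field: sectors written


def should_trim(written_now, written_at_last_trim,
                overprovisioned_bytes, fraction=0.5):
    """True once writes since the last fstrim reach fraction * overprovisioning.

    All arguments are byte counts; fraction=0.5 is only the hunch from the
    text and will be device-firmware dependent.
    """
    written_since = written_now - written_at_last_trim
    return written_since >= fraction * overprovisioned_bytes
```

A periodic job could call should_trim() with the value remembered from
the previous run and only invoke fstrim when it returns True, so the
trim still happens at a time of your choosing.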