
Re: SSD partitioning and allocations



Hi,

On Thu, Jul 10, 2025 at 07:07:03AM -0400, songbird wrote:
> in previous years I recall that there was some recommendation to leave
> some part of the SSD unallocated and not formatted as part of a file
> system so any parts that failed as bad blocks or wore out could be
> allocated from these unused areas.

The purpose of this was not to reserve space for "bad" blocks as such¹
but to increase the amount of spare space available for wear levelling.
A particular flash device has a rating for the volume of writes it can
be expected to endure over its lifetime, so if you only ever used half
the capacity of the device then you could expect that volume to be
roughly doubled.

(It won't be exactly like that because each design will have a varying
amount of spare capacity hidden from the user for this purpose anyway.)
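As a rough illustration (the 300 TBW rating below is made up, and real
controllers won't scale quite this linearly), a hypothetical 1TB drive
rated for 300 TBW with only 500GB of it ever allocated could be expected
to take roughly…

$ echo "300 * (1000 / 500)" | bc
600

…TBW of host writes before reaching the same level of wear.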

This was common advice years ago, when flash endurance was relatively
low and incidents of people wearing out their SSDs were commonplace.

>   When trying to see what current recommendations are for setting
> up SSDs I see no mentions of this at all?  Has this changed?

Today's SSDs, even consumer brands, have much higher endurance, and this
sort of advice is quite complicated and consumer-hostile, so you don't
see it any more.

Just don't worry about it unless you have an unusually heavy write load.
If you do then take a look at the published endurance figures for the
particular drive. Endurance will be quoted either as "Terabytes written"
(TBW) over the drive's lifetime, or as "Drive writes per day" (DWPD),
e.g. if the drive is 1TB and it is rated for 0.5 DWPD over three years
then that is about 0.5 * 1TB * 365 * 3 = ~548 TBW.

You can get figures on how much you've written to flash using SMART or
nvme-cli, often even from conventional HDDs these days. You can also use
blktrace to measure it in real time on an ongoing basis.
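If the drive is NVMe then nvme-cli reads the same sort of lifetime
counters out of the drive's SMART log, along these lines (the device
name is just an example; each "data unit" in that output is 512,000
bytes per the NVMe spec):

$ sudo nvme smart-log /dev/nvme0 | grep -Ei 'written|percentage_used'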

For example this is one of the Samsung 850 EVO SSDs that is in my
desktop computer, which I use almost every day:

$ sudo smartctl -j -A /dev/sda | \
    jq '.ata_smart_attributes.table[] |
        select(.name=="Power_On_Hours"
            or .name=="Wear_Leveling_Count"
            or .name=="Total_LBAs_Written") |
        .name, .value, .raw.value'
"Power_On_Hours"
86
66088
"Wear_Leveling_Count"
91
176
"Total_LBAs_Written"
99
120847986783
$ units '66088 hours' 'years'
        * 7.5392895

So this drive has been powered on for over 7½ years and is still at 91%
write endurance remaining (the normalised Wear_Leveling_Count value
above). An LBA on this drive is 512 bytes so it has written…

$ echo "scale=3; 120847986783 * 512 / 1024 / 1024 / 1024 / 1024" | bc
56.274

…TiB.

(Just do "smartcl -A /dev/blah" to see all the attributes without the
JSON output I used just to make it presentable in this email.)

>   Pretty much my current plan for one of the SSDs would be
> to put an efi small partition(as I notice the current ones I have
> hardly have anything on them even if they were allocated to be 1G)
> so that I can copy my current setup to that but not waste the
> space).  The existing ones use 5M or even much less so perhaps 50M
> will be enough allowing for future expansion?

The recommended size of an EFI System Partition (ESP) is up for debate
and is not related to what kind of drive you put it on:

    https://wiki.debian.org/UEFI#EFI_System_Partition_.28ESP.29_recommended_size
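If you want to sanity-check against your actual usage first, the current
ESP will show it (assuming the usual Debian mount point of /boot/efi):

$ df -h /boot/efi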

> The rest of the new drive will just be one large partition.

RAID is worth it, so that a failed drive doesn't force you to stop
working and reinstall from backups.
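For example, a minimal software RAID1 mirror with mdadm could look
something like this (the partition names are purely illustrative, and
this destroys whatever is currently on them):

$ sudo mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2
$ sudo mkfs.ext4 /dev/md0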

>   The 2nd new SSD will be for consolidating my backups (that are on
> a smaller SSD at the moment plus also on an external drive that is
> not used frequently - I don't trust it as it has been knocked off
> the table but until it gives up entirely it is a backup that can't
> be messed with as it is not mounted or powered on often).

SSDs have no moving parts so they withstand sudden impacts a lot better
than HDDs do. It's probably fine.

>   I don't use the discard options on the mounts or filesystems
> and also don't run fstrim automatically, I will eventually set
> this up to run monthly.

fstrim has run by default on Debian installs for years (weekly, via a
systemd timer), so you must have gone out of your way to disable it.
Why?

$ systemctl status fstrim.timer
● fstrim.timer - Discard unused blocks once a week
     Loaded: loaded (/lib/systemd/system/fstrim.timer; enabled; preset: enabled)
     Active: active (waiting) since Fri 2025-06-13 00:38:26 BST; 3 weeks 6 days ago
    Trigger: Mon 2025-07-14 00:22:50 BST; 3 days left
   Triggers: ● fstrim.service
       Docs: man:fstrim
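If it has ended up disabled, turning it back on is one command (plain
systemd, nothing Debian-specific):

$ sudo systemctl enable --now fstrim.timer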

Thanks,
Andy

¹ It is perhaps slightly philosophical whether a memory cell is "bad"
  when it has worn out by doing the number of writes that it was
  expected to do. In practice you can't write to it any more, but I
  don't consider that a fault. Without more context, if someone calls a
  cell bad then I think of it as having become unexpectedly faulty.

-- 
https://bitfolk.com/ -- No-nonsense VPS hosting

