
Re: Question: SSD speed



>>> 	a container nears being full.  If one has 1 MB of storage available
>>> 	(allowing for file system overhead and block alignment), then 1 MB
>>> 	of data will fit, but 1 MiB will not.
>> In which way is the KB-vs-KiB discrepancy different from the "file
>> system overhead and block alignment"?
> 	File system overhead is not primarily an artifact of the size of the
> 	blocks.  Block alignment is related to the size of the blocks, but
> 	then I never said it was not.

The question is not about underlying technical details, but about the
result when you look at it as a "black box".

>> Maybe it's a higher percentage, but the end result is the same: if your
>> FS says you have N bytes left, it does not guarantee you that an N byte
>> file will fit,
>
> 	If it says you have 1.00 MB free, then it is guaranteed 1.00 MiB
> 	will not fit.  If it (accurately) says you have 1.00 MiB free, then
> 	1.00 MB is indeed guaranteed to fit.

[ Actually, not necessarily.  I can imagine filesystems where that could
  fail (either because they include an insane amount of metadata, or
  because they only support files made of consecutive blocks and there
  is no hole large enough to fit that 1.00MB file).
  But, yes, in practice the 5% slack of MiB-vs-MB should always be
  larger than the filesystem's overhead.  ]
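
To make the block-alignment part concrete, here is a small Python sketch.
It assumes a hypothetical filesystem with 4096-byte blocks where a file's
data is simply rounded up to whole blocks, with no metadata or
fragmentation cost at all.  That is exactly the simplification the caveat
above warns about, so treat it as an illustration, not a model of any
real filesystem.

```python
BLOCK_SIZE = 4096  # assumed block size in bytes (hypothetical filesystem)

MB = 1000 ** 2   # decimal megabyte
MiB = 1024 ** 2  # binary mebibyte


def allocated_bytes(file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Space consumed when file data is rounded up to whole blocks."""
    blocks = -(-file_size // block_size)  # ceiling division on integers
    return blocks * block_size


def fits(file_size: int, free_bytes: int, block_size: int = BLOCK_SIZE) -> bool:
    """True if the block-rounded file fits in the free space.

    Only block rounding is modelled; metadata and the need for
    contiguous blocks are ignored.
    """
    return allocated_bytes(file_size, block_size) <= free_bytes


if __name__ == "__main__":
    print(allocated_bytes(MB))  # 245 blocks of 4096 bytes
    print(fits(MB, MiB))        # 1 MB file, 1 MiB free
    print(fits(MiB, MB))        # 1 MiB file, 1 MB free
```

Under these assumptions rounding wastes less than one block per file, so
a 1 MB file fits in 1 MiB of free space with 44 KiB to spare, while a
1 MiB file cannot fit in 1 MB.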

>> so you need to include some slack.
> 	Not usually that much.  Again, it depends on the situation.

Yes, it depends.  Hence the need for guessing and approximation.

>> And since very few
>> people are able to quickly compute how many blocks are needed for
>> a file of size N on the specific filesystem they use, you're better off
>> using a "safe enough" estimate.  This notion of "safe" enough is one
>> learned empirically over the years, so whether it's 2% or 10% doesn't
>> really matter that much, as long as it's pretty much always the same.
>
> 	Well, the point is it isn't the same.  One PiB is
> 	1,125,899,906,842,624 bytes.  One PB is 1,000,000,000,000,000
> 	bytes.  That is a difference of 12.6%, vs. 2.4% difference
> 	between 1 KiB and 1 KB.

That's true.  But during a given year, most people will only be faced
with sizes that are within more or less the same range.  E.g. most
people nowadays are commonly faced with the 7.4% difference of
GiB-vs-GB, so that's likely the slack they will implicitly assume.
Every ten years or so you'll need to adjust this slack to the new
reality of manipulating larger data, but since it happens gradually it's
not that big of a deal.
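
For reference, the gap between each binary prefix and its decimal
counterpart grows with every step, because the ratio at step k is
1.024 to the power k.  A short Python sketch that computes the gaps,
keeping the arithmetic in integers until the final division:

```python
PREFIXES = ["K", "M", "G", "T", "P"]


def binary_vs_decimal_gap(power: int) -> float:
    """Percentage by which 1024**power exceeds 1000**power."""
    binary = 1024 ** power
    decimal = 1000 ** power
    return (binary - decimal) * 100 / decimal


if __name__ == "__main__":
    for power, prefix in enumerate(PREFIXES, start=1):
        gap = binary_vs_decimal_gap(power)
        print(f"1 {prefix}iB vs 1 {prefix}B: {gap:.1f}%")
```

Rounded to one decimal this gives 2.4%, 4.9%, 7.4%, 10.0% and 12.6%, so
the implicit slack drifts by roughly two and a half points per prefix.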

>>> As an engineer, precision is absolutely of the essence.
>> When talking about the capacity of mass storage, you don't need
>> precision, since it just has to be large enough, rather than having to
>> have exactly the right size.
> 	Indubitably, but a 12.6% shortfall would be a problem.

I'm pretty sure people do use PB-size datasets nowadays and I haven't
heard them complain about PiB-vs-PB problems, so I'm not worried.

> 	Of course I am.  There are plenty of cases where a safety margin is
> 	required.  There are plenty of other cases where a tiny differential
> 	can mean the difference between working well and a catastrophic
> 	failure. Direction also matters.  A cylinder that is just 0.001 mm
> 	oversize will not fit in a hole that is 0.001 mm undersize.

Indeed, but mass storage isn't in that category.  When you buy a drive
of size S you know you definitely won't be able to fit a dataset of size
S onto it, but it's not straightforward to know exactly how large
a dataset will fit.  In order to know the actual space you'll have
available you need to account for many overheads and approximations, and
in 99% of the cases you also need to account for the fact that the size
of the data you'll put on it isn't precisely known yet either
(especially since if it's large there's a very good chance that it'll be
compressed to some extent).
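
A rough Python sketch of that accounting.  Every parameter in it (the
overhead fraction, the uncertainty on the dataset estimate, the
compression ratio) is a guess the user has to supply; none of the
numbers come from a real filesystem or compressor.

```python
GiB = 1024 ** 3


def advertised_to_gib(decimal_bytes: int) -> float:
    """Convert a drive's advertised (decimal) capacity to GiB."""
    return decimal_bytes / GiB


def usable_bytes(advertised: int, overhead_fraction: float) -> int:
    """Capacity left after a guessed fraction is lost to overhead.

    overhead_fraction is an assumption chosen by the caller, not a
    property of any particular filesystem.
    """
    return int(advertised * (1 - overhead_fraction))


def worst_case_dataset(estimate: int, uncertainty: float,
                       compression_ratio: float) -> int:
    """Upper bound on the stored size of a dataset known only roughly.

    uncertainty: 0.25 means the real size may be 25% above the estimate.
    compression_ratio: stored/raw size, e.g. 0.6; use 1.0 if unknown.
    """
    return int(estimate * (1 + uncertainty) * compression_ratio)


if __name__ == "__main__":
    drive = 10 ** 12  # a drive sold as "1 TB"
    print(f"{advertised_to_gib(drive):.1f} GiB")
    room = usable_bytes(drive, overhead_fraction=0.02)  # hypothetical 2%
    need = worst_case_dataset(700 * 10 ** 9, uncertainty=0.25,
                              compression_ratio=1.0)
    print(need <= room)
```

The point is less the exact figures than how many independent guesses
feed into the answer.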

So except for very unusual cases, this is firmly in the camp of "guess
the approximate size you need, then multiply by N" or something along
those lines.
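
That rule of thumb fits in a few lines of Python.  The factor N and the
list of drive sizes below are arbitrary illustrations, not recommended
values.

```python
import math
from typing import Optional, Sequence


def required_capacity(estimate_bytes: int, factor: float) -> int:
    """Guessed size times a safety factor, rounded up to whole bytes."""
    return math.ceil(estimate_bytes * factor)


def pick_drive(estimate_bytes: int, factor: float,
               sizes: Sequence[int]) -> Optional[int]:
    """Smallest advertised size covering the padded estimate, or None."""
    need = required_capacity(estimate_bytes, factor)
    candidates = [s for s in sizes if s >= need]
    return min(candidates) if candidates else None


if __name__ == "__main__":
    TB = 10 ** 12
    catalogue = [1 * TB, 2 * TB, 4 * TB, 8 * TB]  # hypothetical drive sizes
    print(pick_drive(1_200 * 10 ** 9, factor=1.5, sizes=catalogue))
```

Comparing against advertised (decimal) sizes means the GiB-vs-GB gap is
simply absorbed into N, which is the empirically learned slack argued for
above.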


        Stefan

