
Re: Creation of empty files [was: Useful use of dd]



On Sat, Jul 03, 2021 at 02:34:56PM -0700, David Christensen wrote:
On 7/3/21 6:44 AM, Michael Stone wrote:
On Fri, Jul 02, 2021 at 02:30:50PM -0700, David Christensen wrote:
2021-07-02 14:24:30 dpchrist@dipsy ~/sandbox/dd
$ du --bytes truncate-sparse
5242880    truncate-sparse


I expected sparse files, but du(1) does not indicate such (?).

You used --bytes, which per the man page implies --apparent-size


RTFM du(1):

     -b, --bytes
             equivalent to '--apparent-size --block-size=1'


2021-07-03 14:12:30 dpchrist@dipsy ~/sandbox/dd
$ du --block-size=1 [a-z]*
0	dd-sparse
0	truncate-sparse
5242880	urandom
5242880	zero


Thank you.  :-)


I suppose '-b' is Huffman Coding [1] for somebody's use case (?).


I prefer the Principle of Least Surprise [2] -- 'du' means "disk usage".

In the case that a file doesn't contain an exact multiple of the filesystem block size, du will essentially round up the size of every file to a whole number of blocks. If for some reason you want to know the cumulative length of a set of files rather than the number of blocks used, maybe to put the files into a serialized form for network copy or some such, du -b will do that. There may also have been an older, now-obsolete reason it was added.[1]
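To make the rounding concrete, here is a minimal arithmetic sketch. The 4096-byte block size and the three file lengths are assumptions chosen for illustration, and real filesystems can deviate from this simple model (for example by storing very small files inline):

```python
def rounded_up(length: int, block_size: int = 4096) -> int:
    """Bytes occupied if `length` bytes are stored in whole blocks.

    block_size=4096 is an assumed value, not something du guarantees.
    """
    blocks = -(-length // block_size)  # ceiling division
    return blocks * block_size


# Hypothetical file lengths in bytes.
lengths = [1, 4096, 5000]

# What a "sum of file lengths" report (the du -b idea) would add up.
total_length = sum(lengths)

# What a block-based report would add up under the assumed block size.
total_allocated = sum(rounded_up(n) for n in lengths)

print(total_length, total_allocated)
```

The two totals only agree when every length happens to be an exact multiple of the block size.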

It was like that in the very first version of Debian (via fileutils), then broken in later versions of fileutils, then fixed in coreutils to revert to the original behavior. The breakage came when du (and other utilities) gained the ability to use arbitrary block sizes; du -b was changed to mean "report blocks used, but display in units of 1-byte blocks". The problem is that the original meaning can only be achieved by looking at the file size rather than the file's blocks, while the new meaning looked at blocks and then converted them into bytes. Those are different things for any file whose length isn't an exact multiple of the block size, and very different for sparse files. When this was noticed it was put back the way it was, --apparent-size was added, and the man page was clarified.
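The two meanings correspond to two different fields of the stat structure, which the following sketch reads directly. It assumes a Unix-like system where st_blocks is available and counted in 512-byte units (the usual Linux convention, but not guaranteed everywhere); the helper name apparent_and_allocated is made up for this example:

```python
import os
import tempfile


def apparent_and_allocated(path):
    """Return (file length in bytes, bytes actually allocated on disk).

    Assumes st_blocks is expressed in 512-byte units.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "truncate-sparse")
    # Extend an empty file to 5 MiB without writing any data, roughly
    # what truncate(1) does. Whether the result is actually sparse
    # depends on the filesystem.
    with open(path, "wb") as f:
        f.truncate(5 * 1024 * 1024)
    apparent, allocated = apparent_and_allocated(path)
    print(apparent, allocated)
```

On a filesystem that supports sparse files the second number should be zero or close to it, while the first is always the full 5242880, which matches the du --bytes versus du --block-size=1 outputs quoted earlier.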

The distinction was intuitive when du (and df!) only reported in blocks and is much more esoteric now that people usually use megabytes or such. Anyway, when you're looking at file size (vs blocks) you can't represent sparseness at the same time. The man page has gotten better over time to try to make it more obvious that a couple of things are going on when you use -b, but nobody wants a wall of text like this in a man page.


[1] I suspect this all has to do with compatibility with something ancient (the POSIX docs refer to -b being considered, then dropped) along the lines of "instead of arguing about what size blocks to report, just use bytes". (Yes, there was a block size war.) It tried to solve a real problem (though it wasn't widely adopted, probably because it created a new problem), but the thing it was trying to solve became moot once nobody at all wanted to talk in terms of blocks. That was about the time that people started using megabytes as a common unit for file sizes and we could no longer get away with a 32-bit int for off_t on larger systems. Prior to that it was fairly common to think in terms of blocks. (E.g., it was how file sizes were displayed on VMS.)

