md5sum <FILE produces spurious ` -' in output
This bug has been sitting on our todo list for some time, mainly
because I've been too slack. My apologies. As promised, I'm now
picking it up again.
I've gone and reread the bug reports #164591 and #164889, of which I'm
the submitter of the latter, and I've written a summary of my
position, below. Just to clarify: I would like the committee to
overrule the maintainer.
I have BCC'd this message to both bug reports, to 164591's submitter,
and to dpkg@packages, to make sure everyone knows about this
discussion. But the discussion should probably continue on the
debian-ctte list, and not get crossposted to the bug and the package
maintainer hat.
On to the substance:
* The question is, what should md5sum < filename do ?
Using /dev/null as an example, the two behaviours are:
Bare:         -davenant:~> md5sum </dev/null
(IMO good)    d41d8cd98f00b204e9800998ecf8427e
              -davenant:~>

Annotated:    -anarres:~> md5sum </dev/null
(IMO bad)     d41d8cd98f00b204e9800998ecf8427e -
              -anarres:~>
* Note that there is no suggestion that the output of
md5sum filename should change. It is essential that that still
produce the `Annotated' form, eg in this case:
With filename:  davenant:~> md5sum /dev/null
(Good)          d41d8cd98f00b204e9800998ecf8427e /dev/null
                davenant:~>
This is because that output format is used as input to `md5sum -c'.
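
As a rough sketch of that round trip (the function name, the temporary
directory and the file names are made up for illustration; only
`md5sum FILE' and `md5sum -c' come from the discussion above):

```shell
# Hypothetical helper: record a checksum with the filename attached,
# then verify it with `md5sum -c'. Runs in a subshell so the caller's
# working directory is left alone.
roundtrip_check() (
    tmpdir=$(mktemp -d) || exit 1
    cd "$tmpdir" || exit 1
    printf 'hello\n' >example.txt
    # Annotated form: "<digest>  example.txt"
    md5sum example.txt >example.md5
    # -c reads the digest/filename pairs and re-checks each file
    md5sum -c example.md5
    status=$?
    cd / && rm -rf "$tmpdir"
    exit "$status"
)

roundtrip_check
```

The filename in the recorded line is what lets `md5sum -c' find the
file again, which is why this form has to stay as it is.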
* I claim that the annotated behaviour is inferior, for two reasons:
Firstly, it is less convenient. When md5sum is used in scripts and
the like, it is significantly easier to use if it does not annotate
the output for a single file on stdin, but just produces the bare
checksum (in hex, with a trailing newline, of course).
Otherwise callers which want the unvarnished md5sum have to use
seddery to strip the spurious ` -'. While the advantage for any
individual caller is small, the extra complexity and risk of bugs
is avoidable, and of course there are many callers of md5sum - both
actual Debian packages, and in the rest of the world - so the pain
is multiplied.
Secondly, it is not compatible with many existing programs. Even
though many things in Debian have already had extra code added to
cope, programs have been using and relying on the historical
behaviour for some time, and breaking them is a bad idea.
I also contend that the upstream coreutils md5sum should be changed
to match this desirable behaviour, although that's not really a
question for the Debian TC.
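
To illustrate the kind of seddery callers need when the output is
annotated, here is a minimal sketch. The name bare_md5 is hypothetical
and not taken from any package; the sed expression is just one of
several ways to do it.

```shell
# Hypothetical helper: print only the hex digest of stdin, whether the
# installed md5sum emits "<digest>" or "<digest> -".
bare_md5() {
    # Delete everything from the first whitespace character onwards;
    # a bare digest contains no whitespace and passes through unchanged.
    md5sum | sed 's/[[:space:]].*$//'
}

bare_md5 </dev/null
```

With the bare behaviour the wrapper is unnecessary and the caller can
use the output of `md5sum </dev/null' directly.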
* Opponents of my suggestion claim that the annotated behaviour is
superior because of some need to be `compatible' with coreutils
md5sum.
This is a red herring. The only case where I'm suggesting changing
the behaviour is when the filename is _not_ supplied by the
caller. Ie, when you say
md5sum < filename
rather than
md5sum first-file [second-file third-file ...]
It is true that my proposed behaviour in the case of
md5sum < filename
is different. It produces eg
d41d8cd98f00b204e9800998ecf8427e
instead of
d41d8cd98f00b204e9800998ecf8427e -
But the former is very nearly strictly superior. The latter output
is pretty much useless as input to `md5sum -c' precisely because it
also doesn't include the filename.
It is possible that a small proportion of the programs which were
changed to accept the new output by stripping the ` -' were
changed so that they would no longer accept the old unvarnished
form - but such programs will be rare because that makes them
incompatible with the old behaviour of dpkg's md5sum.
* Opponents of my suggestion have also claimed that it is not
appropriate for the Debian maintainer to make this change and that
instead I should get upstream coreutils to make the change first.
I agree that it would be good for coreutils to change. But, as a
Debian maintainer I have to write programs which are compatible
with the md5sum shipped in Debian. If Debian's md5sum's behaviour
is not restored then I will have to change my packages so that they
cope with the new broken behaviour. This is quite different to the
upstream coreutils, where I can blow off bug reports saying `the
GNU people broke it - get them to fix it again'.
Debian has never shied away from making technically correct changes
even in the face of opposition from upstream, and we should not do
so now - when there is little evidence of any opposition from
upstream. I would of course encourage the Debian coreutils
maintainer to talk to the GNU maintainer to try to rationalise the
situation, but in the Debian project it's primarily the Debian
package maintainer's responsibility to do any necessary
communication with upstream. I can't reasonably demand that a
volunteer Debian maintainer actually do that work, but I don't
think that their lack of time to do so is a good excuse for not
fixing the bug in Debian.
The remaining argument for waiting for the reversion to be accepted
upstream is that we should be `compatible' with GNU coreutils so
that other 3rd-party programs will work well on Debian. However,
the change does not introduce any significant incompatibility:
programs outside Debian which feed stdin to md5sum already have to
work with the GNU version, old PGP versions, etc., some of which
include the spurious ` -' and some of which don't. Programs which
_fail_ when they cannot strip the spurious ` -' because it's
missing will be very rare and easy to fix.
* There has been some suggestion that there is a need to trawl
through packages looking for ones which will break.
As discussed above, the compatibility problems are nearly
nonexistent. Very probably nothing will break. There is a small
chance that some program was unwisely modified to insist
on stripping ` -' and now fails if it's not present. Any
such program should be fixed anyway to enhance its compatibility
with non-GNU versions of md5sum including those from older Debian
versions.
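
For what it's worth, a tolerant caller need not care which form it is
given. The following is only a sketch, and digest_of_line is a
hypothetical name; it uses plain POSIX parameter expansion rather than
anything specific to a particular md5sum.

```shell
# Hypothetical helper: reduce one line of md5sum output to the digest.
# "${1%% *}" removes the longest suffix starting at the first space, so
# an annotated line loses its ` -' and a bare line is left untouched.
digest_of_line() {
    printf '%s\n' "${1%% *}"
}

digest_of_line 'd41d8cd98f00b204e9800998ecf8427e'
digest_of_line 'd41d8cd98f00b204e9800998ecf8427e -'
```

Both calls print the same digest, so a program written this way works
with either md5sum.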
* Historical context:
Debian has used an md5sum in the dpkg package. This md5sum came
originally from PGP2.x (circa 1992/1993), and was originally
written by Colin Plumb. It produced the bare checksum when the
filename wasn't supplied. (It also provided `md5sum -c' and
produced the corresponding output format for when the filenames
were supplied.)
Some time in the last few years, GNU textutils gained a version of
md5sum. This md5sum has slightly different behaviours - it
interprets unexpected input slightly differently for md5sum -c, and
it also produces the annotated output in the case at issue.
As I recall (but I could be wrong) the dpkg md5sum was, when
textutils gained its own md5sum, briefly retired in favour of the
textutils one. However, the dpkg one was quickly restored, mainly
because of the behavioural differences, including the annotation
when taking input from stdin.
AIUI, most recently, a version of dpkg has been uploaded whose
md5sum has been modified to produce the annotated output.
Thanks,
Ian.