
md5sum <FILE produces spurious ` -' in output



This bug has been sitting on our todo list for some time, mainly
because I've been too slack.  My apologies.  As promised, I'm now
picking it up again.

I've gone and reread the bug reports #164591 and #164889, of which I'm
the submitter of the latter, and I've written a summary of my
position, below.  Just to clarify: I would like the committee to
overrule the maintainer.

I have BCC'd this message to both bug reports, to 164591's submitter,
and to dpkg@packages, to make sure everyone knows about this
discussion.  But the discussion should probably continue on the
debian-ctte list, and not get crossposted to the bug reports or to the
package maintainer address.

On to the substance:

 * The question is, what should  md5sum < filename  do ?
   Using /dev/null as an example, the two behaviours are:

   Bare:         -davenant:~> md5sum </dev/null
   (IMO good)    d41d8cd98f00b204e9800998ecf8427e
                 -davenant:~>

   Annotated:    -anarres:~> md5sum </dev/null
   (IMO bad)     d41d8cd98f00b204e9800998ecf8427e  -
                 -anarres:~>

 * Note that there is no suggestion that the output of
   md5sum filename  should change.  It is essential that that still
   produce the `Annotated' form, eg in this case:

   With filename:  davenant:~> md5sum /dev/null
   (Good)          d41d8cd98f00b204e9800998ecf8427e  /dev/null
	           davenant:~>

   This is because that output format is used as input to `md5sum -c'.

 * I claim that the annotated behaviour is inferior, for two reasons:

   Firstly, it is less convenient.  When md5sum is used in scripts and
   the like, it is significantly easier to use if, for a single file
   on stdin, it does not annotate the output but just produces the
   bare checksum (in hex, with a trailing newline, of course).

   Otherwise callers which want the unvarnished md5sum have to use
   seddery to strip the spurious `  -'.  While the advantage for any
   individual caller is small, the extra complexity and risk of bugs
   is avoidable, and of course there are many callers of md5sum - both
   actual Debian packages, and in the rest of the world - so the pain
   is multiplied.

   Secondly, it is not compatible with many existing programs.  Even
   though many things in Debian have already had extra code added to
   cope, programs have been using and relying on the historical
   behaviour for some time, and breaking them is a bad idea.

   I also contend that the upstream coreutils md5sum should be changed
   to match this desirable behaviour, although that's not really a
   question for the Debian TC.

 * Opponents of my suggestion claim that the annotated behaviour is
   superior because of some need to be `compatible' with coreutils
   md5sum.

   This is a red herring.  The only case where I'm suggesting changing
   the behaviour is when the filename is _not_ supplied by the
   caller.  Ie, when you say
      md5sum < filename
   rather than
      md5sum first-file [second-file third-file ...]

   It is true that my proposed behaviour in the case of
      md5sum < filename
   is different.  It produces eg
      d41d8cd98f00b204e9800998ecf8427e
   instead of
      d41d8cd98f00b204e9800998ecf8427e  -

   But the former is very nearly strictly superior.  The latter output
   is pretty much useless as input to `md5sum -c' precisely because it
   also doesn't include the filename.

   It is possible that a small proportion of the programs which were
   changed to accept the new output by stripping the `  -' were
   changed so that they would no longer accept the old unvarnished
   form - but such programs will be rare because that makes them
   incompatible with the old behaviour of dpkg's md5sum.

 * Opponents of my suggestion have also claimed that it is not
   appropriate for the Debian maintainer to make this change and that
   instead I should get upstream coreutils to make the change first.

   I agree that it would be good for coreutils to change.  But, as a
   Debian maintainer I have to write programs which are compatible
   with the md5sum shipped in Debian.  If Debian's md5sum's behaviour
   is not restored then I will have to change my packages so that they
   cope with the new broken behaviour.  This is quite different to the
   upstream coreutils, where I can blow off bug reports saying `the
   GNU people broke it - get them to fix it again'.

   Debian has never shied away from making technically correct changes
   even in the face of opposition from upstream, and we should not do
   so now - when there is little evidence of any opposition from
   upstream.  I would of course encourage the Debian coreutils
   maintainer to talk to the GNU maintainer to try to rationalise the
   situation, but in the Debian project it's primarily the Debian
   package maintainer's responsibility to do any necessary
   communication with upstream.  I can't reasonably demand that a
   volunteer Debian maintainer actually do that work, but I don't
   think that their lack of time to do so is a good excuse for not
   fixing the bug in Debian.

   The remaining argument for waiting for the reversion to be accepted
   upstream is that we should be `compatible' with GNU coreutils so
   that other 3rd-party programs will work well on Debian.  However,
   the change does not introduce any significant incompatibility:
   programs outside Debian which feed stdin to md5sum already have to
   work with the GNU version, old PGP versions, etc., some of which
   include the spurious `  -' and some of which don't.  Programs which
   _fail_ when they cannot strip the spurious `  -' because it's
   missing will be very rare and easy to fix.

 * There has been some suggestion that there is a need to trawl
   through packages looking for ones which will break.

   As discussed above, the compatibility problems are nearly
   nonexistent.  Very probably nothing will break.  There is a small
   chance that some program was unwisely modified to insist on
   stripping `  -' and now fails if it's not present.  Any
   such program should be fixed anyway to enhance its compatibility
   with non-GNU versions of md5sum including those from older Debian
   versions.
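
For illustration, here is a minimal sketch of a caller which copes with
both output forms.  The function name bare_md5 is hypothetical and is
not taken from any existing package; the sketch assumes only a POSIX
shell and sed, plus some md5sum on the PATH.  It deletes everything
from the first whitespace character onward, so the annotated form loses
its `  -' and the bare form passes through untouched.

```shell
# Hypothetical helper (not from dpkg or coreutils): print only the hex
# digest of stdin, whichever md5sum variant happens to be installed.
bare_md5() {
    # Annotated output looks like "<digest>  -"; bare output is just
    # "<digest>".  Removing everything from the first whitespace
    # character to the end of the line yields the digest in both cases.
    md5sum | sed 's/[[:space:]].*$//'
}

bare_md5 </dev/null
```

A program written like this does not care which behaviour Debian's
md5sum has, which is why the risk of breakage either way is small; the
point remains that every caller has to carry this extra step.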

 * Historical context:

   Debian has used an md5sum in the dpkg package.  This md5sum came
   originally from PGP2.x (circa 1992/1993), and was originally
   written by Colin Plumb.  It produced the bare checksum when the
   filename wasn't supplied.  (It also provided `md5sum -c' and
   produced the corresponding output format for when the filenames
   were supplied.)

   Some time in the last few years, GNU textutils gained a version of
   md5sum.  This md5sum has slightly different behaviours - it
   interprets unexpected input slightly differently for md5sum -c, and
   it also produces the annotated output in the case at issue.

   As I recall (but I could be wrong) the dpkg md5sum was, when
   textutils gained its own md5sum, briefly retired in favour of the
   textutils one.  However, the dpkg one was quickly restored, mainly
   because of the behavioural differences, including the annotation
   when taking input from stdin.

   AIUI, most recently, a version of dpkg has been uploaded whose
   md5sum has been modified to produce the annotated output.

Thanks,
Ian.


