
Re: limits for package name and version (MBF alert: ... .deb filenames)



On Tue, 26 Apr 2011, Uoti Urpala wrote:
> Henrique de Moraes Holschuh <hmh <at> debian.org> writes:
> > On Tue, 26 Apr 2011, Adam Borowski wrote:
> > > Telling someone "the bug is in a version I pulled from the VCS but didn't
> > > bother noting down which version it was" is not very useful.
> > 
> > Now you're being silly.
> > 
> > All you need is the proper date and time to use as a version (for
> > ordering), and a proper debian/changelog entry:
> > 
> >   * New upstream source (git://blah, commit "blah: did something",
> >     [#12030a47ebafdcd]).
> > 
> > I.e. the best current practice.  How surprising.  Now you either tell
> 
> Using date and time as a version is not current best practice. You'll still need
> the upstream version part too to sort correctly relative to released versions.

I was referring to the full commit description in the debian/changelog
entry.

The best current practice for date/time-based versioning is already
described in the official Debian documentation; both it and shorter variants
(such as using Unix time) have already been mentioned in this thread.
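
A minimal sketch of the idea, not the exact scheme from the Debian
documentation: the Python below builds a snapshot version from an upstream
base version plus the commit date of the top commit.  The function name, the
"+git" separator and the timestamp layout are assumptions chosen only for
illustration.

```python
from datetime import datetime, timezone

def snapshot_version(base: str, commit_time: datetime, revision: str = "1") -> str:
    """Build '<base>+git<YYYYMMDD.HHMMSS>-<revision>' (hypothetical layout)."""
    # Normalise to UTC so the same commit always yields the same string.
    stamp = commit_time.astimezone(timezone.utc).strftime("%Y%m%d.%H%M%S")
    return f"{base}+git{stamp}-{revision}"

# Commit date of the top commit (an illustrative value).
commit_time = datetime(2011, 4, 26, 14, 30, 0, tzinfo=timezone.utc)
print(snapshot_version("1.2.3", commit_time))
```

Whatever layout is picked, the timestamp part has a fixed width, so the
length of the result depends only on the base version and the revision.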

> So you'll have the latest upstream version tag, followed by a long timestamp.
> That's no shorter than typical 'git describe' output, just a lot less functional.

It is *bounded*, and it can be a LOT shorter.

> It's naive to think that identifying revisions by timestamp would be that simple
> when dealing with distributed revision control systems. There are at least 3
> different timestamp types that are relevant to git:

[...]

> Your above "tell upstream when you checked out his tree and he can locate the
> commit by date/time" would only work properly for timestamps of type 1). But
> that's not an at all realistic alternative.

You have the full commit info in the changelog, where you can specify
branch, etc. when best practice is being followed.  Use it.

And you can certainly tell your upstream what sort of date you used, if
you're not smart enough to use the commit date of the top commit, which is
the "date that branch was last modified".   If that is useless to
upstream, either you know and you tell them the full info from the
debian/changelog right at first, or they complain and you reply with the
full info from debian/changelog.

> What you wrote about identifying branches in your other mail ("and you already
> know which branch of which tree because that information must be available and
> up-to-date in debian/copyright") is also wrong or at least meaningless. Maybe
> you'll know that the code was available on the project's public repository under
> the branchname "fixes-for-debian" at the time it was downloaded. But what good
> will that information do for you later, if the contents of that branch were
> merged to another and the obsolete branch name then deleted two days after being
> created? In the typical case branch names are not persistent information.

It is at least as future-proof as hashes.  If your upstream is messy and
likes to rebase and lose past history, only the full commit info
(hash+author+title, etc) you are supposed to have added to the
debian/changelog might help locate it if that commit still exists in a
meaningful way.

> > Also, it IS worth reminding all that git-describe output has *NO* forward
> > uniqueness guarantees: for that it would have to always use the full hash,
> > and even that could break should we be forced to drop sha1 for something
> > longer.
> 
> That's wrong in several ways. Even if a hash prefix of the length included in
> 'git describe' output stopped being unique at some point in the future there
> isn't much risk of real confusion (or would you really be unable to tell whether
> it's the version from around now or the one from 3 years in the future that's
> meant?). And the overall version would still be unique if the tag part changed
> before then. "Have to always use the full hash" is wrong; there's no magic
> property which makes the hash unique at exactly the full length, that's just the
> maximum possible you can take (shorter prefixes will be unique in practice). And
> moving from sha1 to something else would not have to invalidate the uniqueness
> properties of existing sha1 hashes.

The length of the shortened hash used in git-describe is verified for
uniqueness only when generated (if at all).  Unless specifically
configured, it will be short enough that collisions *on the shortened hash*
have been observed in practice to be likely on very large projects.  That
is why git has already been modified so that you can configure it to, e.g.,
never use fewer than 15 digits of the hash.  The rest of the git-describe
output has no inherent strong guarantees of uniqueness; that's why the hash
is added in the first place.  It could well be unique if your upstream
never rewinds/rebases, uses only one public branch, and it could be
bounded if she tags often and always has simple history.   Or it could
not.
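
To put rough numbers on the shortened-hash problem, here is a sketch using
the standard birthday approximation.  It assumes hashes behave like uniformly
random values, and the object counts are illustrative assumptions, not
measurements of any real repository.

```python
import math

def collision_probability(num_objects: int, hex_digits: int) -> float:
    """Approximate chance that at least two of num_objects share a prefix
    of hex_digits hexadecimal digits (birthday approximation)."""
    space = 16 ** hex_digits                      # number of distinct prefixes
    pairs = num_objects * (num_objects - 1) / 2   # number of object pairs
    return -math.expm1(-pairs / space)            # 1 - exp(-pairs / space)

for digits in (7, 15):
    p = collision_probability(1_000_000, digits)
    print(f"{digits} hex digits, 1,000,000 objects: p ~ {p:.3g}")
```

Under these assumptions a 7-digit prefix is all but certain to collide
somewhere among a million objects, while a 15-digit prefix makes it very
unlikely.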

If you have a proper base-version from a tag, you will likely want to use
it, something else for ordering, and then you are out of space for a long
hash anyway.

Full hash collisions are impossible, because, well, the basic constraints
the VCS depends upon *BREAK* when that happens.  That commit never gets
accepted into the repository because the VCS aborts/abends.  You try
again, get a different commit date/time and thus a different hash, the
collision condition is gone if you're lucky enough not to get a new one,
and life continues.  I really should not have to explain *THIS*.

And this is not about uniquely identifying upstream releases, and it
NEVER HAS BEEN.  That's what debian/changelog and debian/copyright are for.
It is about something good enough for the *package version string*, which
has MUCH stricter requirements, including a fairly draconian length limit
and an absolute ordering requirement.
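
A small sketch of the length trade-off.  The limit of 24 characters is a
purely hypothetical value, and both helper names are made up for this
example; describe_style() only mirrors the "<tag>-<n>-g<hash>" shape of
git-describe output and is not meant as a valid Debian version.

```python
def fits(version: str, limit: int) -> bool:
    """True if the version string does not exceed the given length limit."""
    return len(version) <= limit

def describe_style(tag: str, commits_since_tag: int, abbrev_hash: str) -> str:
    """Mirror the '<tag>-<n>-g<hash>' shape of git-describe output."""
    return f"{tag}-{commits_since_tag}-g{abbrev_hash}"

LIMIT = 24  # hypothetical limit, for illustration only

date_based = "1.2.3+git20110426"                              # fixed-width suffix
long_hash = describe_style("1.2.3", 1042, "12030a47ebafdcd")  # 15-digit hash
short_hash = describe_style("1.2.3", 1042, "12030a4")         # 7-digit hash

for v in (date_based, long_hash, short_hash):
    print(f"{v!r}: {len(v)} chars, fits={fits(v, LIMIT)}")
```

With this made-up limit the 15-digit form does not fit and the 7-digit form
does, but the 7-digit form is the one with the weak uniqueness properties.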

> > > No matter how many commits were done at a particular date, and whether the
> > > commit was cherry-picked, rebased or tossed around in other ways, a hash
> > > will let you tell which exactly version it is.
> > 
> > At one point in time, which is not relevant anymore (a rebase happened), and
> > the object the hash used to point to might have been lost (garbage
> > collection).
> > 
> > You're either supposed to not rebase anything that is ever made public, or
> > to identify a commit by its title when it is unique enough, otherwise by
> > (author, date, title).
> 
> If the history was modified and the revision Debian used does not exist in
> upstream history any more that's certainly important information in itself. A
> timestamp might fool you into thinking that the Debian version corresponds to a
> particular commit in the upstream repository - a potentially very dangerous
> misunderstanding. A hash will prevent such confusion.

0. This is about package versioning;
1. You do not have space for the full hash in the version string;
2. such a hash alone is useless for the packaging system in the first place,
   it does not work as a package version by itself at all;
3. the shortened hash is of limited value for "upstream identification"
   purposes when things get difficult, and wastes precious space;
4. you're supposed to put lots of meta information about the top commit in
   the changelog to actually have something that is guaranteed to work well
   for "upstream identification" purposes.  That includes the full hash and
   more;
5. using unbounded methods of identifying the upstream release is never
   going to be a best practice because you have to manually check it every
   time to make sure it has not exceeded the maximum length, and when it
   does, you will have to fudge it and break the pattern anyway.

What's so difficult to grasp, here?

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh

