
Re: limits for package name and version (MBF alert: ... .deb filenames)



Henrique de Moraes Holschuh <hmh <at> debian.org> writes:
> On Tue, 26 Apr 2011, Adam Borowski wrote:
> > Telling someone "the bug is in a version I pulled from the VCS but didn't
> > bother noting down which version it was" is not very useful.
> 
> Now you're being silly.
> 
> All you need is the proper date and time to use as a version (for
> ordering), and a proper debian/changelog entry:
> 
>   * New upstream source (git://blah, commit "blah: did something",
>     [#12030a47ebafdcd]).
> 
> I.e. the best current practice.  How surprising.  Now you either tell

Using date and time as a version is not current best practice. You'll still need
the upstream version part too to sort correctly relative to released versions.
So you'll have the latest upstream version tag, followed by a long timestamp.
That's no shorter than typical 'git describe' output, just a lot less functional.
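
A minimal sketch of what I mean, assuming 'git describe' prints its usual
<tag>-<count>-g<abbrev> form. The function name, the "v" prefix stripping and
the "+<count>.g<abbrev>" layout are illustrative assumptions, not an
established packaging convention:

```python
import re

# 'git describe' prints "<tag>-<commits since tag>-g<abbreviated hash>",
# or just "<tag>" when the checked-out commit is exactly the tagged one.
_DESCRIBE = re.compile(r"^(?P<tag>.+)-(?P<count>\d+)-g(?P<abbrev>[0-9a-f]+)$")


def describe_to_version(describe: str) -> str:
    """Turn 'git describe' output into a package-style upstream version.

    Hypothetical mapping: strip one leading "v" from the tag, then append
    "+<count>.g<abbrev>" so the result stays tied to the released version.
    """
    m = _DESCRIBE.match(describe)
    tag = m.group("tag") if m else describe
    if tag.startswith("v"):
        tag = tag[1:]
    if m is None:
        # Exactly on a tag: the released version itself.
        return tag
    return f"{tag}+{m.group('count')}.g{m.group('abbrev')}"


print(describe_to_version("v1.2.3-14-g2414721"))
print(describe_to_version("v1.2.3"))
```

The commit count gives ordering between snapshots taken from one line of
history after the same tag, and the hash part identifies the exact revision.
A timestamp suffix of similar length gives you neither reliably.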

> upstream when you checked out his tree (and he can locate the commit by
> date/time), or look the hash in debian/changelog and tell him that, and
> he might be able to locate the commit even on rebased trees.

It's naive to think that identifying revisions by timestamp would be that simple
when dealing with distributed revision control systems. There are at least 3
different timestamp types that are relevant to git:

1) The closest analogue to traditional "svn2005...." versions would be the time
when the commit appeared in a certain branch on a canonical central server.
However, information about when the commits were pushed to a particular server
is not stored in the repository history. Thus this form cannot be used for DVCS.

2) The author timestamps stored in commits are the timestamps most prominently
displayed by various tools. However, they are not ordered along a branch:
cherry-picked commits keep their original author timestamps and can be applied
in arbitrary order.

3) The stored committer timestamps are the most realistic alternative, but still
not a good one. They will be ordered only if there is a single relevant "master"
branch from which packaged versions are taken, the history of that branch is
never modified, and everyone creating commits sets the timestamps accurately.
Even then they don't match the time when the changes became publicly visible,
let alone when they became available in that particular master branch. Also,
one-second accuracy in version timestamps would not be enough to identify a
particular revision. Cherry-picking, rebasing or applying a series of mailed
patches can very rapidly create a long run of closely spaced committer
timestamps (and rebasing can happen before the changes are first made public,
so this occurs even in projects that never modify public history).
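
To make the one-second problem concrete, here is a small sketch. The commit ids
and timestamps are made up for illustration; the point is only that several
commits produced by one rebase can share a committer timestamp, so the
timestamp alone cannot name a single revision:

```python
from collections import defaultdict


def ambiguous_timestamps(commits):
    """Return {timestamp: [commit ids]} for timestamps shared by >1 commit.

    `commits` is an iterable of (commit_id, committer_timestamp) pairs with
    timestamps in whole seconds, which is the resolution git stores.
    """
    by_time = defaultdict(list)
    for commit_id, timestamp in commits:
        by_time[timestamp].append(commit_id)
    return {ts: ids for ts, ids in by_time.items() if len(ids) > 1}


# Hypothetical result of rebasing a three-patch series in one go.
rebased_series = [
    ("a1b2c3d", 1303800000),
    ("e4f5a6b", 1303800000),
    ("c7d8e9f", 1303800001),
]

print(ambiguous_timestamps(rebased_series))
```

Any version string built from the first timestamp would refer to two different
commits here.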

Your above "tell upstream when you checked out his tree and he can locate the
commit by date/time" would only work properly for timestamps of type 1). But
that's not at all a realistic alternative.

What you wrote about identifying branches in your other mail ("and you already
know which branch of which tree because that information must be available and
up-to-date in debian/copyright") is also wrong or at least meaningless. Maybe
you'll know that the code was available on the project's public repository under
the branch name "fixes-for-debian" at the time it was downloaded. But what good
will that information do for you later, if the contents of that branch were
merged to another and the obsolete branch name then deleted two days after being
created? In the typical case branch names are not persistent information.
 

> Also, it IS worth reminding all that git-describe output has *NO* forward
> uniqueness guarantees: for that it would have to always use the full hash,
> and even that could break should we be forced to drop sha1 for something
> longer.

That's wrong in several ways. Even if a hash prefix of the length included in
'git describe' output stopped being unique at some point in the future, there
isn't much risk of real confusion (or would you really be unable to tell whether
it's the version from around now or the one from 3 years in the future that's
meant?). And the overall version would still be unique if the tag part changed
before then. "Have to always use the full hash" is wrong too: there's no magic
property which makes the hash unique at exactly the full length; that's just the
maximum you can take, and shorter prefixes will be unique in practice. And
moving from sha1 to something else would not have to invalidate the uniqueness
properties of existing sha1 hashes.
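
As a rough illustration of the prefix point, the sketch below finds the
shortest prefix of one id that no other id in a set shares. The ids are
invented and truncated for readability, and this is only a model of the idea,
not how git itself chooses abbreviation lengths:

```python
def shortest_unique_prefix_len(target, all_ids):
    """Length of the shortest prefix of `target` not shared by any other id."""
    others = [h for h in all_ids if h != target]
    for length in range(1, len(target) + 1):
        prefix = target[:length]
        if not any(other.startswith(prefix) for other in others):
            return length
    # Only reached if another id starts with the entire target string.
    return len(target)


# Hypothetical object ids, shortened for the example.
ids = ["12030a47", "12039bcd", "9f00aa11"]

print(shortest_unique_prefix_len("12030a47", ids))
print(shortest_unique_prefix_len("9f00aa11", ids))
```

Uniqueness depends on which other objects exist, not on reaching the full hash
length. A prefix that becomes ambiguous later just needs a few more characters.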


> > No matter how many commits were done at a particular date, and whether the
> > commit was cherry-picked, rebased or tossed around in other ways, a hash
> > will let you tell which exactly version it is.
> 
> At one point in time, which is not relevant anymore (a rebase happened), and
> the object the hash used to point to might have been lost (garbage
> collection).
> 
> You're either supposed to not rebase anything that is ever made public, or
> to identify a commit by its title when it is unique enough, otherwise by
> (author, date, title).

If the history was modified and the revision Debian used does not exist in
upstream history any more, that's certainly important information in itself. A
timestamp might fool you into thinking that the Debian version corresponds to a
particular commit in the upstream repository - a potentially very dangerous
misunderstanding. A hash will prevent such confusion.

