
Re: limits for package name and version (MBF alert: ... .deb filenames)



Henrique de Moraes Holschuh <hmh <at> debian.org> writes:
> On Tue, 26 Apr 2011, Uoti Urpala wrote:
> > Using date and time as a version is not current best practice. You'll still
> > need the upstream version part too to sort correctly relative to released
> > versions.
> 
> I was referring to the full commit description in the debian/changelog
> entry.

You were talking about selecting version strings AND adding a changelog entry.
I very clearly addressed the version part.

> The best current practice for date/time-based versioning is described in
> official Debian documentation already, both it and shorter variants (such
> as using unix time) have already been mentioned in this thread.

This branch of the thread was NOT about packages that use date ONLY. Maybe
that's what you were confused about above? The version would still need the
last release name too, as in 15.3.2~rc3+svn20050101120000.
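
To illustrate the sorting point, here is a minimal sketch of the upstream-part
comparison rules as I understand them from Debian policy ('~' sorts before
everything, even end of string; letters sort before other non-digits; digit
runs compare numerically). It is not dpkg's actual code, the function names
are made up, and it assumes ASCII input with no epoch or Debian revision.

```python
def _order(ch):
    """Sort weight of one non-digit position; '' stands for end of string."""
    if ch == "~":
        return -1              # '~' sorts before everything, even the end
    if ch == "" or ch.isdigit():
        return 0
    if ch.isalpha():
        return ord(ch)         # letters sort before other punctuation
    return ord(ch) + 256


def compare_upstream(a, b):
    """Return -1, 0 or 1 as version a sorts before, equal to, or after b."""
    i = j = 0
    while i < len(a) or j < len(b):
        # Compare the non-digit prefixes character by character.
        while (i < len(a) and not a[i].isdigit()) or \
              (j < len(b) and not b[j].isdigit()):
            wa = _order(a[i] if i < len(a) else "")
            wb = _order(b[j] if j < len(b) else "")
            if wa != wb:
                return -1 if wa < wb else 1
            i += 1
            j += 1
        # Compare the following digit runs numerically (empty run counts as 0).
        si, sj = i, j
        while i < len(a) and a[i].isdigit():
            i += 1
        while j < len(b) and b[j].isdigit():
            j += 1
        na, nb = int(a[si:i] or "0"), int(b[sj:j] or "0")
        if na != nb:
            return -1 if na < nb else 1
    return 0


snapshot = "15.3.2~rc3+svn20050101120000"
print(compare_upstream(snapshot, "15.3.2~rc3"))          # snapshot is newer than rc3
print(compare_upstream(snapshot, "15.3.2"))              # but older than the release
print(compare_upstream("20050101120000", "15.3.2"))      # a bare date outranks releases
```

With the release name in front, the snapshot lands between 15.3.2~rc3 and
15.3.2; a bare timestamp would sort above every normal release number.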

> 
> > So you'll have the latest upstream version tag, followed by a long
> > timestamp. That's no shorter than typical 'git describe' output, just a
> > lot less functional.
> 
> It is *bounded*, and it can be a LOT shorter.

Typically it is not a "LOT" shorter. And as I explained in the part you
snipped, a timestamp with one-second precision may not be enough to
adequately identify a version in some not-particularly-rare use cases.


> > Your above "tell upstream when you checked out his tree and he can locate
> > the commit by date/time" would only work properly for timestamps of type
> > 1). But that's not at all a realistic alternative.
> 
> You have the full commit info in the changelog, where you can specify
> branch, etc. when best practice is being followed.  Use it.

If you have recorded the exact hash, that will work (of course!). But what you
were saying about timestamps would not work.


> And you can certainly tell your upstream what sort of date you used, if
> you're not smart enough to use the commit date of the top commit, which is
> the "date that branch was last modified".

It's certainly NOT "the date that branch was last modified". If you have
that misconception it perhaps explains why you have problems understanding
some of the issues with timestamps (though I think the explanations in my
previous mail should already have addressed that). The commit date is when
the particular tree/history state at that commit was created. It is NOT
directly associated with any particular branch. It's normal for things to
be first created in local repos, then possibly pushed to a public
development branch, then later after some testing to the master branch;
when these are fast-forward merges the timestamp doesn't change even though
each branch changed at a quite different time. Even if the top commit of the
master branch has a commit timestamp from a month ago, the branch itself
could have been modified a minute ago.
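
A toy model may make this concrete. This is only a sketch, not git's real
data structures: the Commit and Repo classes, the commit id and all dates
are invented for illustration. The point is that the commit carries its own
timestamp, while a branch is just a pointer that can move to that commit at
any later time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Commit:
    commit_id: str          # hypothetical id, stands in for the hash
    committed_at: datetime  # fixed when the commit is created


class Repo:
    def __init__(self):
        self.branches = {}        # branch name -> Commit it points at
        self.ref_updated_at = {}  # branch name -> when the pointer last moved

    def fast_forward(self, branch, commit, when):
        # Moving the pointer does not touch the commit or its timestamp.
        self.branches[branch] = commit
        self.ref_updated_at[branch] = when


now = datetime(2011, 4, 26, 12, 0, 0)
fix = Commit("abc123", committed_at=now - timedelta(days=30))

repo = Repo()
repo.fast_forward("devel", fix, when=now - timedelta(days=29))
repo.fast_forward("master", fix, when=now - timedelta(minutes=1))

# Time between the top commit's timestamp and the last change to master.
lag = repo.ref_updated_at["master"] - repo.branches["master"].committed_at
print(lag)
```

Here master changed a minute ago, yet its top commit is dated a month back,
and devel shows exactly the same commit date despite changing on another day.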

 
> > What you wrote about identifying branches in your other mail ("and you
> > already know which branch of which tree because that information must be
> > available and up-to-date in debian/copyright") is also wrong or at least
> > meaningless. Maybe you'll know that the code was available on the project's
> > public repository under the branchname "fixes-for-debian" at the time it
> > was downloaded. But what good will that information do for you later, if
> > the contents of that branch were merged to another and the obsolete branch
> > name then deleted two days after being created? In the typical case branch
> > names are not persistent information.
> 
> It is at least as future-proof as hashes.  If your upstream is messy and
> likes to rebase and lose past history, only the full commit info

You're mixing up completely different things. Nothing in my example involved
rebasing or losing history. That's the point: branch names are not a part of
stored history, and can disappear/change even if there is no "messiness".

 
> The length of the shortened hash used in git-describe is verified for
> uniqueness only when generated (if at all).  Unless specifically

I already addressed exactly this in the mail you're replying to.

> Full hash collisions are impossible, because, well, the basic constraints
> the VCS depends upon *BREAK* when that happens.  That commit never gets
> accepted into the repository because the VCS aborts/abends.  You try
> again, get a different commit date/time and thus a different hash, the
> collision condition is gone if you're lucky enough not to get a new one,
> and life continues.  I really should not have to explain *THIS*.

You really should not try to explain something you clearly have no clue about.
You get a 160-bit hash match, say "damn, bad luck there", change things a bit
and move on? I hope some readers can at least appreciate your explanation for
the comedy value :)
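
For rough numbers, here is a small sketch using the standard birthday
approximation p ~= 1 - exp(-n(n-1)/2^(bits+1)). The object counts are
made-up illustrative figures, the function name is mine, and the estimate
assumes hashes behave like uniformly random values (accidental collisions
only, not deliberate attacks).

```python
import math


def collision_probability(n_objects, hash_bits):
    """Approximate chance that any two of n_objects share a hash value."""
    pairs = n_objects * (n_objects - 1) / 2
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(-pairs / 2.0 ** hash_bits)


# Full 160-bit hash, with an (absurdly large) assumed 10**12 objects:
print(collision_probability(10**12, 160))   # on the order of 1e-25

# 7-hex-digit abbreviation (28 bits), with an assumed 20000 objects:
print(collision_probability(20000, 28))     # roughly one half
```

So an accidental full-hash match is not something you shrug off and retry,
while abbreviated hashes are a separate matter that does need attention.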


> 0. This is about package versioning;
> 1. You do not have space for the full hash in the version string;
> 2. such hash alone is useless for the packaging system in the first place,
>    it does not work as a package version by itself at all;
> 3. the shortened hash is of limited value for "upstream identification"
>    purposes when things get difficult, and wastes precious space;

It's of high value for "upstream identification" purposes when things are
NOT difficult. And it's also of high value in the difficult cases as it'll
normally make it obvious that there ARE difficulties such as changed
upstream history; with only a timestamp you could easily make a dangerous
mistake without realizing there's anything special to watch out for.

> 4. you're supposed to put lots of meta information about the top commit in
>    the changelog to actually have something that is guaranteed to work well
>    for "upstream identification" purposes.  That includes the full hash and
>    more;
> 5. using unbounded methods of identifying the upstream release is never
>    going to be a best practice because you have to manually check it every
>    time to not have exceeded the maximum length and when it does, you
>    will have to fudge it and break the pattern anyway.

There's no bounded method that's guaranteed to adequately identify the
upstream revision. If you want to restrict length to a particular limit,
checking that would be easy to automate (you would not need to "manually
check it every time"). On the other hand, checking whether a timestamp
meaningfully identifies a revision is much harder.
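
As a sketch of how little automation the length check needs: the helper
names below are hypothetical, and the limit of 40 characters is an arbitrary
placeholder, not a value taken from policy or from this thread.

```python
def make_snapshot_version(last_release, vcs, stamp):
    """Join the last release name and a VCS snapshot suffix."""
    return f"{last_release}+{vcs}{stamp}"


def check_length(version, max_len):
    """Return version unchanged, or raise if it exceeds max_len characters."""
    if len(version) > max_len:
        raise ValueError(
            f"version {version!r} is {len(version)} characters, limit is {max_len}"
        )
    return version


MAX_VERSION_LEN = 40  # assumed placeholder limit

version = make_snapshot_version("15.3.2~rc3", "svn", "20050101120000")
print(check_length(version, MAX_VERSION_LEN))
```

A build script or lint check could run something like this on every upload.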

 
> What's so difficult to grasp, here?

I think the main difficulty is that you lack understanding and/or experience
of the practical issues and use cases that can come up in DVCS development.
You clearly lack the needed mathematical understanding to assess hash
uniqueness properties too. Hopefully the people who end up setting Debian
practices will be better informed.


