
Re: Tracking versions

On Thu, Dec 12, 2002 at 03:23:21PM -0500, Matt Zimmerman wrote:
> On Thu, Dec 12, 2002 at 06:11:45PM +0000, Colin Watson wrote:
> > On several occasions aj has suggested changelog parsing for this,
> > building a tree of known versions and their inheritance for every
> > package. I don't quite see all the details of how this will work yet,
> > but it does seem like the most obviously correct approach. Modifying
> > dpkg-parsechangelog and/or apt-ftparchive will probably be necessary.
> > 
> > See the archives of debian-debbugs for discussion.
> I saw that now, in the IRC log that someone posted a URL for.  I can't
> say that I like the idea much, because it requires a lot of data, and
> that data can't be extracted from (e.g.) the .changes file.  It's also
> inconvenient to work with, because in order to compare two versions
> properly, they must be traced back to a common ancestor.  It's not
> even trivial to find the right changelog data if you're working with
> binary packages (as in debbugs), since that requires maintaining
> historical information about source->binary mappings.  The recent
> changelog entries for foo could be in bar now, but it used to be built
> from a source package named foo, and the old changelog entries are
> there.

All true. I think any possible solution to calculating version ancestry
is going to be inaccurate in some cases, given the wide variety of weird
special cases that crop up in Debian packages. It may be OK that we have
problems with people renaming packages; after all, that isn't handled
automatically at the moment.

The "common ancestor" test is probably not too bad. I haven't worked it
out in detail, but it doesn't feel like there will be a major
performance problem.
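
As a rough illustration of why the test should be cheap, here is a minimal
sketch. It assumes the version tree is stored as a child-to-parent map built
from changelogs and contains no cycles; the package history and the names
PARENT, ancestors and common_ancestor are made up for the example and are
not part of debbugs.

```python
# Hypothetical version tree: each version maps to the version it was based
# on, with None marking the root. The data is invented for illustration.
PARENT = {
    "1.0-1": None,
    "1.0-1sarge1": "1.0-1",
    "1.0-2": "1.0-1",
    "1.0-3": "1.0-2",
}


def ancestors(version, parent=PARENT):
    """Return the chain from `version` back to the root, `version` first.

    A version missing from the map is treated as its own one-element chain.
    """
    chain = []
    while version is not None:
        chain.append(version)
        version = parent.get(version)
    return chain


def common_ancestor(a, b, parent=PARENT):
    """Return the nearest version both `a` and `b` descend from, or None."""
    seen = set(ancestors(a, parent))
    for candidate in ancestors(b, parent):
        if candidate in seen:
            return candidate
    return None


if __name__ == "__main__":
    print(common_ancestor("1.0-1sarge1", "1.0-3"))
```

Each lookup walks at most the depth of the tree twice, so the cost grows
with the length of a package's history rather than with the size of the bug
database.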

> > You don't have to keep track of the distribution at which every
> > upload was targeted; you "only" need to get a path through the tree
> > from each changelog you encounter, build the tree as you go from
> > that (coping with inconsistencies somehow), and know what versions
> > are currently in each distribution. Certainly not at all trivial,
> > but doable, and I don't think storing the version trees will take a
> > significant amount of space compared to the size of the bug
> > database.
> Right, and in the course of normal processing of bugs, "each changelog you
> encounter" is no changelogs at all. :-)

Since aj was the one proposing it, I assumed that extra features could
be added to katie to help. We certainly need to be told about new
uploads at least (they might not all make it into the Packages file if
there are multiple uploads per day). I do concede that it's not a
trivial operation.

> The amount of data is not outrageous, but it is not straightforward to
> obtain or to operate on, which is why I would prefer a different
> approach. The simple and intuitive way to keep track of branches is to
> use version numbers.  Since our branches never merge per se, it would
> be enough to be able to compare two version numbers and say "same
> branch" or "different branches".

So, if we simply assumed that the trunk of development happens along
increasing version numbers, and that versions which are prefixed by
another existing version number plus some punctuation indicate the root
of a branch if that version compares less than any we've seen so far,
what situations would break?

I can think of one right off: imagine an upload targeted only at sarge
during the freeze, with a later upload containing a perhaps better but
less safe fix targeted at unstable. So you'll have:

  foo (1.0-1) ---- foo (1.0-1sarge1)
                \- foo (1.0-2)

If 1.0-1sarge1 is uploaded before 1.0-2, I can't see how you would work
out that it was a branch just by comparing version numbers. Just saying
"there are letters in there, it must be a branch" doesn't work: compare
postgresql. Recognizing "woody", "sarge", etc. feels wrong (userv is at
version "" in unstable, for example, even if it is
misspelled ...). You might be able to do it by using Distribution: as
well, perhaps, although I can imagine cans of worms there too.
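
The upload-order problem can be made concrete with a small sketch. It rests
on several assumptions: the comparison below is a simplified version of
dpkg's ordering (no epochs), "prefixed by another existing version" is read
as a plain string prefix without requiring punctuation, and "compares less
than any we've seen" is read as "less than the highest version seen so
far". The names compare, classify and replay are hypothetical.

```python
import re
from functools import cmp_to_key


def _order(ch):
    """dpkg-style character weight: '~' lowest, then letters, then the rest."""
    if ch == "~":
        return -1
    if ch.isalpha():
        return ord(ch)
    return ord(ch) + 256


def _cmp_fragment(a, b):
    """Compare two fragments by alternating non-digit and digit runs."""
    while a or b:
        na, nb = re.match(r"\D*", a).group(), re.match(r"\D*", b).group()
        a, b = a[len(na):], b[len(nb):]
        for i in range(max(len(na), len(nb))):
            wa = _order(na[i]) if i < len(na) else 0
            wb = _order(nb[i]) if i < len(nb) else 0
            if wa != wb:
                return -1 if wa < wb else 1
        da, db = re.match(r"\d*", a).group(), re.match(r"\d*", b).group()
        a, b = a[len(da):], b[len(db):]
        ia, ib = int(da or 0), int(db or 0)
        if ia != ib:
            return -1 if ia < ib else 1
    return 0


def _split(version):
    """Split into (upstream, revision) at the last hyphen, if there is one."""
    if "-" in version:
        upstream, revision = version.rsplit("-", 1)
        return upstream, revision
    return version, ""


def compare(v1, v2):
    """Simplified Debian version comparison (assumption: no epochs)."""
    (u1, r1), (u2, r2) = _split(v1), _split(v2)
    return _cmp_fragment(u1, u2) or _cmp_fragment(r1, r2)


def classify(new, seen):
    """Naive rule: a branch root extends a known version and sorts below
    the highest version seen so far; anything else is taken as trunk."""
    if not seen:
        return "trunk"
    highest = max(seen, key=cmp_to_key(compare))
    extends = any(new != s and new.startswith(s) for s in seen)
    if extends and compare(new, highest) < 0:
        return "branch"
    return "trunk"


def replay(uploads):
    """Classify each upload in arrival order, using only earlier uploads."""
    seen, result = [], {}
    for version in uploads:
        result[version] = classify(version, seen)
        seen.append(version)
    return result


if __name__ == "__main__":
    print(replay(["1.0-1", "1.0-1sarge1", "1.0-2"]))
    print(replay(["1.0-1", "1.0-2", "1.0-1sarge1"]))
```

With 1.0-1sarge1 arriving first, every upload is classified as trunk, so
1.0-1sarge1 looks like an ancestor of 1.0-2. With the order reversed, the
same rule does mark it as a branch.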

Don't get me wrong, I'd love it if simple version comparison could do
the job, because correct changelog parsing is non-trivial. Maybe it can
and I just haven't seen how yet.

The other thing I'm interested in is what to do about invalid found-in
versions: people often report bugs with "Version: unavailable; reported
yyyy/mm/dd" or "Version: something-I-haven't-uploaded-yet" or whatever.
Pseudo-packages usually don't have sensible version numbers either
(although people have been known to say "Version: whatever's currently
on master" or something like that anyway). If the version is unparseable
I think we'd be best to just treat it as if no Version: header had been
supplied and assume it applies to all versions we currently care about;
if it's parseable but unknown just store it for future reference and do
something hopefully sensible. :)
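
A minimal sketch of that three-way split might look like the following. The
regular expression is only an approximation of Debian version syntax
(optional epoch, then an upstream part that starts with a digit), and the
name interpret_found_in and its return values are hypothetical.

```python
import re

# Assumed approximation of a well-formed Debian version string; the real
# rules live in Debian Policy and are not reproduced exactly here.
VERSION_RE = re.compile(r"^(?:\d+:)?\d[A-Za-z0-9.+~-]*$")


def interpret_found_in(header, known_versions):
    """Return (scope, version) for the Version: header of a bug report.

    "all"     -> missing or unparseable; applies to every version we track
    "known"   -> parseable and present in the version tree
    "unknown" -> parseable but not seen yet; stored for future reference
    """
    value = (header or "").strip()
    if not VERSION_RE.match(value):
        return ("all", None)
    if value in known_versions:
        return ("known", value)
    return ("unknown", value)


if __name__ == "__main__":
    known = {"1.0-1", "1.0-2"}
    for header in (None, "unavailable; reported yyyy/mm/dd", "1.0-2", "1.0-5"):
        print(repr(header), interpret_found_in(header, known))
```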

The algorithm for "do we display this bug when requesting bugs in
version x" would be something like this:

  any found-in?
      yes: is a found-in version an ancestor of x?
          yes: any fixed-in?
              yes: is a fixed-in version an ancestor of x?
                  yes: don't display bug
                  no:  display bug
              no:  display bug
          no:  are any of the found-in versions recognized?
              yes: don't display bug
              no:  display bug
      no:  display bug

Or something along those lines, anyway.
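
One possible reading of that decision tree, sketched in Python. It assumes
that under each question the first indented case is "yes" and the second is
"no", that a version counts as an ancestor of itself, and that "recognized"
means "present in the version tree". The tree data and the names PARENT,
is_ancestor and should_display are invented for the example.

```python
# Hypothetical version tree (child -> parent, None for the root).
PARENT = {
    "1.0-1": None,
    "1.0-1sarge1": "1.0-1",
    "1.0-2": "1.0-1",
    "1.0-3": "1.0-2",
}


def is_ancestor(candidate, x, parent=PARENT):
    """True if `candidate` is `x` or lies on the path from `x` to the root."""
    while x is not None:
        if x == candidate:
            return True
        x = parent.get(x)
    return False


def should_display(x, found_in, fixed_in, parent=PARENT):
    """Decide whether a bug is shown when listing bugs in version `x`."""
    if not found_in:
        # No found-in information: assume the bug applies everywhere.
        return True
    if any(is_ancestor(v, x, parent) for v in found_in):
        # The bug was introduced somewhere in x's history; hide it only if
        # a fix is also in x's history.
        return not any(is_ancestor(v, x, parent) for v in fixed_in)
    # The bug was found on some other line of development. Hide it if we
    # recognize at least one of those versions, otherwise play safe.
    return not any(v in parent for v in found_in)


if __name__ == "__main__":
    print(should_display("1.0-3", ["1.0-1"], ["1.0-2"]))
    print(should_display("1.0-1sarge1", ["1.0-1"], ["1.0-2"]))
```

In this example the bug is hidden for 1.0-3, which descends from the fixed
version, but still shown for 1.0-1sarge1, which branched off before the fix.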


Colin Watson                                  [cjwatson@flatline.org.uk]
