
git interface to snapshot.debian.org



Hi,

this is a follow-up to my question after the dgit talk today: It would
be great to have a git view of a package’s history in Debian. There
is some possible overlap with dgit in the sense that if everyone had
been using dgit from the start, then we would have that, but dgit’s
objectives are slightly different, so maybe my question could be posed
and answered separately.

There is a precedent for what I want: http://hdiff.luite.com/ is a service
that imports every Haskell package upload into a git repository, and
provides a cgit interface to it. This has been very useful to me as a
tool to investigate what has happened when, and to easily view diffs.

Now snapshot.debian.org already contains all the data that should go
into these git repositories. What would stop us from importing all of
the source packages into per-package git repositories?
Given that it’s only source and there is compression, I would expect
the resource usage to be acceptable.

If the answer is „Nothing is stopping you; someone just has to do it“,
then I’m volunteering, as long as I can do most of it during DebConf.
Peter, what do you think? I probably do not need more than access to
snapshot.debian.org and a directory there to work on.


Technically, this is how I would do it:
I phrase it in terms of the git data model, not in terms of the git
commands that produce it, as that gives a cleaner specification.

 * Every source package from snapshot.d.o, extracted with
   dpkg-source -x as usual, produces a git tree object.
   I’d probably simply ignore empty directories.
 * Every source package also produces a git commit, with
   - Tree: the above
   - Author: top changelog entry
   - Date: also top changelog entry
   - Description summary: The version number
   - Description text: The top changelog entry.
   - Parents: This is the interesting bit.
     The set of parents should be the commits corresponding to any other
     version mentioned in debian/changelog, pruned of those that are
     transitively reachable from another parent.
  
     This ensures that we get a nice git DAG for things like packages
     that have been in experimental for a while, repeatedly merging from
     unstable.

     The order of parents could correspond to the order in 
     debian/changelog, so that the second changelog entry becomes
     the first parent.

   These rules should, unless new historic packages suddenly appear,
   ensure that we get identical git hashes if we re-run this tool,
   which is good.
 * Every suite (unstable, jessie...) becomes a branch, pointing to the
   corresponding commit
 * Optionally: one tag per version, pointing to the corresponding
   commit. Although maybe that would produce too many tags...
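To make the reproducibility argument concrete, here is a minimal Python
sketch (3.9+) of how the tree and commit objects could be hashed directly
from a directory produced by dpkg-source -x, without running git at all.
The function names git_object_id, tree_id and commit_id are hypothetical,
and the sketch assumes the committer is identical to the author: a
committer date taken from the time of the import would break the
identical-hashes property. Empty directories are dropped, which git could
not store anyway.

```python
import hashlib
import os
import stat
from typing import Optional


def git_object_id(kind: str, body: bytes) -> bytes:
    """Raw 20-byte SHA-1 that git assigns to an object of this kind and content."""
    header = f"{kind} {len(body)}\0".encode()
    return hashlib.sha1(header + body).digest()


def tree_id(path: str) -> Optional[bytes]:
    """Hash an extracted source directory as a git tree.

    Returns None for directories that contain no files (recursively);
    such directories are skipped, since git cannot represent them.
    """
    entries = []
    for name in os.listdir(path):
        full = os.path.join(path, name)
        st = os.lstat(full)
        if stat.S_ISLNK(st.st_mode):
            # symlinks are stored as blobs holding the link target
            mode, oid = b"120000", git_object_id("blob", os.fsencode(os.readlink(full)))
        elif stat.S_ISDIR(st.st_mode):
            oid = tree_id(full)
            if oid is None:
                continue  # ignore empty directories
            mode = b"40000"
        elif stat.S_ISREG(st.st_mode):
            # git only records whether the owner-executable bit is set
            mode = b"100755" if st.st_mode & stat.S_IXUSR else b"100644"
            with open(full, "rb") as f:
                oid = git_object_id("blob", f.read())
        else:
            continue  # devices, fifos etc. have no git representation
        raw_name = os.fsencode(name)
        # git orders entries by name, comparing directories as if they ended in "/"
        sort_key = raw_name + b"/" if mode == b"40000" else raw_name
        entries.append((sort_key, mode + b" " + raw_name + b"\0" + oid))
    if not entries:
        return None
    entries.sort()
    return git_object_id("tree", b"".join(entry for _, entry in entries))


def commit_id(tree: bytes, parents: list[bytes], author: str,
              timestamp: int, tz: str, message: str) -> bytes:
    """Hash a commit; author is "Name <email>", e.g. from the changelog trailer.

    Assumption: the committer equals the author, so nothing depends on
    when the import runs.
    """
    ident = f"{author} {timestamp} {tz}"
    lines = [f"tree {tree.hex()}"]
    lines += [f"parent {p.hex()}" for p in parents]
    lines += [f"author {ident}", f"committer {ident}"]
    body = "\n".join(lines) + "\n\n" + message
    return git_object_id("commit", body.encode())
```

Since every input (file contents, modes, changelog author, date and text,
parent ids) comes from the uploads themselves, running this twice over the
same snapshot data gives the same ids.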
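The parent rule could be sketched as follows in Python (3.9+, for
graphlib). The input is a hypothetical mapping from every version
available on snapshot.d.o to the versions listed in its debian/changelog,
newest first. Versions that a changelog mentions but that were never
uploaded are simply skipped here, which is an assumption the rule above
leaves open. Each version is processed after all of its candidates, and a
candidate is dropped if it is already an ancestor of another candidate.

```python
from graphlib import TopologicalSorter


def compute_parents(changelogs: dict[str, list[str]]) -> dict[str, list[str]]:
    """Pick commit parents for every uploaded version.

    changelogs maps each version available on snapshot.d.o to the versions
    listed in its debian/changelog, newest first (the version itself first).
    """
    # Candidates: other uploaded versions named in the changelog, in changelog
    # order and without duplicates (dict.fromkeys keeps the first occurrence).
    candidates = {
        v: list(dict.fromkeys(c for c in entries[1:] if c in changelogs and c != v))
        for v, entries in changelogs.items()
    }
    parents: dict[str, list[str]] = {}
    ancestors: dict[str, set[str]] = {}  # all strict ancestors of each version
    # static_order() yields each version after all of its candidates and raises
    # graphlib.CycleError if the changelogs contradict each other.
    for v in TopologicalSorter(candidates).static_order():
        cands = candidates[v]
        below = set().union(*(ancestors[c] for c in cands))
        # keep only candidates not already reachable through another candidate
        parents[v] = [c for c in cands if c not in below]
        ancestors[v] = below | set(cands)
    return parents
```

For example, with unstable uploads 1.0-1, 1.0-2 and 1.0-3, an experimental
2.0-1 branched from 1.0-1, and a later 2.0-2 whose changelog also lists
1.0-3 and 1.0-2, this gives 1.0-3 the single parent 1.0-2 and makes 2.0-2
a merge commit with parents 2.0-1 and 1.0-3, in changelog order.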


Greetings,
Joachim


-- 
Joachim "nomeata" Breitner
Debian Developer
  nomeata@debian.org | ICQ# 74513189 | GPG-Keyid: F0FBF51F
  JID: nomeata@joachim-breitner.de | http://people.debian.org/~nomeata
