
Re: policy regarding redistributable binary files in upstream tarballs



On Thu, 2014-11-20 at 13:46 +0800, Paul Wise wrote:
> On Thu, Nov 20, 2014 at 1:14 PM, Ben Finney wrote:
> 
> > But a growing number of upstreams disagree, so those upstreams are
> > likely to be actively opposed to your recommendation to patches 
> > which remove non-source files from the VCS repository.
> 
> I wonder about the basis for that disagreement.

In the GNU model the tarball doesn't just provide sources.  It is a
complete packaging system.  It checks for build prerequisites, does the
build, checks for install prerequisites, does the install, and does
uninstall.  It does all that because upstream wants people to use their
project - regardless of whether their distro packages it.  Combine that
with wanting to keep the prerequisite list small, and including things
like minified jQuery libraries is exactly the right thing to do.

> Putting all third-party libraries into a separate place (tarball,
> repo, branch or dir).
> 
> Putting all pre-built files into a separate place (tarball, repo,
> branch or dir).

Those suggestions may make things easier for Debian, but they do so by
making life harder for upstream's other users.  That isn't going to
happen, or at least for me it wouldn't.  If my DD alter ego asked my
upstream ego to make things much harder for his other users, he would be
politely told where he could shove his suggestion.

Personally, I think Debian passing judgement on what is in upstream's
pristine tarballs is over the top.  It's upstream's original work, not
Debian's.  Ideally we are just mirroring it.  (That we often aren't is
part of the problem.)  We should be happy enough to accept their
assurances on having obtained whatever licenses they need for what is in
them. [0]

Admittedly this meshes well with my experience that they are often
fairly lax about what they put in those tarballs.  Their "make
distclean" scripts are often not as good as they could be, which means
all sorts of crap is left lying around.  Vim .swp files and compiler
intermediates spring to mind.  I have no idea what license would apply
to a .swp file, but I do know that for all practical purposes it doesn't
matter and I'd rather Debian didn't insist I find out. [1]

That's just me being lazy, I guess.  But there is a deeper issue.  For
me it is vital that there be an audit trail from the pristine upstream
tarball to the binaries we distribute. [2]  In pursuing licensing purity
we have been gradually destroying what little of that audit trail we
used to provide.  To put it bluntly: as a DD I do care about licensing,
but in my day job, where I have to ensure hundreds of computers are
reliable and secure, the licensing of tarballs I don't download, let
alone use, takes a distant second place to security.  So in my view we
are making life difficult for our users, sacrificing them on the altar
of FSF-style idealism.

Maybe if we were forced to choose between the two that would be the
right choice to make.  But technically there are ways to be FSF
idealists and still provide something akin to an audit trail.  So we
aren't forced to choose, yet we deprive our users of the audit trail
anyway.  That is bad.

What follows is something I am sure has been covered before by someone
somewhere, before I started following the project in earnest.  I can't
find it - so I apologise in advance for the repetition.

I started my Linux life as a RedHat user, and I wrote RPM packages for my
own use.  Then about a decade ago I moved to Debian, and of course
started writing Debian packages.  During the transition I was struck by
how much better Debian's binary packaging was compared to RPM, and yet
RPM's source packaging was so much better than Debian's.

To explain why, I'll step back a bit.  If I were writing a book on how to
design a packaging system it would start by introducing these 5 steps:

A.  The process is ideally [3] a pure function.  Its input is the
    pristine source.  Its output is the binary packages.  So the 1st
    step is to obtain the input, the pristine source, and record it
    in the output so anyone else can reproduce what you have done.

B.  These inputs are fed to the packaging process: a program, written
    by the packager, that implements the function doing the
    transformation.  In Debian, this is debian/rules.  This function is
    split into standardised steps.  The second step is unpacking the
    sources from whatever format they are in into a build directory. [4]

C.  The third step is to tailor the pristine sources to match the
    requirements of the distribution.  This is done in a standardised
    way: by applying a series of patches in a well defined format, each
    with a clearly documented purpose.

D.  Run the build process as supplied by upstream, but perhaps modified
    by step (C).

E.  Collect the output of the build process into binary packages. [5]

And that is exactly what RPM did over a decade ago.  Debian mashed
steps (A), (B) and (C) into what could only be described as a mess.
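To make steps (A) to (E) concrete, here is a minimal Python sketch of
the packaging process as a pure function.  It is an illustration under
stated assumptions, not how RPM or dpkg actually work: the source tree
is held in memory as a dict, and the patch, build and collect stages
are hypothetical callbacks supplied by the caller.  The only real work
is the SHA-256 check in step (A) and the tarball unpacking in step (B).

```python
import hashlib
import io
import tarfile
from typing import Callable, Dict, List

# A source tree held in memory: path inside the tarball -> file contents.
Files = Dict[str, bytes]
# Hypothetical stand-in for one documented patch: maps a tree to a tree.
Patch = Callable[[Files], Files]


def obtain(tarball: bytes, expected_sha256: str) -> bytes:
    """Step A: accept the pristine source only if it matches the recorded hash."""
    actual = hashlib.sha256(tarball).hexdigest()
    if actual != expected_sha256:
        raise ValueError(f"pristine source hash mismatch: {actual}")
    return tarball


def unpack(tarball: bytes) -> Files:
    """Step B: unpack the sources into a fresh (here in-memory) build tree."""
    files: Files = {}
    with tarfile.open(fileobj=io.BytesIO(tarball), mode="r:*") as tar:
        for member in tar.getmembers():
            if member.isfile():
                handle = tar.extractfile(member)
                files[member.name] = handle.read()
    return files


def package(
    tarball: bytes,
    expected_sha256: str,
    patches: List[Patch],
    build: Callable[[Files], Files],
    collect: Callable[[Files], Dict[str, bytes]],
) -> Dict[str, bytes]:
    """Steps A to E chained together; the output depends only on the inputs."""
    tree = unpack(obtain(tarball, expected_sha256))
    for patch in patches:  # Step C: tailor the sources, one patch at a time
        tree = patch(tree)
    products = build(tree)  # Step D: run upstream's build (hypothetical callback)
    return collect(products)  # Step E: gather the products into packages
```

Because the result depends only on the arguments, feeding it the same
tarball, hash and patch series twice yields the same packages, which is
the property step (A) is after.  A real build also depends on the
compiler and libraries installed, which is the gap discussed in [3].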

Time has moved on, and things have changed.  Given the existence of yum,
I guess RPM's binary format has improved.  Oddly, RedHat dropped step (C)
for their kernels, so arguably their source packaging format has gone
backwards.  Debian's source format has improved, with step (C) above now
being a standard part of it.  (Woohoo!)  Arguably we are also making
tiny steps towards (A) with the introduction of uscan's ability to
repackage upstream sources in a well documented way.  But that is not
enough for an audit trail.

One way of getting much closer would be:

[a]  Add two new mandatory targets in debian/rules that:

       1.  Obtain the upstream sources into some temporary area,
           compute their hashes, and save the URL, downloaded file
           name, and hash in a standard place in debian/...

       2.  Build the .orig.tar from the downloaded file(s).

[b]  Add a section to the .dsc (say Pristine-Sha256) that contains the
     URLs and hashes from step [a.1].
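A rough Python sketch of what [a.1] and [b] might produce.  Everything
here is an assumption: the one-line-per-file record format, and the
layout of the Pristine-Sha256 field, which simply borrows the existing
Checksums-Sha256 layout (hash, size, then a name on indented
continuation lines) with the upstream URL in place of the local file
name.  Fetching is left out; the functions take already downloaded
bytes.  Where under debian/ the record would live is left open, as in
the proposal.

```python
import hashlib
from typing import List, Tuple

# Each entry: (URL it was fetched from, downloaded file name, file contents).
Download = Tuple[str, str, bytes]


def record_downloads(downloads: List[Download]) -> str:
    """Step [a.1]: one line per downloaded file with URL, file name and SHA-256.

    The line layout is an assumption; the proposal only says these three
    items are saved in a standard place under debian/.
    """
    lines = []
    for url, name, data in downloads:
        lines.append(f"{url} {name} {hashlib.sha256(data).hexdigest()}")
    return "\n".join(lines) + "\n"


def pristine_sha256_field(downloads: List[Download]) -> str:
    """Step [b]: a hypothetical Pristine-Sha256 field for the .dsc.

    Laid out like Checksums-Sha256 (hash, size, name on indented
    continuation lines), but naming the upstream URL instead of a file.
    """
    lines = ["Pristine-Sha256:"]
    for url, _name, data in downloads:
        lines.append(f" {hashlib.sha256(data).hexdigest()} {len(data)} {url}")
    return "\n".join(lines) + "\n"
```

Anyone holding the .dsc could then re-download each URL, recompute its
hash and size, and compare them with the field, which is the first link
in the audit trail.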

We already have numerous tools for doing [a], notably the
get-orig-source target, uscan and mk-origtargz, so not much in the
way of new code is required.

Maybe we could consider doing something like this for stretch?  I'd be
happy to do the work if there was some hope of it being accepted.



[0]  Just to be clear about the distinction: the licensing obviously has
     to be crystal clear for anything in the tarball Debian uses to produce
     the .deb's it distributes.  I'm saying we should be happy to accept
     upstream's assurances for what we don't use, unless we have
     evidence to the contrary such as a written complaint.  Doing this
     might be dubious if we repackage tarballs, because then we are not
     just mirroring what upstream gave us permission to mirror.

[1]  And just to be clear again, I'm not suggesting we do [0].  I'm just
     whining.  But whining with a purpose: as partial justification for
     what follows.  The principle of not caring about anything that
     isn't used for packaging is fine.  In practice it is hard to
     verify that the packaging doesn't use such files if the
     problematic sources are left in the build tree.  Insisting they
     aren't in the build tree before the build starts is the obvious
     solution.  For many packaging systems that would not be an issue,
     but Debian's source package format makes it near impossible.
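As a sketch of "insisting they aren't in the build tree before the
build starts", this Python function prunes an in-memory source tree
against a list of glob patterns and reports what it removed.  The
function name and the patterns are hypothetical, and uscan's own
repacking rules are not reproduced here.

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Tuple

# A source tree held in memory: relative path -> file contents.
Files = Dict[str, bytes]


def prune_before_build(files: Files, excluded: List[str]) -> Tuple[Files, List[str]]:
    """Drop every path matching an excluded glob before the build starts.

    Returns the pruned tree and the sorted list of removed paths, so the
    removal itself can be shown in the build log.  fnmatch's "*" also
    matches "/", so "*.swp" catches swap files in subdirectories too.
    """
    kept: Files = {}
    removed: List[str] = []
    for path, data in files.items():
        if any(fnmatchcase(path, pattern) for pattern in excluded):
            removed.append(path)
        else:
            kept[path] = data
    return kept, sorted(removed)
```

For example, excluding "*.swp" and "*.o" would strip the Vim swap files
and compiler intermediates mentioned earlier before anything is built,
so the build cannot depend on them.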

[2]  Without a simple (read: mechanically checkable) audit trail from
     upstream to Debian's binaries we are leaving ourselves open to
     rogue DDs weakening security in a way that is almost undetectable.
     They can just modify the upstream sources when they repackage the
     upstream pristine tarball.
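One form the mechanically checkable audit trail could take, sketched in
Python under the assumption that both the pristine upstream tarball and
the repacked .orig.tar are at hand: compare per-file hashes and flag
anything added or modified.  Removal is what repacking is for; additions
and modifications are exactly what a rogue repack would need.  Ignoring
the top-level directory name is an assumption made so that, say,
foo-1.0/ and foo-1.0+dfsg/ compare equal.

```python
import hashlib
import io
import tarfile
from typing import Dict, List


def _file_hashes(tarball: bytes) -> Dict[str, str]:
    """Map each regular file to its SHA-256, ignoring the top-level directory."""
    hashes: Dict[str, str] = {}
    with tarfile.open(fileobj=io.BytesIO(tarball), mode="r:*") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            # Repacks often rename the top directory, so compare paths below it.
            parts = member.name.split("/", 1)
            relative = parts[1] if len(parts) == 2 else parts[0]
            content = tar.extractfile(member).read()
            hashes[relative] = hashlib.sha256(content).hexdigest()
    return hashes


def audit_repack(pristine: bytes, repacked: bytes) -> Dict[str, List[str]]:
    """Compare a repacked .orig.tar against the pristine upstream tarball."""
    before = _file_hashes(pristine)
    after = _file_hashes(repacked)
    common = before.keys() & after.keys()
    return {
        "removed": sorted(before.keys() - after.keys()),
        "added": sorted(after.keys() - before.keys()),
        "modified": sorted(p for p in common if before[p] != after[p]),
    }


def is_clean_repack(report: Dict[str, List[str]]) -> bool:
    """A repack passes only if it removed files and changed nothing else."""
    return not report["added"] and not report["modified"]
```

Combined with a recorded hash of the pristine tarball, a check like
this could run unattended on every upload, so a modified repack would
no longer be almost undetectable.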

[3]  Ideally the packaging transformation should be a pure function,
     so that for a given set of inputs (i.e. upstream files with the
     correct hash) it would always produce identical outputs.  Sadly it
     doesn't, and making it do so isn't a solved problem.  I gather
     there has been some discussion about this under the heading
     "reproducible builds".  The major reason we can't contemplate
     this today is that we don't capture the environment used to build
     the package: the time, the versions of the compiler and libraries
     used, and so on.  Capturing this information would be a big step
     forward, and it would not be hard to do.  From what I can see, all
     it would need is another stanza in the .changes file recording the
     Debian packages (with versions) installed from Build-Depends and
     the Essential set, plus the build time.  And of course storing the
     .changes file on the mirrors.  This still wouldn't give us a
     reproducible binary, but it might be close enough in practice.
     Maybe it would be enough to give others some incentive to work out
     what else is needed to mechanically verify that a build under
     identical conditions produces packages that are, for practical
     purposes, identical.
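The extra .changes stanza could look something like the output of this
Python sketch.  The field names Build-Environment and Build-Date are
invented for illustration, and the mapping of installed packages to
versions is assumed to be gathered elsewhere (for example from the
build chroot); only the formatting is shown.  The date uses the RFC 2822
style that the existing Date field in .changes files already uses.

```python
from datetime import datetime
from email.utils import format_datetime
from typing import Dict


def build_environment_stanza(installed: Dict[str, str], built_at: datetime) -> str:
    """Format a hypothetical .changes stanza recording the build environment.

    `installed` maps each package present during the build (Build-Depends
    plus the Essential set, fully resolved) to its exact version.  The
    field names used here are invented for this sketch.
    """
    lines = ["Build-Environment:"]
    for name in sorted(installed):  # sorted so the stanza itself is deterministic
        lines.append(f" {name} (= {installed[name]})")
    # RFC 2822 date, the same style as the existing Date field in .changes
    lines.append(f"Build-Date: {format_datetime(built_at)}")
    return "\n".join(lines) + "\n"
```

Sorting the package names matters: two builds in the same environment
then produce byte-identical stanzas, which makes the environments
themselves easy to compare.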

[4]  Unpacking to a separate build directory neatly sidesteps several
     Debian nits.  Firstly, you don't care if upstream has a debian
     directory.  Secondly, there is no need for "debian/rules clean".
     That might again seem like me being lazy, and maybe it is, but
     sheesh, at least I ensure it works in everything I do.  Running
     debuild in the same source directory has failed for me surprisingly
     often, most recently with #765073.  Running debuild, interrupting
     it, then running it again after a clean fails more often still.
     Not surprisingly, because it is near impossible to test
     interrupting the build process at every possible point.  Sometimes
     it seems we go out of our way to make things harder for ourselves.
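A minimal Python sketch of building in a throwaway directory: every run
unpacks the pristine tarball into a fresh temporary directory, so there
is nothing to clean and an interrupted run cannot poison the next one.
The `build` callback is hypothetical and stands in for steps (C) to
(E); it has to return its products before the directory is deleted.
The `filter="data"` extraction option is only passed on Python versions
that provide it.

```python
import io
import tarfile
import tempfile
from pathlib import Path
from typing import Callable, Dict


def build_in_fresh_dir(
    tarball: bytes, build: Callable[[Path], Dict[str, bytes]]
) -> Dict[str, bytes]:
    """Unpack pristine sources into a new temporary directory and build there.

    Each call starts again from the tarball, so no "clean" step is needed
    and a previous interrupted run leaves nothing behind in this one.
    """
    with tempfile.TemporaryDirectory(prefix="build-") as tmp:
        build_dir = Path(tmp)
        with tarfile.open(fileobj=io.BytesIO(tarball), mode="r:*") as tar:
            # Use the safe "data" extraction filter where this Python has it.
            options = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
            tar.extractall(build_dir, **options)
        # Products are returned as bytes, so they outlive the directory.
        return build(build_dir)
```

The design choice is to make the pristine tarball, not the working
tree, the only persistent input, which is also what makes an upstream
debian/ directory irrelevant to the packaging.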

[5]  In reality it often gets more complex than these simple 5 steps.
     In particular it is not unusual to provide several variants of the
     package, with different patches applied.  Exim4 does this.  So real
     life is more complex than my idealised world, but it doesn't alter
     the underlying principles.


