On Thu, 2014-11-20 at 13:46 +0800, Paul Wise wrote:
> On Thu, Nov 20, 2014 at 1:14 PM, Ben Finney wrote:
>
> > But a growing number of upstreams disagree, so those upstreams are
> > likely to be actively opposed to your recommendation to patches
> > which remove non-source files from the VCS repository.
>
> I wonder about the basis for that disagreement.

In the GNU model the tarball doesn't just provide sources. It is a complete packaging system. It checks for build prerequisites, does the build, checks for install prerequisites, does the install, and does uninstall. It does all that because upstream wants people to use their project - regardless of whether their distro packages it. Combine that with wanting to keep the prerequisite list small, and including things like minified jquery libraries is exactly the right thing to do.

> Putting all third-party libraries into a separate place (tarball,
> repo, branch or dir).
>
> Putting all pre-built files into a separate place (tarball, repo,
> branch or dir).

Those suggestions may make things easier for Debian, but they do so by making life harder for upstream's other users. That isn't going to happen, or at least for me it wouldn't. If my DD alter ego asked my upstream ego to make things much harder for his other users, he would be politely told where he could shove his suggestion.

Personally, I think Debian passing judgement on what is in upstream's pristine tarballs is over the top. It's upstream's original work, not Debian's. Ideally we are just mirroring it. (That we often aren't is part of the problem.) We should be happy enough to accept their assurances on having obtained whatever licenses they need for what is in them. [0]

Admittedly this meshes well with my experience that they are often fairly lax about what they put in those tarballs. Their "make distclean" scripts are often not as good as they could be, which means all sorts of crap is left lying around. Vim .swp files and compiler intermediates spring to mind.
I have no idea what license would apply to a .swp file, but I do know that for all practical purposes it doesn't matter and I'd rather Debian didn't insist I find out. [1]

That's just me being lazy I guess. But there is a deeper issue. For me it is vital there be an audit trail from the pristine upstream tarball to the binaries we distribute. [2] In pursuing licensing purity we have been gradually destroying what little of that audit trail we used to provide.

To put it bluntly: as a DD I do care about licensing, but when it comes to my day job, where I have to ensure hundreds of computers are reliable and secure, the licensing of tarballs I don't download, let alone use, takes a distant second place to security.

So in my view we are sacrificing our users on the altar of FSF style idealism. Maybe if we were forced to choose between the two that would be the right choice to make. But technically there are ways to be FSF idealists and still provide something akin to an audit trail. So we aren't forced to choose - but we deprive our users of the audit trail anyway. That is bad.

What follows is something I am sure has been covered before by someone somewhere, before I started following the project in earnest. I can't find it - so I apologise in advance for the repetition.

I started my Linux life as a RedHat user, and I wrote RPM packages for my own use. Then about a decade ago I moved to Debian, and of course started writing Debian packages. During the transition I was struck by how much better Debian's binary packaging was compared to RPM, and yet RPM's source packaging was so much better than Debian's.

To explain why I'll step back a bit. If I were writing a book on how to design a packaging system it would start by introducing these 5 steps:

A. The process is ideally [3] a pure function. Its input is the pristine source. Its output is the binary packages.
So the 1st step is to obtain the input - the pristine source - and record it in the output so anyone else can reproduce what you have done.

B. These inputs are fed to the packaging process: a program, written by the packager, that implements the function doing the transformation. In Debian, this is debian/rules. This function is split into standardised steps. The second step is unpacking the sources from whatever format they are in into a build directory. [4]

C. The third step is to tailor the pristine sources to match the requirements of the distribution. This is done in a standardised way: by applying a series of patches in a well defined format, each with a clearly documented purpose.

D. Run the build process as supplied by upstream, but perhaps modified by step (C).

E. Collect the output of the build process into binary packages. [5]

And that is exactly what RPM did over a decade ago. Debian mashed steps (A), (B) and (C) into what could only be described as a mess.

Time has moved on, and things have changed. Given the existence of yum, I guess rpm's binary format has improved. Oddly RedHat dropped step (C) for their kernels, so arguably their source packaging format has gone backwards. Debian's source format has improved, with step (C) above now being a standard part of it. (Whooho!) Arguably we are also making tiny steps towards (A) with the introduction of uscan's ability to repackage upstream sources in a well documented way.

But that is not enough for an audit trail. One way of getting much closer would be:

[a] Add two new mandatory targets in debian/rules that:

    1. Obtain the upstream sources into some temporary area, compute their hash(es), and save the URL, downloaded file name, and hash in a standard place in debian/...

    2. Build the .orig.tar from the downloaded file(s).

[b] Add a section to the .dsc (say Pristine-Sha256) that contains the URLs and hashes from step [a.1].
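
A minimal sketch of the bookkeeping in [a.1] and [b], in Python. Everything named here is an assumption made for illustration: the PristineRecord type, the helper functions, the example.org URL, and the exact layout of the "Pristine-Sha256" field (only the field name comes from the proposal above). It works on bytes already downloaded, so the fetching itself is left to whatever tool does it today.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PristineRecord:
    """One upstream file exactly as downloaded (hypothetical structure)."""
    url: str        # where the upstream file was fetched from
    filename: str   # name of the downloaded file
    sha256: str     # hex digest of the file, taken before any repackaging


def record_pristine(url: str, filename: str, data: bytes) -> PristineRecord:
    """Hash the downloaded bytes before anything modifies them."""
    return PristineRecord(url, filename, hashlib.sha256(data).hexdigest())


def format_pristine_stanza(records: list[PristineRecord]) -> str:
    """Render a hypothetical Pristine-Sha256 field, one file per line."""
    lines = ["Pristine-Sha256:"]
    for rec in records:
        lines.append(f" {rec.sha256} {rec.filename} {rec.url}")
    return "\n".join(lines)


def verify_pristine(record: PristineRecord, data: bytes) -> bool:
    """Check that a fresh download matches the hash recorded earlier."""
    return hashlib.sha256(data).hexdigest() == record.sha256
```

Anyone holding the record can later fetch the URL again and call verify_pristine() on the result, which is the mechanically checkable part of the audit trail. It says nothing about what happens between the download and the .orig.tar; that is what target [a.2] would have to make reproducible.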
We already have numerous tools for doing [a], notably the get-orig-source target, uscan and mk-origtargz, so not much in the way of new code is required.

Maybe we could consider doing something like this for stretch? I'd be happy to do the work if there was some hope of it being accepted.

[0] Just to be clear about the distinction: the licensing obviously has to be crystal clear for anything in the tarball Debian uses to produce the .debs it distributes. I'm saying we should be happy to accept upstream's assurances for what we don't use, unless we have evidence to the contrary such as a written complaint. Doing this might be dubious if we repackage tarballs, because then we are not just mirroring what upstream gave us permission to mirror.

[1] And just to be clear again, I'm not suggesting we do [0]. I'm just whining. But whining with a purpose - as partial justification for what follows. The principle of not caring about anything that isn't used for packaging is fine. The practicality is that it is hard to verify the packager doesn't use the problematic sources if they are left in the build tree. Insisting they aren't in the build tree before the build starts is the obvious solution. For many packaging systems that would not be an issue, but Debian's source package format makes it near impossible.

[2] Without a simple (read: mechanically checkable) audit trail from upstream to Debian's binaries we are leaving ourselves open to rogue DDs weakening security in a way that is almost undetectable. They can just modify the upstream sources when they repackage the upstream pristine tarball.

[3] Ideally the packaging transformation should be a pure function, so for a given set of inputs (ie upstream files with the correct hash) it would always produce identical outputs. Sadly it doesn't, and making it do so isn't a solved problem. I gather there has been some discussion about this under the heading "reproducible binaries".
The major reason we can't contemplate this today is we don't capture the environment used to build the package - the time, the versions of the compiler and libraries used and so on. Capturing this information would be a big step forward, and it would not be hard to do. From what I can see all it appears to need is another stanza in the .changes file, recording the versions of the Debian packages in the build-depends and essential sets, and the time. And of course storing the .changes file on the mirrors. This still wouldn't give us a reproducible binary of course, but it might be close enough in practice. Maybe enough to provide some incentive to others to see what else is needed to mechanically verify that a build under identical conditions produces packages that are for practical purposes identical.

[4] Unpacking to a separate build directory neatly sidesteps several Debian nits. Firstly, you don't care if upstream has a debian directory. Secondly, there is no need for "debian/rules clean". That might again seem like me being lazy, and maybe it is, but sheesh, at least I ensure it works in everything I do. Running debuild twice in the same source directory has failed for me surprisingly often, most recently with #765073. Running debuild, interrupting it, then running it again after a clean fails even more often. That is not surprising, because it is near impossible to test interrupting the build process at every possible point. Sometimes it seems we go out of our way to make things harder for ourselves.

[5] In reality it often gets more complex than these simple 5 steps. In particular it is not unusual to provide several variants of the package, with different patches applied. Exim4 does this. So real life is more complex than my idealised world, but it doesn't alter the underlying principles.
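
As a rough illustration of the extra stanza suggested in [3], here is a small Python sketch. The field names "Build-Date" and "Build-Environment" and the function name are made up for this example and are not existing .changes fields. The caller is assumed to supply the installed package versions, however it obtains them.

```python
from __future__ import annotations

from datetime import datetime, timezone


def format_build_environment(packages: dict[str, str], built_at: datetime) -> str:
    """Render a hypothetical stanza recording the build time and the
    versions of the packages present when the build ran.

    packages maps package name to installed version; built_at must be a
    timezone-aware datetime and is normalised to UTC.
    """
    stamp = built_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    lines = [f"Build-Date: {stamp}", "Build-Environment:"]
    # Sort by name so the same environment always renders identically.
    for name in sorted(packages):
        lines.append(f" {name} (= {packages[name]})")
    return "\n".join(lines)
```

Because the output is sorted and the time is normalised, two builds in the same environment differ only in the Build-Date line, which makes comparing the stanzas of two builds a plain text diff.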