
Rebuilds with unexpected timestamps



How much effort is it to do an archive rebuild nowadays ?  How studly
a computer (or computers?) do I need ?  Could someone point me at some
tools, or volunteer to help, or something ?

I ask because I have found a new way to break packages :-).


Most of our packages use `make' or something like it.  make relies on
timestamps to decide what to rebuild.  It seems that sometimes our
source packages contain combinations of timestamps (and perhaps stamp
files) which, in practice, exempt certain parts of the build from
taking place (if one just does "apt-get source" and then
"dpkg-buildpackage -uc -b").

This doesn't seem desirable.  I think we can detect situations like
this by the following procedure:
 * obtain the source code (apt-get or dgit clone)
 * obtain a list of the files and directories in the source tree
 * assign a distinct synthetic timestamp to each file and directory
 * run the build
 * clear everything away, and then do the same again, but with
   exactly the opposite series of timestamps
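
The listing and stamping steps above could be sketched roughly like
this in Python.  This is only an illustration: the names list_tree and
stamp_tree are made up, the ordering of the paths is left to the
caller, and the 2-second spacing is an arbitrary choice.

```python
import os

def list_tree(root):
    """Return the relative path of every file and directory below root."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            found.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(found)

def stamp_tree(root, ordered_paths, now, step=2):
    """Give each path a distinct mtime: now-step, now-2*step, and so on.

    The first path in ordered_paths ends up newest.  Passing the same
    list reversed produces the opposite series of timestamps.
    """
    # Avoid following symlinks where the platform allows it, so that a
    # link gets its own timestamp instead of changing its target's.
    kwargs = {}
    if os.utime in os.supports_follow_symlinks:
        kwargs["follow_symlinks"] = False
    for i, rel in enumerate(ordered_paths):
        t = now - step * (i + 1)
        os.utime(os.path.join(root, rel), (t, t), **kwargs)
```

One would call stamp_tree(root, paths, now) before the first build
and, after clearing everything away and obtaining the source afresh,
stamp_tree(root, list(reversed(paths)), now) before the second.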

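I think this would in practice cause almost every timestamp-dependent
makefile rule to decide that a rebuild was needed in at least one of
the two runs, because any pair of files that is in one order in the
first run is in the opposite order in the second.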

This procedure would detect FTBFS bugs masked by timestamps.
(See #842452 for an example.)


To detect all timestamp-dependent build anomalies, it would be
necessary to run the package clean target before the build.  This is
because most of our source formats cannot represent the deletion of
files (which would be the natural way to force a rebuild), so in
practice if a package as provided by upstream has both input and
output/intermediate files, the Debian maintainer needs to remove the
non-input files in the clean target.

The procedure would be something like that above, only we would
collect the two build logs, sort each one with sort, and compare them.

I think that would probably produce relatively few false positives but
I don't really know.  We'd probably need to use faketime to avoid
being thrown off by timestamps in the log.
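
The log comparison described above might look something like the
following sketch.  The function name sorted_log_diff is hypothetical,
and it assumes both logs fit in memory as strings; it does nothing
about timestamps embedded in the log lines themselves.

```python
import difflib

def sorted_log_diff(log_a, log_b):
    """Sort the lines of each build log, then diff the sorted lines.

    Returns unified-diff lines; an empty list means the two logs
    contain the same lines, possibly in a different order.
    """
    lines_a = sorted(log_a.splitlines())
    lines_b = sorted(log_b.splitlines())
    return list(difflib.unified_diff(
        lines_a, lines_b, "forward", "reverse", lineterm=""))
```

Sorting first means that a mere reordering of build steps between the
two runs is not reported; only a line present in one log and absent
(or present a different number of times) in the other shows up.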


I think the synthetic timestamps could be computed by something like:
 * compute, for each filename,
    H( filename | package version | "deterministic seed" )
 * sort the list
    H(...), filename
   by H.
 * assign the first file in the list time()-2, the second time()-4,
   the third time()-6, and so on.
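
As a rough sketch of that computation in Python: H is taken to be
SHA-256 and the fields are joined with "|", both of which are
assumptions rather than anything decided above, and the function name
synthetic_mtimes and its reverse parameter are invented for the
example.

```python
import hashlib

def synthetic_mtimes(paths, version, now, seed="deterministic seed",
                     reverse=False):
    """Map each path to a synthetic mtime: now-2, now-4, now-6, ...

    Paths are ordered by H(filename | package version | seed), so the
    result is reproducible for a given package version.  With
    reverse=True the ordering (and so the series of timestamps) is
    exactly the opposite.
    """
    def h(path):
        data = "|".join((path, version, seed)).encode("utf-8")
        return hashlib.sha256(data).hexdigest()

    # Sort by hash; the path itself breaks any (very unlikely) ties.
    ordered = sorted(paths, key=lambda p: (h(p), p))
    if reverse:
        ordered.reverse()
    return {p: now - 2 * (i + 1) for i, p in enumerate(ordered)}
```

Here now would be the result of time() taken once at the start, so
that every file in a run is stamped relative to the same instant.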


What do people think ?

If this seems like a good idea, I have the time and energy to write
the code to do the weird parts above.  But I don't have much
experience of archive-wide rebuilds and could use some
hints/help/whatever.

Ian.

-- 
Ian Jackson <ijackson@chiark.greenend.org.uk>   These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.

