New image size estimation approach for debian-cd
Hi guys,
After some discussions with Steve (Sledge) and Thomas Schmitt (xorriso
upstream) a new approach was suggested to perform the image size estimation
task within debian-cd (idea originally suggested by Thomas).
Currently it all boils down to forking genisoimage -print-size multiple times
(as many tries as it takes) once the addition algorithm gets close to the
media end (within 5-10% of the configured media size). As is already known,
spawning genisoimage from scratch to perform such size estimation on
relatively large data sets is quite expensive.
This could be alleviated by initiating a session with xorriso in dialog mode
and talking to it via stdin+stdout (forked just once, until the real end of
media is reached). File objects could be mapped into the ISO model and several
estimation algorithms applied, followed by final -print_size calls to obtain
the exact image size.
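To illustrate the "fork once, then talk over stdin+stdout" idea, here is a
minimal sketch in Python (the proof of concept itself uses Perl's IPC::Run).
The class name DialogSession, the end_marker parameter and the stand-in child
FAKE_BACKEND are assumptions made for illustration only. In particular, how
xorriso frames its replies in dialog mode is not shown here and would have to
be taken from the xorriso documentation.

```python
import subprocess
import sys


class DialogSession:
    """Keep one child process alive and exchange text lines with it."""

    def __init__(self, argv, end_marker):
        # argv and end_marker are chosen by the caller; nothing in this
        # class is specific to xorriso.
        self.end_marker = end_marker
        self.proc = subprocess.Popen(
            argv,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
            bufsize=1,  # line buffered on our side
        )

    def ask(self, command):
        """Send one command line and collect the reply up to the marker."""
        self.proc.stdin.write(command + "\n")
        self.proc.stdin.flush()
        reply = []
        while True:
            line = self.proc.stdout.readline()
            if not line:  # child closed its stdout
                break
            line = line.rstrip("\n")
            if line == self.end_marker:
                break
            reply.append(line)
        return reply

    def close(self):
        """Signal end of input, wait for the child, return its exit code."""
        self.proc.stdin.close()
        code = self.proc.wait()
        self.proc.stdout.close()
        return code


# Stand-in child used only to exercise the class. It is NOT xorriso: it
# answers every input line with one reply line followed by "READY".
FAKE_BACKEND = [
    sys.executable, "-u", "-c",
    "import sys\n"
    "for l in sys.stdin:\n"
    "    print('got ' + l.strip(), flush=True)\n"
    "    print('READY', flush=True)\n",
]
```

The point of the design is that the child is started once and every later
query costs only a write and a read on the pipes, instead of a fresh process
start for each size estimation.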
To explore that, a proof of concept tool has been started at:
http://git.debian.org/?p=users/danchev/medistimator.git;a=summary
(see 'algo' branch)
Some (perhaps naive) timing results can be found at:
http://people.debian.org/~danchev/medistimator/log/
Whether this tool will be released or not is not of great importance; its main
objective is to give an impression of:
* The impact of aggressively using (or, respectively, avoiding) the
-print_size command, as compared to alternative estimation techniques which
do not rely on expensive operations on the back-end side (xorriso in this case)
* How reliable the interaction between xorriso dialog mode and Perl's
IPC::Run is.
My findings so far reveal positive results, or at least I think so.
Three estimation algorithms are explored:
* swift - xorriso is only used to calculate ISO image overhead,
the rest is a self-made size estimation of the input
data objects (files, directories, etc).
Approximate (never overruns), but fastest.
* psize - relies solely on xorriso -print_size command to perform
size calculations, which is expensive and slow. It is
included mainly for comparison purposes.
Accurate, but very slow.
* mixed - Employs both of the above algorithms, swift for speed
and psize for exactness. The general idea is to use 'swift'
as long as we are not close to the media end, and fall back
to 'psize' when it is time to be precise.
Accurate and fast (default).
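The 'mixed' idea can be sketched roughly as follows, in Python. This is not
the medistimator code: the function names, the guard fraction and the
per-file estimate (file size rounded up to whole 2048-byte sectors) are
assumptions made for illustration. The exact_size callable stands in for an
expensive back-end query such as -print_size.

```python
SECTOR = 2048  # ISO 9660 logical sector size in bytes


def swift_sectors(file_size):
    """Cheap per-file estimate: size rounded up to whole sectors."""
    return (file_size + SECTOR - 1) // SECTOR


def fill_image(file_sizes, capacity, overhead, exact_size, guard=0.05):
    """Accept files in order until the next one would not fit.

    file_sizes: sizes in bytes, in the order files are to be added
    capacity:   media size in sectors
    overhead:   fixed image overhead in sectors (queried once)
    exact_size: callable(list_of_sizes) -> exact image size in sectors;
                a stand-in for the expensive back-end query
    guard:      fraction of the capacity, counted from the media end,
                inside which the exact query is used (assumed value)
    Returns (number_of_files_accepted, number_of_exact_queries).
    """
    accepted = []
    estimate = overhead
    exact_calls = 0
    threshold = capacity * (1 - guard)
    for size in file_sizes:
        candidate = estimate + swift_sectors(size)
        if candidate > threshold:
            # Close to the media end: stop trusting the cheap estimate
            # and ask the back end for the exact size.
            exact_calls += 1
            candidate = exact_size(accepted + [size])
            if candidate > capacity:
                break
        accepted.append(size)
        estimate = candidate  # resynchronise after an exact query
    return len(accepted), exact_calls


def fake_exact(sizes):
    """Toy back end: overhead of 10 plus one extra sector per file."""
    return 10 + sum(swift_sectors(s) for s in sizes) + len(sizes)
```

With the toy back end, twelve files of 20480 bytes each and a capacity of 100
sectors, only a single exact query is made, at the point where the cheap
estimate enters the last 5% of the media.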
If such an approach is found to be beneficial for the debian-cd [1] job, then
we can start discussing how to transplant code blocks from the proof of concept
tool into the debian-cd scripts. This is mainly a set of fewer than ten
routines implementing the xorriso communication layer, which performs the
queries and processes the returned results.
Maybe it is worth diverging from the debian-cd Makefile and creating an
alternative target which calls a modified make_disc_trees.pl, which in turn
relies on xorriso communication for image size estimation, so that the old
approach remains available too. My idea is only to offer an alternative to the
size measuring approach, not to change the core logic behind how the Debian
image trees are laid out.
[1] I'm not aware of any other vendor here on Earth producing such an insane
number of images on a weekly basis as Debian does.
--
pub 4096R/0E4BD0AB <people.fccf.net/danchev/key pgp.mit.edu>