
New image size estimation approach for debian-cd



Hi guys,

After some discussions with Steve (Sledge) and Thomas Schmitt (xorriso 
upstream) a new approach was suggested to perform the image size estimation 
task within debian-cd (idea originally suggested by Thomas).

Currently it all boils down to forking genisoimage -print-size multiple times 
(as many tries as it takes) once the addition algorithm gets close to the 
media end (the last 5-10% of the configured media size). As is already known, 
spawning genisoimage from scratch for every such size estimation on 
relatively large data sets is quite expensive.

This could be alleviated by starting a single xorriso session in dialog mode 
and talking to it via stdin+stdout (forked just once, until the real end of 
media is reached). File objects could be mapped into the ISO model and several 
estimation algorithms applied, followed eventually by final -print_size calls 
to get the exact image size.
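
The "fork once, then talk over stdin+stdout" pattern can be sketched as
follows. This is only an illustration in Python, not code from the tool: the
DialogSession class and the stand-in child process are hypothetical, and the
child is a tiny script that returns a running total, not xorriso. Parsing of
real xorriso replies is deliberately not shown.

```python
import subprocess
import sys

# Stand-in child process (NOT xorriso): reads one number per line on stdin
# and answers each line with the running total on stdout.
CHILD = (
    "import sys\n"
    "total = 0\n"
    "for line in sys.stdin:\n"
    "    total += int(line)\n"
    "    print(total, flush=True)\n"
)


class DialogSession:
    """Keep one child process alive and exchange one line per query."""

    def __init__(self, argv):
        # The child is spawned exactly once, here.
        self.proc = subprocess.Popen(
            argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
        )

    def ask(self, line):
        # Send one command line and block until one reply line arrives.
        self.proc.stdin.write(line + "\n")
        self.proc.stdin.flush()
        return self.proc.stdout.readline().strip()

    def close(self):
        # Closing stdin signals end of dialog; then collect the exit code.
        self.proc.stdin.close()
        code = self.proc.wait()
        self.proc.stdout.close()
        return code


def demo():
    session = DialogSession([sys.executable, "-c", CHILD])
    replies = [session.ask(str(n)) for n in (100, 250, 50)]
    exit_code = session.close()
    return replies, exit_code
```

The point of the sketch is that all three queries are answered by the same
process, so the start-up cost is paid only once.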

To explore that, a proof of concept tool has been started at:

http://git.debian.org/?p=users/danchev/medistimator.git;a=summary
(see 'algo' branch)

Some (perhaps naive) timing results can be found at:
http://people.debian.org/~danchev/medistimator/log/

Whether this tool will be released or not is not of great importance; its main 
objective is to give an impression of:

* The impact of aggressively using (or, respectively, avoiding) the 
-print_size command, as compared to alternative estimation techniques which 
do not rely on expensive operations on the back-end side (xorriso in this case)
* How reliable the interaction between xorriso dialog mode and perl's 
IPC::Run is.

My findings so far are positive, or at least I think so.

Three estimation algorithms are explored:

* swift - xorriso is only used to calculate ISO image overhead,
          the rest is a self-made size estimation of the input
          data objects (files, directories, etc).
          Approximate (never overruns), but fastest.

* psize - relies solely on xorriso -print_size command to perform
          size calculations, which is expensive and slow. It is
          included mainly for comparison purposes.
          Accurate, but very slow.

* mixed - Employs both of the above algorithms, swift for speed
          and psize for exactness. The general idea is to use 'swift'
          as long as we are not close to the media end, and fall back
          to 'psize' when it is time to be precise.
          Accurate and fast (default).
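
A minimal sketch of the 'mixed' idea, in Python for illustration only. It is
not taken from the tool: fill_image, padded_size, the margin parameter and
all numbers are assumptions. The slow exact query (psize in the real tool) is
replaced by a stand-in that pads each file to 2048-byte blocks and adds a
fixed overhead; the cheap path simply sums raw file sizes.

```python
def padded_size(sizes, block=2048, overhead=4096):
    """Stand-in for the exact (slow) query: pad each item to whole blocks."""
    return overhead + sum(-(-s // block) * block for s in sizes)


def fill_image(sizes, capacity, overhead, exact_size, margin=0.05):
    """Return (number of leading items that fit, number of exact calls made).

    sizes      -- item sizes in bytes, in the order they would be added
    capacity   -- media size in bytes
    overhead   -- fixed image overhead assumed by the cheap estimate
    exact_size -- callable(list_of_sizes) -> exact image size (slow path)
    margin     -- fraction of capacity treated as "close to the media end"
    """
    threshold = capacity * (1.0 - margin)
    cheap_total = overhead
    accepted = []
    exact_calls = 0
    for size in sizes:
        candidate = accepted + [size]
        cheap_next = cheap_total + size
        if cheap_next <= threshold:
            # Far from the media end: trust the cheap estimate.
            accepted, cheap_total = candidate, cheap_next
            continue
        # Close to the media end: ask the exact (expensive) oracle.
        exact_calls += 1
        if exact_size(candidate) > capacity:
            break
        accepted, cheap_total = candidate, cheap_next
    return len(accepted), exact_calls
```

With 25 files of 2000 bytes, a 40960-byte medium and a 10% margin,
fill_image([2000] * 25, 40960, 4096, padded_size, margin=0.10) accepts 18
files and makes only 3 exact calls. Note that the margin has to be large
enough to cover the error of the cheap estimate, otherwise the cheap path
can accept more than really fits.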


If such an approach is found to be beneficial for the debian-cd [1] job, then 
we can start discussing how to transplant code blocks from the proof of 
concept tool into the debian-cd scripts. This is mainly a set of fewer than 
ten routines implementing the xorriso communication layer, which performs the 
queries and processes the returned results.

Maybe it is worth diverging from the debian-cd Makefile and creating an 
alternative target which calls a modified make_disc_trees.pl, which in turn 
uses xorriso communication for image size estimation, so that the old 
approach remains available too. My idea is only to offer an alternative to 
the size measuring approach, not to change the core logic behind how the 
Debian image trees are laid out.

[1] I'm not aware of any other vendor here on Earth producing such an insane 
number of images on a weekly basis as Debian does.

-- 
pub 4096R/0E4BD0AB <people.fccf.net/danchev/key pgp.mit.edu>

