Re: A success story with apt and rsync
On Sun, 6 Jul 2003, Andrew Suffield wrote:
> On ext2, as an example, stat()ting or open()ing a directory of 10000
> files in the order returned by readdir() will be vastly quicker than
> in some other sequence (like, say, bytewise lexicographic) due to the
> way in which the filesystem looks up inodes. This has caused
> significant performance issues for bugs.debian.org in the past.
You're right, I didn't get the point in the story when I simply ran find
using the sortdir wrapper, but now I understand the problem.
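For anyone who wants to measure the effect Andrew describes, here is a minimal sketch, assuming Python is available on the machine. The helper names (stat_all, compare_orders) are made up for illustration. Note that with a warm cache the two timings will most likely be indistinguishable, so a meaningful run needs a large directory and a cold cache.

```python
import os
import time


def stat_all(directory, names):
    """stat() every name in the given order and return the elapsed seconds."""
    start = time.perf_counter()
    for name in names:
        os.stat(os.path.join(directory, name))
    return time.perf_counter() - start


def compare_orders(directory):
    """Time stat() calls in directory-listing order and in sorted order."""
    # os.listdir() does not sort; it returns the entries in whatever
    # order the filesystem hands them out.
    listing_order = os.listdir(directory)
    # Bytewise lexicographic order, roughly what a sorting wrapper imposes.
    sorted_order = sorted(listing_order, key=os.fsencode)
    return {
        "readdir": stat_all(directory, listing_order),
        "sorted": stat_all(directory, sorted_order),
    }
```

Calling compare_orders("/some/big/directory") returns both timings in seconds; the path is only a placeholder.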
However, I'm still unsure whether it is good to keep files unsorted,
especially if we consider effective syncing of packages. On my home computer
I've never heard the sound of my disk during the package creation phase (even
though we've been using sortdir for more than half a year, and I've compiled
hundreds of packages), but I do hear it when e.g. the source is decompressed.
At the 'dpkg-deb --build' phase only the processor is the bottleneck.
This might vary under different circumstances. I don't know them in Debian's
case, e.g. I have no information about what hardware your packages
are created on, whether there are any other cpu-intensive or
disk-intensive applications running on these machines etc. I can easily
imagine that using sortdir can drastically decrease performance if another
disk-intensive process is running. However my experiences didn't show a
noticeable performance decrease if this was the only process accessing the
But hey, let's stop for a minute :-) Building the package only uses the
memory cache for most of the packages, doesn't it? The files it packs
together have just recently been created and there are not so many
packages whose uncompressed size is close to or bigger than the amount of
RAM in today's machines...
And for the large packages the build itself might take thousands of times as
long as reading the files in sorted order.
Does anyone know what RPM does? I know that listing the contents of a
package always produces alphabetical order but I don't know whether the
filelist is sorted on the fly or the files really appear alphabetically in
the cpio archive.
So I guess we've already seen pros and cons of sorting the files. (One
thing is missing: we still don't know how efficient rsync is if two
rsyncable tar.gz files contain the same files but in different order.)
The decision is clearly not mine but the Debian developers'. However, if
you ask me, I still vote for sorting the files :-))
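To make it concrete what "sorting the files" would mean for an archive, here is a rough sketch using Python's tarfile module. It is not how dpkg-deb or the sortdir wrapper actually work; the function names (sorted_members, build_sorted_tar) are invented for this example, and it assumes POSIX path separators. It only shows that member order can be fixed independently of readdir() order.

```python
import os
import tarfile


def sorted_members(root):
    """Return all paths below root, relative to it, in bytewise order."""
    paths = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            paths.append(os.path.relpath(full, root))
    # A directory sorts before its own contents because its path is a
    # strict prefix of theirs.
    return sorted(paths, key=os.fsencode)


def build_sorted_tar(root, out_path):
    """Write an uncompressed tar whose members appear in sorted order."""
    with tarfile.open(out_path, "w") as tar:
        for rel in sorted_members(root):
            # recursive=False: each entry is added exactly once, in the
            # order we chose rather than the order the filesystem returns.
            tar.add(os.path.join(root, rel), arcname=rel, recursive=False)
```

Two trees with identical contents then give the same member order no matter how the filesystem lists them, which is the property that matters for syncing.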