
Re: A success story with apt and rsync



On Sun, Jul 06, 2003 at 10:28:07PM +0200, Koblinger Egmont wrote:
> 
> On Sun, 6 Jul 2003, Andrew Suffield wrote:
> 
> > It should put them in the package in the order they came from
> > readdir(), which will depend on the filesystem. This is normally the
> > order in which they were created, and should not vary when
> > rebuilding. As such, sorting the list probably doesn't change the
> > network traffic, but will slow dpkg-deb down on packages with large
> > directories in them.
> 
> Yes, when saying "random order" I obviously meant "in the order readdir()
> returns them". It's random for me.  :-)))
> 
> It can easily be different on different filesystems, or even on the same
> type of filesystem with different parameters (e.g. blocksize).

I can't think of any reason why changing the blocksize would affect
this. Most filesystems return files in the sequence in which they were
added to the directory. ext2, ext3, and reiser all do this; xfs is the
only one likely to be used on a Debian system which doesn't.

> I even think it can be different after a simple rebuild in exactly the
> same environment. For example, configure and libtool like to create files
> with the PID in their name, which can take from 3 to 5 digits. If you
> create file X and then Y, remove X, and then create Z, it is most likely
> that Z will be returned first by readdir() if its name is no longer than
> X's, while if its name is longer, Y will be returned first and Z
> afterwards. So I can imagine situations where the order of the files
> depends on the PIDs of the build processes.

This lengthy bit of handwaving has no connection with reality.
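
A claim like this is cheap to test rather than argue about. What follows
is only a minimal sketch, assuming Python 3; the function name, the file
names, and the `parent` argument are made up for illustration. It creates
X and Y, removes X, creates a replacement, and reports the order the
directory listing comes back in. os.listdir() does not sort, so the result
reflects whatever the filesystem returns; no expected output is shown
because it depends on the filesystem.

```python
import os
import tempfile


def order_after_reuse(replacement_name, parent=None):
    """Create X and Y, remove X, create a replacement, return the listing order.

    `parent` is a hypothetical knob: pass a directory on the filesystem you
    want to examine, otherwise the system temporary directory is used.
    """
    x_name = "conftest-123"  # stands in for a PID-named temporary file
    with tempfile.TemporaryDirectory(dir=parent) as scratch:
        for name in (x_name, "Y"):
            open(os.path.join(scratch, name), "w").close()
        os.remove(os.path.join(scratch, x_name))
        open(os.path.join(scratch, replacement_name), "w").close()
        # os.listdir() returns the names unsorted, as the OS hands them back.
        return os.listdir(scratch)


if __name__ == "__main__":
    print(order_after_reuse("conftest-45"))     # name shorter than X
    print(order_after_reuse("conftest-67890"))  # name longer than X
```

Running it with `parent` pointing at directories on different filesystems
shows whether the name length has any effect there.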

> However, I think sorting the files costs really nothing. My system is not
> a very new one, 375MHz Celeron, IDE disks, 384MB RAM etc... However:
> 
> /usr/lib$ du -s .
> 1,1G    .
> /usr/lib$ find . -type f | wc -l  # okay, it's now in memory cache
>   18598
> /usr/lib$ time find . >/dev/null 2>&1
> 
> real    0m0.285s
> user    0m0.100s
> sys     0m0.150s
> egmont@boci:/usr/lib$ time sortdir find . >/dev/null 2>&1
> 
> real    0m1.683s
> user    0m1.390s
> sys     0m0.250s
> 
> 
> IMHO a step which takes one and a half seconds before compressing 18000
> files of more than 1 gigabyte shouldn't be a problem.

This test only shows that you don't understand what is going on; it
has no relation to the problems that can occur.

On ext2, as an example, stat()ing or open()ing every file in a directory
of 10000 files in the order returned by readdir() will be vastly quicker
than doing so in some other sequence (like, say, bytewise lexicographic)
due to the way in which the filesystem looks up inodes. This has caused
significant performance issues for bugs.debian.org in the past.
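
For anyone who wants to measure that rather than time find(1), here is a
rough sketch, assuming Python 3; stat_all and its sort_names flag are
names invented for this example. It stat()s every entry of one directory,
either in the order the listing returns or in bytewise lexicographic
order, and reports the elapsed time. The numbers only mean something with
a cold cache, so each ordering should be run separately on a large
directory that has not just been read; on a warm cache the two will look
much the same.

```python
import os
import time


def stat_all(directory, sort_names=False):
    """stat() every entry in `directory`; return (entry count, seconds taken).

    With sort_names=False the entries are visited in the order os.listdir()
    returns them; with sort_names=True they are visited in bytewise
    lexicographic order of their encoded names.
    """
    names = os.listdir(directory)
    if sort_names:
        names.sort(key=os.fsencode)
    start = time.perf_counter()
    for name in names:
        os.stat(os.path.join(directory, name))
    return len(names), time.perf_counter() - start
```

Calling stat_all(path) in one run and stat_all(path, sort_names=True) in
another, with the cache cold each time, gives the comparison that matters
here: the cost of visiting the files, not the cost of sorting their names.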

-- 
  .''`.  ** Debian GNU/Linux ** | Andrew Suffield
 : :' :  http://www.debian.org/ | Dept. of Computing,
 `. `'                          | Imperial College,
   `-             -><-          | London, UK


