
File copy method that is twice as fast as "cp -a".




 I found a way of copying files from one drive to another that is
 significantly faster than "cp -a"...  (This is just the sort of
 geeky++ type stuff you guys like to read, I bet.)

 See if you can follow along here and see what I did.  The
 "cvs.gnome.org" directory contains a checkout of the "gnome" and
 "CVSROOT" modules only.

root@karl:~
# du -hs /usr/local/src/cvs.gnome.org 
260M    /usr/local/src/cvs.gnome.org

 First, time the "cp -a".

root@karl:~
# time cp -a /usr/local/src/cvs.gnome.org /mnt/tmp/src
cp -a /usr/local/src/cvs.gnome.org /mnt/tmp/src  0.37s user 13.81s system 27% cpu 51.674 total

 Now let's try using "tar" commands.

root@karl:~
# time (cd /usr/local/src/ && tar pcf - cvs.gnome.org) | (cd /mnt/tmp/src/ && tar pxf -)
( cd /usr/local/src/ && tar pcf - cvs.gnome.org )  0.77s user 8.02s system 16% cpu 53.759 total
( cd /mnt/tmp/src/ && tar pxf - )  0.68s user 12.58s system 24% cpu 53.757 total

 Hmmm.  That took slightly longer.  Let's try "cpio".

# time (cd /usr/local/src/ && find cvs.gnome.org -print0 | cpio -p0 /mnt/tmp/src)  
387800 blocks
( cd /usr/local/src/ && find cvs.gnome.org -print0 | cpio -p0 /mnt/tmp/src )  0.62s user 20.14s system 33% cpu 1:01.40 total
root@karl:~
# rm -rf /mnt/tmp/cvs.gnome.org

 That was a lot slower.  Both "find" and "cpio" must stat every file.
 There is no benefit to having two processes at work here.

 Let's try something else.  I seem to recall, from one time when I ran
 "dselect" and browsed the plethora of available software packages,
 a buffering program meant for use with "tar" when copying things
 across the network or to a tape drive.  A quick "apt-cache search
 'buffer'" gives me a 92 line list, from which I choose the one I
 need:

root@karl:~
# apt-get install 'buffer'
Reading Package Lists... Done
Building Dependency Tree... Done
The following NEW packages will be installed:
  buffer
0 packages upgraded, 1 newly installed, 0 to remove and 3  not upgraded.
Need to get 12.6kB of archives. After unpacking 77.8kB will be used.
Get:1 http://zeus.kernel.org unstable/main buffer 1.19-1 [12.6kB]
Fetched 12.6kB in 0s (17.9kB/s)
Selecting previously deselected package buffer.
(Reading database ... 189510 files and directories currently installed.)
Unpacking buffer (from .../buffer_1.19-1_i386.deb) ...
Setting up buffer (1.19-1) ...

root@karl:~
# buffer --help
buffer: invalid option -- -
Usage: buffer [-B] [-t] [-S size] [-m memsize] [-b blocks] [-p percent] [-s blocksize] [-u pause] [-i infile] [-o outfile] [-z size]
-B = blocked device - pad out last block
-t = show total amount written at end
-S size = show amount written every size bytes
-m size = size of shared mem chunk to grab
-b num = number of blocks in queue
-p percent = don't start writing until percent blocks filled
-s size = size of a block
-u usecs = microseconds to sleep after each write
-i infile = file to read from
-o outfile = file to write to
-z size = combined -S/-s flag

 Ok, let's try it...

root@karl:~
# time (cd /usr/local/src/ && tar pcf - cvs.gnome.org) | buffer -m 8m | (cd /mnt/tmp/src/ && tar pxf -)
( cd /usr/local/src/ && tar pcf - cvs.gnome.org )  0.55s user 6.11s system 12% cpu 53.905 total
buffer -m 8m  0.14s user 2.10s system 4% cpu 53.914 total
( cd /mnt/tmp/src/ && tar pxf - )  0.84s user 16.51s system 32% cpu 53.910 total
root@karl:~
# rm -rf /mnt/tmp/src/cvs.gnome.org                                                                           
root@karl:~
# time (cd /usr/local/src/ && tar pcf - cvs.gnome.org) | buffer -m 8m -p 75 | (cd /mnt/tmp/src/ && tar pxf -) 
( cd /usr/local/src/ && tar pcf - cvs.gnome.org )  0.72s user 3.82s system 11% cpu 39.447 total
buffer -m 8m -p 75  0.15s user 2.39s system 6% cpu 39.544 total
( cd /mnt/tmp/src/ && tar pxf - )  0.59s user 12.07s system 32% cpu 39.539 total

 Wow!  Not bad, huh?
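
 For the record, the gain can be worked out from the wall-clock totals
 printed above: 51.674 seconds for "cp -a" against 39.539 seconds for
 the "tar | buffer -p 75 | tar" run.  A quick sketch of the arithmetic;
 the "speedup" and "saved_percent" helper names are made up for
 illustration and are not standard tools:

```shell
# Hypothetical helpers (not standard tools): compare two wall-clock times.

# speedup BASELINE_SECONDS NEW_SECONDS -> ratio baseline/new, two decimals
speedup() {
    awk -v base="$1" -v new="$2" 'BEGIN { printf "%.2f\n", base / new }'
}

# saved_percent BASELINE_SECONDS NEW_SECONDS -> percent of time saved, one decimal
saved_percent() {
    awk -v base="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (base - new) * 100 / base }'
}

speedup 51.674 39.539        # prints 1.31
saved_percent 51.674 39.539  # prints 23.5
```

 So on this machine the buffered pipeline finished in roughly three
 quarters of the time that "cp -a" needed.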


Filesystem            Size  Used Avail Use% Mounted on
/dev/ide/host0/bus0/target0/lun0/part3
                       27G   16G   11G  59% /
/dev/ide/host0/bus0/target0/lun0/part1
                       29M  6.3M   21M  23% /boot
shm                   2.8G     0  2.8G   0% /var/shm
/dev/md/0              55G  234M   55G   1% /mnt/tmp


 YMMV, since:

# hdparm -t /dev/hda3 /dev/md/0

/dev/hda3:
 Timing buffered disk reads:  64 MB in  2.50 seconds = 25.60 MB/sec

/dev/md/0:
 Timing buffered disk reads:  64 MB in  1.10 seconds = 58.18 MB/sec

 ... the RAID0 (software raid 0 on UDMA 100 EIDE) destination is much
 faster than the source filesystem (about 58 MB/sec versus 26 MB/sec
 in the buffered read test above).  That is most likely why filling
 the buffer before starting to write helped the timing so much.  In
 this case, having more than one process at work is beneficial.
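
 The fastest pipeline could be wrapped up in a small shell function
 for reuse.  This is only a sketch: the "copy_tree" name is
 hypothetical, the "-m 8m -p 75" settings are simply the ones from the
 run above and may not suit other hardware, and the fallback to a
 plain unbuffered pipe when "buffer" is not installed is an added
 assumption.

```shell
# copy_tree SRC_DIR DEST_PARENT
# Hypothetical wrapper around the "tar c | buffer | tar x" pipeline.
# Copies SRC_DIR (with permissions) into the existing directory DEST_PARENT.
# The body runs in a subshell so its variables do not leak to the caller.
copy_tree() (
    src=$1
    dest=$2
    parent=$(dirname -- "$src")
    name=$(basename -- "$src")

    if command -v buffer >/dev/null 2>&1; then
        # 8 MB of shared memory; start writing once the queue is 75% full.
        (cd "$parent" && tar pcf - "$name") \
            | buffer -m 8m -p 75 \
            | (cd "$dest" && tar pxf -)
    else
        # Assumption: fall back to the unbuffered pipe if "buffer" is missing.
        (cd "$parent" && tar pcf - "$name") \
            | (cd "$dest" && tar pxf -)
    fi
)
```

 With that defined, the copy timed above would be
 "copy_tree /usr/local/src/cvs.gnome.org /mnt/tmp/src".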

 The situation between the "find | cpio" case and the "tar c | buffer
 | tar x" case seems analogous to what we do, in that if you just
 point out the bugs, it takes longer for them to get fixed than if you
 submit a patch.  Can you see what I mean by that?  In "find | cpio",
 "find" is just walking the filesystem handing file names off to
 "cpio", which must then stat and read each file itself, and then also
 write it back out to the new location.  In the "tar c | buffer | tar
 x" case though, the "tar c" is making its own list of files, then
 packing them up and piping the whole bundle off to the buffer (our
 BTS?), where it is then ready to be unpacked by the "tar x".  Hmmm.

 "cpio" doesn't know how to find, it just knows how to archive or copy
 through...  Many of you don't know how to fix the code when you find
 a bug, yet.  Nor do I.  Often enough it's way over my head.  Often
 enough the BTS already contains a report about the bug I just found.

 :-) It's late and I'm rambling and I don't feel like editing this
 story any longer.  Just thought I'd share my findings.  Hope it
 helps someone.

-- 
Karl M. Hegbloom
mailto: karlheg@hegbloom.net


