
Re: My local debian archive maintenance scripts



On Thursday 12 May 2005 02:30 pm, Marty wrote:
> As the debian archive grows larger, it gets increasingly laborious
> and time consuming to keep my local debian archive up-to-date.  Here
> are my latest scripts for automating the process (including some
> remaining manual steps).
>
> I'm sure there are better ways to do it, which is one of my reasons
> for posting them here.  In particular, I am interested in exploring
> the use of debmirror to replace rsync in these scripts, although I'm
> not familiar enough with it yet to know how well that might work.

Quite a lot of script :-) Using 'debmirror' as-is will put a full Debian 
mirror into your home directory, which is not what you seem to want, but 
it serves as an example. I am not sure whether 'debmirror' works with 
non-official sites, but it does a good job of mirroring the Debian 
archive, and you can easily limit the architectures and versions so you 
do not end up with 100+ GB :-)
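
A restricted debmirror run might look like the sketch below. The option
names are the ones I know from the debmirror manual page, but the host,
distribution, sections, and target directory are assumptions to adapt.
The function only prints the command, so nothing is downloaded until you
run the printed line yourself.

```sh
#!/bin/sh
# Sketch: print a debmirror command for a partial mirror.
# Hypothetical defaults: i386 only, one distribution, no source packages.
debmirror_cmd() {
    target=${1:-/mnt/install/debian}   # where the mirror is written
    arch=${2:-i386}                    # comma-separated architecture list
    dist=${3:-sarge}                   # comma-separated distribution list
    echo debmirror --host=ftp.debian.org --root=debian --method=http \
        --arch="$arch" --dist="$dist" --section=main,contrib,non-free \
        --nosource --progress "$target"
}
# Review the printed command, then run it by hand, for example:
#   debmirror_cmd /mnt/install/debian i386 sarge
```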

>
> (Note: these scripts work for x86 debian archives, and will need to
> be modified accordingly for other architectures.  In addition, there
> are probably more elegant ways to do these tasks, by consolidating
> them instead of using several small scripts. Any proposals are
> welcome.)
>
> For planning purposes, here is the disk space used by my debian
> archives as of May 12, 2005:
>
> indio:/mnt/install# du -sc debian debian-security debian-non-US debian-marillat
> 37373196	debian
> 1907560	debian-security
> 267864	debian-non-US
> 1277720	debian-marillat
> 40826340	total
>
> I have a script called debian-all which rsyncs all the debian
> archives into a holding archive at /mnt/install/debian[-*] (*=blank
> (main), security, non-US, or marillat)
>
> contents of /mnt/install/test/debian-all:
> #!/bin/sh
> LOOP=1
> while [ "$LOOP" = 1 ]
> do
>     if rsync -vaHD --numeric-ids --delete --delete-excluded \
>         --exclude '*ia64*' --exclude '*_arm*' --exclude '*_alpha*' \
>         --exclude '*-arm*' --exclude '*-alpha*' --exclude '*powerpc*' \
>         --exclude '*mipsel*' --exclude '*hppa*' --exclude '*m68k*' \
>         --exclude '*mips*' --exclude '*sparc*' --exclude '*s390*' \
>         --exclude '*hurd*' --exclude '*UploadQueue*' \
>         rsync://ftp.debian.org/debian/ /mnt/install/debian/
>
>     then LOOP=0
>     else
>      echo rsync error: trying debian main again
>      sleep 10
>     fi
> done
>
> echo
> echo
>
> #do debian main again to fill in any hardlink targets missed
> #the first time 'round
>
> LOOP=1
> while [ "$LOOP" = 1 ]
> do
>     if rsync -vaHD --numeric-ids --delete --delete-excluded \
>         --exclude '*ia64*' --exclude '*_arm*' --exclude '*_alpha*' \
>         --exclude '*-arm*' --exclude '*-alpha*' --exclude '*powerpc*' \
>         --exclude '*mipsel*' --exclude '*hppa*' --exclude '*m68k*' \
>         --exclude '*mips*' --exclude '*sparc*' --exclude '*s390*' \
>         --exclude '*hurd*' --exclude '*UploadQueue*' \
>         rsync://ftp.debian.org/debian/ /mnt/install/debian/
>
>     then LOOP=0
>     else
>      echo rsync error: trying debian main again
>      sleep 10
>     fi
> done
>
> echo
> echo
>
> LOOP=1
> while [ "$LOOP" = 1 ]
> do
>
>     if rsync -vaHD --numeric-ids --delete --delete-excluded \
>         --exclude '*ia64*' --exclude '*_arm*' --exclude '*_alpha*' \
>         --exclude '*-arm*' --exclude '*-alpha*' --exclude '*powerpc*' \
>         --exclude '*mipsel*' --exclude '*hppa*' --exclude '*m68k*' \
>         --exclude '*mips*' --exclude '*sparc*' --exclude '*s390*' \
>         --exclude '*hurd*' --exclude '*UploadQueue*' \
>         --exclude 'oldstable' --exclude 'potato' --exclude 'slink' \
>         rsync://non-us.debian.org/debian-non-US/ /mnt/install/debian-non-US/
>     then LOOP=0
>     else
>      echo rsync error: trying debian-non-US again
>      sleep 10
>     fi
> done
>
> echo
> echo
>
> LOOP=1
> while [ "$LOOP" = 1 ]
> do
>
>     if rsync -vaHD --numeric-ids --delete --delete-excluded \
>         --exclude '*ia64*' --exclude '*_arm*' --exclude '*_alpha*' \
>         --exclude '*-arm*' --exclude '*-alpha*' --exclude '*powerpc*' \
>         --exclude '*mipsel*' --exclude '*hppa*' --exclude '*m68k*' \
>         --exclude '*mips*' --exclude '*sparc*' --exclude '*s390*' \
>         --exclude '*hurd*' --exclude '*UploadQueue*' \
>         --exclude 'oldstable' --exclude 'potato' --exclude 'slink' \
>         rsync://security.debian.org/debian-security/ /mnt/install/debian-security/
>     then LOOP=0
>     else
>      echo rsync error: trying debian-security again
>      sleep 10
>     fi
> done
>
> echo
> echo
>
> #rsync debian-security again, this time with checksums (-c option),
> #because there is no indices file there containing md5sums to check
> #file integrity with after the fact.
> #Note: update with checksums only after first updating without them,
> #because this server has a strong tendency to give time out errors
> #during large transfers
>
> LOOP=1
> while [ "$LOOP" = 1 ]
> do
>
>     if rsync -vcaHD --numeric-ids --delete --delete-excluded \
>         --exclude '*ia64*' --exclude '*_arm*' --exclude '*_alpha*' \
>         --exclude '*-arm*' --exclude '*-alpha*' --exclude '*powerpc*' \
>         --exclude '*mipsel*' --exclude '*hppa*' --exclude '*m68k*' \
>         --exclude '*mips*' --exclude '*sparc*' --exclude '*s390*' \
>         --exclude '*hurd*' --exclude '*UploadQueue*' \
>         --exclude 'oldstable' --exclude 'potato' --exclude 'slink' \
>         rsync://security.debian.org/debian-security/ /mnt/install/debian-security/
>     then LOOP=0
>     else
>      echo "rsync error: trying debian-security (w/csums) again"
>      sleep 10
>     fi
> done
>
> echo
> echo
>
> wget -nv -r ftp://ftp.nerim.net/debian-marillat/ -nH -N -P /mnt/install
>
> #end of debian-all
>
> (Note: At first I thought that the "if rsync .." statements should be
> "if ! rsync ..." instead, but that logic doesn't seem to work, for
> reasons that are unclear to me.)
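
On the "if ! rsync" question: "if rsync ..." takes the "then" branch
when rsync exits with status 0, so LOOP=0 belongs there. Negating the
test swaps the branches, which would end the loop after a failure and
retry after a success; that may be why it appeared not to work. The
repeated loops could probably be folded into one helper. The following
is only a sketch: the function name and the RETRY_DELAY variable are my
own inventions, not part of the original scripts.

```sh
#!/bin/sh
# Sketch: run a command repeatedly until it exits with status 0.
# RETRY_DELAY is a hypothetical knob; the original loops sleep 10 seconds.
retry() {
    until "$@"
    do
        echo "error: trying again: $1" >&2
        sleep "${RETRY_DELAY:-10}"
    done
}
# Example (not executed here):
#   retry rsync -vaHD --numeric-ids --delete \
#       rsync://ftp.debian.org/debian/ /mnt/install/debian/
```

Each of the five loops in debian-all would then become a single
"retry rsync ..." line with its own exclude list.
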
>
> ------------------------------------------------------------------------
>
> I set up my holding archive server to boot up by RTC alarm at 6:10am.
>  This allows enough time to fsck any disks prior to the cron.daily
> wake-up time at 6:25am.  In the directory /etc/cron.daily I placed a
> script named update-debian.
>
> contents of /etc/cron.daily/update-debian:
> #!/bin/sh
> LOGFILE=/var/tmp/update-debian.log
> /mnt/install/test/debian-all >$LOGFILE 2>&1
> echo >>$LOGFILE
> echo update-debian cron script >>$LOGFILE
> echo finished with archive update at `date` >>$LOGFILE
> mail -s "update-debian.log for `date`" [your-email-address@goes-here] \
>     </var/tmp/update-debian.log
> while true
> do
> 	#if updatedb is still running, wait for it to finish before shutting down
> 	pidof updatedb && echo waiting for updatedb to finish at `date` >>$LOGFILE || shutdown -h now
> 	echo >>$LOGFILE
> 	sleep 300
> done
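
The wait-then-shutdown step depends entirely on pidof's exit status:
it is non-zero when no process matches, so the process name has to be
spelled exactly or the machine shuts down at once. A possible way to
isolate that logic is sketched below; the function name and argument
order are assumptions of mine.

```sh
#!/bin/sh
# Sketch: wait until no process with the given name is running,
# then run a command.
# Usage: wait_then_run NAME INTERVAL COMMAND [ARGS...]
wait_then_run() {
    name=$1
    interval=$2
    shift 2
    while pidof "$name" >/dev/null 2>&1
    do
        echo "waiting for $name to finish at `date`"
        sleep "$interval"
    done
    "$@"
}
# In the cron script this could replace the while loop, for example:
#   wait_then_run updatedb 300 shutdown -h now >>$LOGFILE
```
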
>
> ------------------------------------------------------------------------
>
> The remaining steps could also be automated, but for now I prefer to
> do them manually.
>
> In order to simplify checking with debsums, I keep a directory named
> /mnt/install/deblinks, containing hard links to all the .deb files in
> the local archive.  To update this directory, I first do "rm -rf
> /mnt/install/deblinks.old; mv /mnt/install/deblinks
> /mnt/install/deblinks.old; mkdir /mnt/install/deblinks".
>
> Now I am ready to put new .deb hardlinks in the /mnt/install/deblinks
> directory, using the following scripts:
>
> contents of /mnt/install/test/make-deblinks:
> #!/bin/sh
> find /mnt/install/debian* -regex '.*\.deb$' |
>     /mnt/install/test/deblink-loop
>
> contents of /mnt/install/test/deblink-loop:
> #!/bin/sh
> cd /mnt/install/deblinks
> while read filepath
> do
> 	file=`echo "$filepath" | sed 's/.*\///'`
> 	ln "$filepath" "$file"
> done
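
The two scripts above could be combined into one function. This is a
sketch under a few assumptions: the function name and argument order
are mine, the destination must be on the same filesystem as the
archives (hard links cannot cross filesystems), and the destination
must not sit inside any of the source directories.

```sh
#!/bin/sh
# Sketch: hard-link every .deb found under the source directories into DEST.
# Usage: make_deblinks DEST SRC [SRC...]
make_deblinks() {
    dest=$1
    shift
    mkdir -p "$dest" || return 1
    find "$@" -type f -name '*.deb' | while IFS= read -r filepath
    do
        # ln refuses to overwrite, so a second .deb with the same
        # file name is reported instead of replacing the first link
        ln "$filepath" "$dest/${filepath##*/}" ||
            echo "could not link (name already present?): $filepath" >&2
    done
}
# Example (not executed here):
#   make_deblinks /mnt/install/deblinks /mnt/install/debian*
```
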
> ------------------------------------------------------------------------
>
> Next I check for duplicate .deb files with differing md5 checksums
> using the following scripts:
>
> contents of /mnt/install/test/diff-dupes:
> #!/bin/sh
> /mnt/install/test/deb-dupes | /mnt/install/test/check-dupes
>
> contents of /mnt/install/test/deb-dupes:
> #!/bin/sh
> find /mnt/install/debian* -regex '.*\.deb$' -printf '%f\n' | sort | uniq -d
>
> contents of /mnt/install/test/check-dupes:
> #!/bin/sh
> while read file
> do
> 	find /mnt/install/debian*  -name $file |xargs md5sum|sort|uniq -uW1
> done
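
The same check can be done in a single pass over the archive, which
avoids one find per duplicate name and does not need uniq's -W option
(which, as far as I know, is not available everywhere). The sketch
below assumes md5sum's usual "SUM  PATH" output; the function name is
hypothetical.

```sh
#!/bin/sh
# Sketch: list .deb files that share a file name but differ in md5sum.
# Usage: find_conflicting_debs DIR [DIR...]
find_conflicting_debs() {
    find "$@" -type f -name '*.deb' -exec md5sum {} + |
    awk '{
        sum = $1
        path = substr($0, length($1) + 3)   # md5sum prints "SUM  PATH"
        n = split(path, parts, "/")
        name = parts[n]                     # file name without directory
        # count distinct checksums per file name
        if (!((name, sum) in seen)) { seen[name, sum] = 1; count[name]++ }
        lines[name] = lines[name] sum "  " path "\n"
    }
    END {
        # print every copy of a name that has more than one checksum
        for (name in count) if (count[name] > 1) printf "%s", lines[name]
    }'
}
# Example (not executed here):
#   find_conflicting_debs /mnt/install/debian*
```
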
>
> ------------------------------------------------------------------------
>
> The output of diff-dupes is a handful of .debs with their respective
> md5 checksums, which for some unknown reason have multiple versions
> in the archives.  To be on the safe side, I check to make sure none
> of these packages is installed on any of my systems.  If I find one
> installed, I remove it immediately, assuming it's either a trojan or
> corrupted package.
>
> Once my holding archives are updated, I rsync the archives to another
> system serving as my working debian archive server, and this is the
> server most often used by local systems both to update their packages
> and run debsums against.  The purpose of the duplicate archive is to
> ensure that I have a valid archive at all times even while one copy
> is being updated, and also in case one of the archive drives fails. 
> (Even with DSL it would take many days to restore the lost debian
> archives.)
>
> To update the working debian archive, I use the following script:
>
> contents of script /mnt/install/test/copy-debian-archive:
> #!/bin/sh
> rsync -vaH --rsh=ssh --numeric-ids --delete \
>     /mnt/install/debian/ root@ibex:/mnt/install/debian/
> rsync -vaH --rsh=ssh --numeric-ids --delete \
>     /mnt/install/debian-non-US/ root@ibex:/mnt/install/debian-non-US/
> rsync -vacH --rsh=ssh --numeric-ids --delete \
>     /mnt/install/debian-security/ root@ibex:/mnt/install/debian-security/
> rsync -vacH --rsh=ssh --numeric-ids --delete \
>     /mnt/install/debian-marillat/ root@ibex:/mnt/install/debian-marillat/
>
> Notes: ibex is the hostname of my working debian archive server. 
> debian-security and debian-marillat are again transferred with
> checksumming, because they don't have indices (md5sum) files to check
> them with after the fact.  I run this script twice to fill in any
> missing hardlink targets missed on the first run.
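
Since the four rsync commands differ only in the archive name and the
-c flag, a loop might express the rule more directly. This sketch only
prints the commands; DEST_HOST is a hypothetical override and the
function name is mine.

```sh
#!/bin/sh
# Sketch: print one rsync command per archive for the copy to the
# working server. Archives without an md5sums index get -c (checksum),
# as in the script above.
copy_archive_cmds() {
    dest=${DEST_HOST:-root@ibex}
    for archive in debian debian-non-US debian-security debian-marillat
    do
        case $archive in
            debian-security|debian-marillat) opts=-vacH ;;
            *) opts=-vaH ;;
        esac
        echo rsync $opts --rsh=ssh --numeric-ids --delete \
            "/mnt/install/$archive/" "$dest:/mnt/install/$archive/"
    done
}
# To execute instead of print, pipe the output to sh once it looks right:
#   copy_archive_cmds | sh
```
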
>
> The working debian archive server ibex also has copies of my scripts
> for making the .deb hardlinks, which I run and then diff the
> hardlinks over nfs as a double check.  At this point I know that my
> two debian archive servers are identical and can be used
> interchangeably.
>
> ------------------------------------------------------------------------
>
> Before using the updated debian archives to update my local hosts, I
> check each .deb in the working debian archives using the following
> scripts:
>
> contents of script /mnt/install/test/check-debian-archives:
> #!/bin/sh
> ./check-debian ../debian indices
> ./check-debian ../debian-non-US indices-non-US
>
> contents of script /mnt/install/test/check-debian
> #!/bin/sh
> zcat $1/$2/md5sums.gz | egrep -v \
>     '_arm|\-arm|\_alpha|\-alpha|hurd|powerpc|m68k|hppa|mips|mipsel|sparc|ia64|s390|potato|slink' |
>     /mnt/install/test/md5chk $1
>
> contents of script /mnt/install/test/md5chk:
> cd $1
> while read md5 filep
> do
> 	#echo "$md5  $filep"
> 	filepath=`echo "$filep" | sed 's/"//g'`
>
> 	if [ -h "$filepath" ]
> 	then
> 		testmd5="00000000000000000000000000000000"
> 	else
> 	if [ -f "$filepath" ]
> 	then
> 		testmd5=`md5sum "$filepath" | (read md5 filepath; echo $md5;)`
> 	else
> 	if [ -e "$filepath" ]
> 	then
> 		testmd5="00000000000000000000000000000001"
> 	else
> 		echo "$filepath" not found | tee >&2
> 		echo
> 		continue
> 	fi
> 	fi
> 	fi
>
> 	if [ "$testmd5" != "$md5" ]
> 	then
> 		echo "$filepath md5sums don't match" | tee >&2
> 		echo "orig md5sum= $md5" | tee >&2
> 		echo "test md5sum= $testmd5" | tee >&2
> 		echo
> 	fi
> 	#echo "$testmd5  $filepath found"
> done
>
>
> Notes: In this script I assign arbitrary md5sums of 0 or 1
> respectively to non-standard files or directories, which I use with
> other scripts to generate and test my own md5sum files on various
> archives.  Since only debian main and debian-non-US have md5 indices
> files, they are the only archives checked here.  The others have been
> transferred with checksumming enabled, as noted above.
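
For the plain-file case, md5sum's own check mode may be able to do the
comparison. The sketch below assumes each index line is a checksum
followed by whitespace and a path relative to the archive root, with no
quoting around the path; it does not reproduce the 0 and 1 placeholder
sums for symlinks and directories used above. The function name is
hypothetical.

```sh
#!/bin/sh
# Sketch: verify files under ROOT against "MD5 PATH" lines read from stdin.
# Prints only the entries that fail; exit status is 0 when none fail.
# Usage: verify_md5_list ROOT < list
verify_md5_list() {
    [ -d "$1" ] || return 2
    # awk rewrites the separator to the two spaces md5sum -c expects
    awk '{ sum = $1; sub(/^[^ \t]+[ \t]+/, ""); print sum "  " $0 }' |
    (cd "$1" && md5sum -c 2>/dev/null) | {
        # drop the "path: OK" lines; succeed only if nothing else remains
        ! grep -v ': OK$'
    }
}
# Example (not executed here), with the same filtering as check-debian:
#   zcat ../debian/indices/md5sums.gz | egrep -v 'potato|slink' |
#       verify_md5_list ../debian
```
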
>
> ------------------------------------------------------------------------
>
> Next I update each of my local hosts over nfs, with "apt-get update"
> followed by the update option of dselect using apt (not sure if
> that's necessary), followed by the select and install options of
> dselect to install or update any new packages.  (I could use aptitude
> but I don't trust it yet.)  Finally I run a debsums script on all of
> the local hosts to validate each file of any installed packages.
>
> Each of my local systems has a copy of the following script to run
> debsums against a local debian archive:
>
> contents of script check-debsums:
> #!/bin/sh
> debsums -ca --generate=all --deb-path=/mnt/$1/install/deblinks
>
> The single argument is the name of one of my debian archive servers,
> either indio or ibex (holding or working archives, respectively),
> with its debian archives mounted via nfs.  This script regenerates
> the md5sums "on the fly" using the newly checked and updated debian
> archives, which increases my assurance of installed package
> integrity.
>
> By running these scripts once or twice per week, I not only keep my
> systems up-to-date, but minimize the chance of a corrupted package
> remaining undetected on my systems.  The only weak security link I
> can see is if someone were to trojan my debsums perl script.  If I
> were more security conscious I could periodically boot a rescue
> floppy on each of my hosts and manually verify the md5sum of the
> debsums script.  Any comments or suggestions are welcome.
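
The manual check of the debsums script could be reduced to one small
function run from the rescue media. This is a sketch only: the function
name is mine, the path in the example is an assumption, and the
known-good checksum has to come from a copy you already trust.

```sh
#!/bin/sh
# Sketch: compare a file's md5sum with a known-good value recorded earlier.
# Usage: check_file_md5 FILE EXPECTED_MD5
check_file_md5() {
    actual=`md5sum "$1" | (read sum rest; echo "$sum")`
    if [ "$actual" = "$2" ]
    then
        echo "$1: OK"
    else
        echo "$1: md5sum mismatch (expected $2, got $actual)" >&2
        return 1
    fi
}
# Example (not executed here), with the host's disk mounted on /mnt/host:
#   check_file_md5 /mnt/host/usr/bin/debsums KNOWN_GOOD_MD5
```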

-- 
Greg C. Madden


