
My local debian archive maintenance scripts



As the debian archive grows larger, it gets increasingly laborious and
time-consuming to keep my local debian archive up-to-date.  Here are my latest
scripts for automating the process (including some remaining manual steps).

I'm sure there are better ways to do it, which is one of my reasons for
posting them here.  In particular, I am interested in exploring the use of
debmirror to replace rsync in these scripts, although I'm not familiar enough
with it yet to know how well that might work.

(Note: these scripts work for x86 debian archives, and will need to be modified
accordingly for other architectures.  In addition, there are probably more elegant
ways to do these tasks, by consolidating them instead of using several small scripts.
Any proposals are welcome.)

For planning purposes, here is the disk space used by my debian archives as of
May 12, 2005:

indio:/mnt/install# du -sc debian debian-security debian-non-US debian-marillat
37373196	debian
1907560	debian-security
267864	debian-non-US
1277720	debian-marillat
40826340	total

I have a script called debian-all which rsyncs all the debian archives into a
holding archive at /mnt/install/debian[-*] (*=blank (main), security, non-US, or
marillat)

contents of /mnt/install/test/debian-all:
#!/bin/sh
LOOP=1
while [ "$LOOP" = 1 ]
do
   if rsync -vaHD --numeric-ids  --delete --delete-excluded --exclude '*ia64*' --exclude '*_arm*' --exclude '*_alpha*' --exclude '*-arm*' --exclude '*-alpha*' --exclude '*powerpc*' --exclude '*mipsel*' --exclude '*hppa*' --exclude '*m68k*' --exclude '*mips*' --exclude '*sparc*' --exclude '*s390*' --exclude '*hurd*' --exclude '*UploadQueue*' rsync://ftp.debian.org/debian/ /mnt/install/debian/

   then LOOP=0
   else
    echo rsync error: trying debian main again
    sleep 10
   fi
done

echo
echo

#do debian main again to fill in any hardlink targets missed
#the first time 'round

LOOP=1
while [ "$LOOP" = 1 ]
do
   if rsync -vaHD --numeric-ids  --delete --delete-excluded --exclude '*ia64*' --exclude '*_arm*' --exclude '*_alpha*' --exclude '*-arm*' --exclude '*-alpha*' --exclude '*powerpc*' --exclude '*mipsel*' --exclude '*hppa*' --exclude '*m68k*' --exclude '*mips*' --exclude '*sparc*' --exclude '*s390*' --exclude '*hurd*' --exclude '*UploadQueue*' rsync://ftp.debian.org/debian/ /mnt/install/debian/

   then LOOP=0
   else
    echo rsync error: trying debian main again
    sleep 10
   fi
done

echo
echo

LOOP=1
while [ "$LOOP" = 1 ]
do

   if rsync -vaHD --numeric-ids  --delete --delete-excluded --exclude '*ia64*' --exclude '*_arm*' --exclude '*_alpha*' --exclude '*-arm*' --exclude '*-alpha*' --exclude '*powerpc*' --exclude '*mipsel*' --exclude '*hppa*' --exclude '*m68k*' --exclude '*mips*' --exclude '*sparc*' --exclude '*s390*' --exclude '*hurd*' --exclude '*UploadQueue*' --exclude 'oldstable' --exclude 'potato' --exclude 'slink' rsync://non-us.debian.org/debian-non-US/ /mnt/install/debian-non-US/
   then LOOP=0
   else
    echo rsync error: trying debian-non-US again
    sleep 10
   fi
done

echo
echo

LOOP=1
while [ "$LOOP" = 1 ]
do

   if rsync -vaHD --numeric-ids  --delete --delete-excluded --exclude '*ia64*' --exclude '*_arm*' --exclude '*_alpha*' --exclude '*-arm*' --exclude '*-alpha*' --exclude '*powerpc*' --exclude '*mipsel*' --exclude '*hppa*' --exclude '*m68k*' --exclude '*mips*' --exclude '*sparc*' --exclude '*s390*' --exclude '*hurd*' --exclude '*UploadQueue*' --exclude 'oldstable' --exclude 'potato' --exclude 'slink' rsync://security.debian.org/debian-security/ /mnt/install/debian-security/
   then LOOP=0
   else
    echo rsync error: trying debian-security again
    sleep 10
   fi
done

echo
echo

#rsync debian-security again, this time with checksums (-c option), because there is
#no indices file there containing md5sums to check file integrity with after the fact.
#Note: update with checksums only after first updating without them, because
#this server has a strong tendency to give time out errors during large transfers

LOOP=1
while [ "$LOOP" = 1 ]
do

   if rsync -vcaHD --numeric-ids  --delete --delete-excluded --exclude '*ia64*' --exclude '*_arm*' --exclude '*_alpha*' --exclude '*-arm*' --exclude '*-alpha*' --exclude '*powerpc*' --exclude '*mipsel*' --exclude '*hppa*' --exclude '*m68k*' --exclude '*mips*' --exclude '*sparc*' --exclude '*s390*' --exclude '*hurd*' --exclude '*UploadQueue*' --exclude 'oldstable' --exclude 'potato' --exclude 'slink' rsync://security.debian.org/debian-security/ /mnt/install/debian-security/
   then LOOP=0
   else
    echo "rsync error: trying debian-security (w/csums) again"
    sleep 10
   fi
done

echo
echo

wget -nv -r ftp://ftp.nerim.net/debian-marillat/ -nH -N -P /mnt/install

#end of debian-all

(Note: At first I thought that the "if rsync .." statements should be "if ! rsync ..." instead,
but that logic doesn't seem to work, for reasons that are unclear to me.)
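
On the note above: the shell's "if" takes the "then" branch when the command
exits with status 0, i.e. on success.  So "if rsync ...; then LOOP=0" already
ends the loop once rsync succeeds, and with "if ! rsync ..." the two branches
would have to be swapped.  As one possible way to consolidate the repeated
retry loops, here is a minimal sketch.  The function name retry_until_ok and
the RETRY_DELAY variable are made up for illustration and are not part of the
scripts above; the exclude list is left out for brevity.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original scripts):
# run a command repeatedly until it exits with status 0.
#   $1     = label used in the retry message
#   $2...  = the command and its arguments
# RETRY_DELAY is an assumed knob: seconds to sleep between attempts (default 10).
retry_until_ok() {
    label=$1
    shift
    until "$@"
    do
        echo "error: trying $label again"
        sleep "${RETRY_DELAY:-10}"
    done
}

# Example call, commented out so that sourcing this file transfers nothing:
# retry_until_ok "debian main" rsync -vaHD --numeric-ids --delete \
#     rsync://ftp.debian.org/debian/ /mnt/install/debian/
```

Each of the five loops in debian-all would then shrink to a single
retry_until_ok call with its own label and rsync arguments.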

-----------------------------------------------------------------------------------------

I set up my holding archive server to boot up by RTC alarm at 6:10am.  This allows
enough time to fsck any disks prior to the cron.daily wake-up time at 6:25am.  In the
directory /etc/cron.daily I placed a script named update-debian.

contents of /etc/cron.daily/update-debian:
#!/bin/sh
LOGFILE=/var/tmp/update-debian.log
/mnt/install/test/debian-all >$LOGFILE 2>&1
echo >>$LOGFILE
echo update-debian cron script >>$LOGFILE
echo finished with archive update at `date` >>$LOGFILE
mail -s "update-debian.log for `date`" [your-email-address@goes-here] </var/tmp/update-debian.log
while true
do
	#if updatedb is still running, wait for it to finish before shutting down
	pidof updatedb && echo waiting for updatedb to finish at `date` >>$LOGFILE || shutdown -h now
	echo >>$LOGFILE
	sleep 300
done

--------------------------------------------------------------------------------------

The remaining steps could also be automated, but for now I prefer to do them manually.

In order to simplify checking with debsums, I keep a directory named /mnt/install/deblinks,
containing hard links to all the .deb files in the local archive.  To update this directory,
I first do "rm -r /mnt/install/deblinks.old;mv /mnt/install/deblinks /mnt/install/deblinks.old;mkdir /mnt/install/deblinks".

Now I am ready to put new .deb hardlinks in the /mnt/install/deblinks directory, using
the following scripts:

contents of /mnt/install/test/make-deblinks:
#!/bin/sh
find /mnt/install/debian* -regex '.*\.deb$' | /mnt/install/test/deblink-loop

contents of /mnt/install/test/deblink-loop:
#!/bin/sh
cd /mnt/install/deblinks
while read filepath
do
	file=`echo "$filepath" | sed 's/.*\///'`
	ln "$filepath" "$file"
done
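
The two scripts above could probably be folded into a single function.  This
is only a sketch: the name make_deblinks and its argument order are
hypothetical, it assumes file names without embedded newlines, and skipping a
name that is already linked (instead of letting ln report the error) is my
assumption about the desired behavior.

```shell
#!/bin/sh
# Hypothetical consolidation of make-deblinks and deblink-loop.
# make_deblinks DEST SRC...
#   Hard-link every .deb file found under the SRC directories into DEST,
#   using only the file's base name.
make_deblinks() {
    dest=$1
    shift
    find "$@" -type f -name '*.deb' | while IFS= read -r filepath
    do
        name=${filepath##*/}        # strip the directory part
        if [ -e "$dest/$name" ]
        then
            # same package file name seen in more than one place
            echo "skipping duplicate name: $filepath" >&2
        else
            ln "$filepath" "$dest/$name"
        fi
    done
}

# Example call, commented out so that sourcing this file changes nothing:
# make_deblinks /mnt/install/deblinks /mnt/install/debian*
```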
------------------------------------------------------------------------------------

Next I check for duplicate .deb files with differing md5 checksums using the
following scripts:

contents of /mnt/install/test/diff-dupes:
#!/bin/sh
/mnt/install/test/deb-dupes | /mnt/install/test/check-dupes

contents of /mnt/install/test/deb-dupes:
#!/bin/sh
find /mnt/install/debian* -regex '.*\.deb$' -printf '%f\n' |sort|uniq -d

contents of /mnt/install/test/check-dupes:
#!/bin/sh
while read file
do
	find /mnt/install/debian*  -name "$file" |xargs md5sum|sort|uniq -uW1
done
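
These three scripts walk the archive once per duplicated name.  A possible
single-pass variant is sketched below.  It is not a drop-in replacement: the
name diff_dupes is hypothetical, it assumes find and md5sum with the usual
"32 hex digits, two spaces, path" output and file names without newlines, and
it prints every copy of a name that occurs with more than one checksum,
whereas check-dupes above prints only the copies whose checksum is unique.

```shell
#!/bin/sh
# Hypothetical single-pass duplicate check.
# diff_dupes DIR...
#   Checksum every .deb under the given directories once, then print the
#   md5sum lines of all base names that appear with differing checksums.
diff_dupes() {
    find "$@" -type f -name '*.deb' -exec md5sum {} + |
    awk '{
        sum = $1
        path = substr($0, 35)       # skip 32 hex digits plus two separator chars
        name = path
        sub(/.*\//, "", name)       # keep only the base name
        lines[name] = lines[name] $0 "\n"
        if (!((name, sum) in seen)) {
            seen[name, sum] = 1
            count[name]++           # number of distinct checksums for this name
        }
    }
    END {
        for (n in count)
            if (count[n] > 1)
                printf "%s", lines[n]
    }'
}

# Example call, commented out:
# diff_dupes /mnt/install/debian*
```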

----------------------------------------------------------------------------------

The output of diff-dupes is a handful of .debs with their respective md5 checksums, which
for some unknown reason have multiple versions in the archives.  To be on the safe side,
I check to make sure none of these packages is installed on any of my systems.  If I
find one installed, I remove it immediately, assuming it's either a trojan or corrupted package.

Once my holding archives are updated, I rsync them to another system serving as my working
debian archive server.  This is the server most often used by local systems, both to update their
packages and to run debsums against.  The purpose of the duplicate archive is to ensure that I have a
valid archive at all times even while one copy is being updated, and also in case one of the archive
drives fails.  (Even with DSL it would take many days to restore the lost debian archives.)

To update the working debian archive, I use the following script:

contents of script /mnt/install/test/copy-debian-archive:
#!/bin/sh
rsync -vaH --rsh=ssh --numeric-ids  --delete /mnt/install/debian/           root@ibex:/mnt/install/debian/
rsync -vaH --rsh=ssh --numeric-ids  --delete /mnt/install/debian-non-US/    root@ibex:/mnt/install/debian-non-US/
rsync -vacH --rsh=ssh --numeric-ids  --delete /mnt/install/debian-security/ root@ibex:/mnt/install/debian-security/
rsync -vacH --rsh=ssh --numeric-ids  --delete /mnt/install/debian-marillat/ root@ibex:/mnt/install/debian-marillat/

Notes: ibex is the hostname of my working debian archive server.  debian-security and debian-marillat
are again transferred with checksumming, because they don't have indices (md5sum) files to check them
with after the fact.  I run this script twice to fill in any hardlink targets missed on the
first run.

The working debian archive server ibex also has copies of my scripts for making the .deb
hardlinks, which I run and then diff the hardlinks over nfs as a double check.  At this
point I know that my two debian archive servers are identical and can be used interchangeably.

------------------------------------------------------------------------------------------

Before using the updated debian archives to update my local hosts, I check each .deb in the
working debian archives using the following scripts:

contents of script /mnt/install/test/check-debian-archives:
#!/bin/sh
./check-debian ../debian indices
./check-debian ../debian-non-US indices-non-US

contents of script /mnt/install/test/check-debian:
#!/bin/sh
#md5sums.gz is gzip-compressed, so decompress it (zcat) before filtering
zcat $1/$2/md5sums.gz |egrep -v '_arm|\-arm|\_alpha|\-alpha|hurd|powerpc|m68k|hppa|mips|mipsel|sparc|ia64|s390|potato|slink'| /mnt/install/test/md5chk $1

contents of script /mnt/install/test/md5chk:
#!/bin/sh
cd $1
while read md5 filep
do
	#echo "$md5  $filep"
	filepath=`echo "$filep" | sed 's/"//g'`

	if [ -h "$filepath" ]
	then
		testmd5="00000000000000000000000000000000"
	else
	if [ -f "$filepath" ]
	then
                testmd5=`md5sum "$filepath" | (read md5 filepath; echo $md5;)`
	else
	if [ -e "$filepath" ]
	then
		testmd5="00000000000000000000000000000001"
	else
		echo "$filepath" not found | tee >&2
		echo
		continue
	fi
	fi
	fi

	if [ "$testmd5" != "$md5" ]
	then
		echo "$filepath md5sums don't match" | tee >&2
		echo "orig md5sum= $md5" | tee >&2
		echo "test md5sum= $testmd5" | tee >&2
		echo
	fi
	#echo "$testmd5  $filepath found"
done


Notes: In this script I assign arbitrary md5sums of 0 or 1 respectively to non-standard files
or directories, which I use with other scripts to generate and test my own md5sum files on
various archives.  Since only debian main and debian-non-US have md5 indices files, they are
the only archives checked here.  The others have been transferred with checksumming enabled, as
noted above.
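
For readers who only need the core comparison, a stripped-down sketch follows.
It assumes each input line is "md5sum path" with the path relative to the
archive root and without the quote characters that md5chk strips, and it
leaves out the placeholder checksums for symlinks and directories described
above.  The name verify_md5_list is hypothetical.

```shell
#!/bin/sh
# Hypothetical, simplified variant of md5chk.
# verify_md5_list ROOT
#   Read "md5sum path" lines from standard input, recompute the md5sum of
#   ROOT/path, and report missing files and mismatches on standard error.
#   Returns 0 if every listed file matched, 1 otherwise.
verify_md5_list() {
    root=$1
    bad=0
    while read -r want path
    do
        if [ ! -f "$root/$path" ]
        then
            echo "$path not found" >&2
            bad=1
            continue
        fi
        got=$(md5sum <"$root/$path")
        got=${got%% *}              # keep only the checksum field
        if [ "$got" != "$want" ]
        then
            echo "$path md5sums don't match (orig $want, test $got)" >&2
            bad=1
        fi
    done
    return "$bad"
}

# Example call, commented out:
# zcat /mnt/install/debian/indices/md5sums.gz | verify_md5_list /mnt/install/debian
```

The non-zero return status makes it easy for a calling script to stop before
any hosts are updated from an archive that failed the check.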

------------------------------------------------------------------------------------------

Next I update each of my local hosts over nfs, with "apt-get update" followed by the update option
of dselect using apt (not sure if that's necessary), followed by the select and install options of
dselect to install or update any new packages.  (I could use aptitude but I don't trust it yet.)  Finally
I run a debsums script on all of the local hosts to validate each file of any installed packages.

Each of my local systems has a copy of the following script to run debsums against a local debian archive:

contents of script check-debsums:
#!/bin/sh
debsums -ca --generate=all --deb-path=/mnt/$1/install/deblinks

The single argument is the name of one of my debian archive servers, either indio or ibex (holding or
working archives, respectively), with its debian archives mounted via nfs.  This script regenerates
the md5sums "on the fly" using the newly checked and updated debian archives, which increases my
assurance of installed package integrity.

By running these scripts once or twice per week, I not only keep my systems up-to-date, but minimize
the chance of a corrupted package remaining undetected on my systems.  The only weak security link
I can see is if someone were to trojan my debsums perl script.  If I were more security conscious
I could periodically boot a rescue floppy on each of my hosts and manually verify the md5sum of
the debsums script.  Any comments or suggestions are welcome.




