Re: How to partial mirror with pools?
On Tue 03 Apr, James Troup wrote:
> Wookey <wookey@aleph1.co.uk> writes:
>
> > It creates a specific rsync include filelist to get the updates you
> > need. It does this by looking through the Packages.gz/Sources.gz
> > files to generate a list of all the ones with /pool in their
> > path. Well, for current potato there aren't any of those - only the
> > links' (conventional) positions are recorded.
>
> Yes. For the purposes of backwards compatibility, principle of least
> surprise etc. yada yada, potato's Packages files have their Filename:
> entries munged so that they only reference the symlinks in dists/ and
> never pool/ directly.
>
> The script was probably intended to work with !stable distributions
> where the Filename: references are unmunged.
Aha, grokked. Well, I've done a new version, the important bits of which are
below, that works with potato rather than newer versions. (It should of course
be made to work with both, but I haven't done that yet.)
However, whilst this nicely generates a list of files to download, the rsync
bit at the end fails completely.
I've spent hours and hours working through the options here and I can't for
the life of me get rsync to work the way its documentation says it should. So
far as I can tell, if the list of rsync includes/excludes (whether read from a
file or not) ends with --exclude '*', rsync always finds no matching files and
downloads nothing.
I have successfully used exclude lists containing '+ ' include lines before,
so I can't understand for the life of me what's going wrong. I tried a single
file after an --include option; I tried a leading **/, a leading ./, and
nothing in front of the filename; I tried putting everything in an
exclude-from file with '+ ' prepended. In every case rsync just downloaded one
file, './'. I note that in its debug output rsync always reports
'add_exclude(<line from include file>)', which is a bit confusing: you can't
tell from the log which entries are excludes and which are includes.
Anyway, having come back to this after a few days to regain my sanity, I found
ftp.uk.debian.org was dead, so I tried ftp.de.debian.org and suddenly
everything worked. The only obvious difference is that ftp.uk.debian.org is
running a version that reports 'remote version=24', whilst my version and
ftp.de.debian.org report 'local_version=21' and 'remote_version=21'
respectively. Might this matter? rsync --version gives v2.3.2.
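To compare the two ends, the local protocol number can be read from the banner that rsync --version prints, assuming it contains the text 'protocol version N' as current releases' banners do. protocol_version is a made-up helper name, and the demo feeds it a banner line written out by hand from the numbers quoted above rather than calling rsync, so it runs without rsync installed.

```shell
#!/bin/sh
# Hypothetical helper: read an "rsync --version" banner on stdin and
# print the protocol number it mentions (assumes the banner contains
# the text "protocol version N"; prints nothing otherwise).
protocol_version() {
    sed -n 's/.*protocol version \([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Demo on a hand-written banner; on a real system you would run:
#   rsync --version | protocol_version
echo 'rsync version 2.3.2  protocol version 21' | protocol_version   # prints: 21
```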
So the problem was that doing this:
rsync -alPz -vvv --timeout=600 --include-from=.filelist --exclude '*' \
ftp.uk.debian.org::debian/pool/ /mirror/debian/pool/
with a '.filelist' like this:
./contrib/l/lookup/lookup_1.08b-1.diff.gz
./contrib/l/lookup/lookup_1.08b-1.dsc
./contrib/l/lookup/lookup_1.08b.orig.tar.gz
./contrib/m/mancala/mancala_1.0.0-1.diff.gz
./contrib/m/mancala/mancala_1.0.0-1.dsc
./contrib/m/mancala/mancala_1.0.0.orig.tar.gz
./contrib/p/pgp4pine/pgp4pine_1.71b-5.diff.gz
./contrib/p/pgp4pine/pgp4pine_1.71b-5.dsc
./contrib/p/pgp4pine/pgp4pine_1.71b.orig.tar.gz
...
doesn't work with ftp.uk.debian.org. I can't check right now as it seems
to be down. Maybe there is a better place to report this, but I also wonder
if I'm just going quietly mad.
Has anyone else seen this? Is there an rsync list I perhaps ought to report
it to?
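One possible explanation, offered as a guess rather than a diagnosis: under rsync's documented pattern rules a trailing --exclude '*' also matches directories, and rsync never descends into a directory that has been excluded, so a list naming only files such as contrib/l/lookup/lookup_1.08b-1.dsc can match nothing unless every parent directory is included too. Older and newer rsync servers may well differ here. The hypothetical make_filter function below turns a .filelist like the one quoted above into rules with an anchored '+ /dir/' line for each parent directory and a final '- *'. It also strips the leading ./ to sidestep any question of how a given rsync version treats it. It only does text processing and has not been tried against the servers mentioned here.

```shell
#!/bin/sh
# Hypothetical helper: convert a list of wanted files, one per line
# in the "./dir/sub/file" form the mirror script writes to .filelist,
# into rules for rsync --exclude-from. Every parent directory gets
# its own anchored include, because the final "- *" also excludes
# directories and rsync will not descend into an excluded one.
make_filter() {
    sed 's!^\./!!' "$1" | awk '
        NF == 0 { next }              # skip blank lines
        {
            n = split($0, part, "/")
            path = ""
            for (i = 1; i < n; i++) { # every leading directory
                path = path "/" part[i]
                print "+ " path "/"
            }
            print "+ /" $0            # the file itself
        }' | LC_ALL=C sort -u
    echo "- *"                        # must stay last: first match wins
}

# Example using two entries from the list quoted above.
tmp=$(mktemp -d)
printf '%s\n' ./contrib/l/lookup/lookup_1.08b-1.dsc \
    ./contrib/m/mancala/mancala_1.0.0-1.dsc > "$tmp/filelist"
make_filter "$tmp/filelist"
rm -rf "$tmp"
```

The output could then be passed as --exclude-from=.filter (exclude-from files accept the '+ ' and '- ' prefixes) in place of --include-from=.filelist --exclude '*'. Newer rsync releases also provide --files-from, which reads an explicit list of paths and avoids pattern matching altogether.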
I've also been in discussion with Otto Wyss about his perl mirror script -
making it work on the potato vintage version of perl. It's not quite there
yet, but that should produce a significantly faster and more comprehensible
partial mirror tool. More when we have it working.
-----------
Anyway, here's my version of Goswin's script, which works for potato. It's
set up for uk.debian, where the non-US module is called 'debian', so to use it
on other servers (like de.debian) you need to change this to 'debian-non-US'
throughout. Hope this is useful.
#!/bin/sh -e
# Anon rsync partial mirror of Debian with package pool support.
# Copyright 1999, 2000 by Joey Hess <joeyh@debian.org>, GPL'd.
# Add ons by Goswin Brederlow <goswin.brederlow@student.uni-tuebingen.de>
# This Potato version by wookey <wookey@debian.org>
# Do you already have new enough potato/woody files and Packages.gz?
# If so, say yes and the old ones are reused instead of updated. This
# is for cases when you restart a scan after the modem died.
# No is the safe answer here, but it wastes bandwidth when resuming.
HAVE_PACKAGE_FILES=yes
# Should a Contents file be kept updated? Saying no won't delete old
# Contents files, so when resuming you might want to say no here
# temporarily.
CONTENTS=no
# Flags to pass to rsync. More can be specified on the command line.
# These flags are always passed to rsync:
FLAGS="$* -rHlptz --partial -vv --timeout=600 --progress"
# These flags are not passed in when we are getting files from pools.
# In particular, --delete is a horrid idea at that point, but good here.
FLAGS_NOPOOL="$FLAGS --exclude Packages --delete"
# And these flags are passed in only when we are getting files from pools.
# Remember, do _not_ include --delete.
FLAGS_POOL="$FLAGS"
# The host to connect to. Currently must carry both non-us and main
# and support anon rsync, which limits the options somewhat.
HOST=ftp.uk.debian.org
# Where to put the mirror (absolute path, please):
DEST=/cdimages/mirror/debian
# The distribution to mirror:
#DISTS="sid potato woody"
DISTS="potato"
# Architecture to mirror:
ARCHS="arm"
# Should source be mirrored too?
SOURCE=yes
# The sections to mirror (main, non-free, etc):
SECTIONS="main contrib non-free"
# Should symlinks be generated to every deb, in an "all" directory?
# I find this is very handy to ease looking up deb filenames.
SYMLINK_FARM=yes
###############################################################################
echo "Make sure we have dists and pool directories"
mkdir -p $DEST/dists $DEST/pool
# Snarf the contents file.
if [ "$CONTENTS" = yes ]; then
echo "Get contents files."
for DIST in ${DISTS}; do
for ARCH in ${ARCHS}; do
echo Syncing $DEST/dists/${DIST}/Contents-${ARCH}.gz
rsync $FLAGS_NOPOOL \
$HOST::debian/dists/$DIST/Contents-${ARCH}.gz \
$DEST/dists/${DIST}/
echo Syncing $DEST/non-US/dists/${DIST}/non-US/Contents-${ARCH}.gz
rsync $FLAGS_NOPOOL \
$HOST::debian/dists/$DIST/non-US/Contents-${ARCH}.gz \
$DEST/non-US/dists/${DIST}/non-US/
done
done
fi
# Generate list of archs to download
ARCHLIST="binary-all"
DISKS_ARCHLIST=""
NONUS_ARCHLIST="binary-all"
for ARCH in ${ARCHS}; do
ARCHLIST="${ARCHLIST} binary-${ARCH}"
DISKS_ARCHLIST="${DISKS_ARCHLIST} disks-${ARCH}"
NONUS_ARCHLIST="${NONUS_ARCHLIST} binary-${ARCH}"
done
if [ "$SOURCE" = yes ]; then
ARCHLIST="${ARCHLIST} source"
NONUS_ARCHLIST="${NONUS_ARCHLIST} source"
fi
# Download packages files (and .debs and sources too, until we move fully
# to pools).
if [ x$HAVE_PACKAGE_FILES != xyes ]; then
echo "Download packages files."
for DIST in ${DISTS}; do
for section in $SECTIONS; do
for type in ${ARCHLIST}; do
echo Syncing $DEST/dists/$DIST/$section/$type
mkdir -p $DEST/dists/$DIST/$section/$type
rsync $FLAGS_NOPOOL \
$HOST::debian/dists/$DIST/$section/$type \
$DEST/dists/$DIST/$section/
done
if [ $section = "main" ]; then
if [ $DIST != "sid" ]; then
for type in ${DISKS_ARCHLIST}; do
echo Syncing $DEST/dists/$DIST/$section/$type
mkdir -p $DEST/dists/$DIST/$section/$type
rsync $FLAGS_NOPOOL \
$HOST::debian/dists/$DIST/$section/$type \
$DEST/dists/$DIST/$section/
done
fi
fi
done
done
for DIST in ${DISTS}; do
for section in $SECTIONS; do
for type in ${NONUS_ARCHLIST}; do
echo Syncing $DEST/non-US/dists/$DIST/non-US/$section/$type
mkdir -p $DEST/non-US/dists/$DIST/non-US/$section/$type
rsync $FLAGS_NOPOOL \
$HOST::debian/dists/$DIST/non-US/$section/$type \
$DEST/non-US/dists/$DIST/non-US/$section/
done
done
done
fi
# Update the package pool.
# Note that the same pool is used for non-us as everything else.
# TODO: probably needs to be optimized, we'll see as time goes by..
cd $DEST/pool || exit 1
rm -f .filelist
touch .filelist
#Check for broken links between dist and pools (for potato)
echo "Generate list of missing pool links"
for DIST in ${DISTS}; do
for section in ${SECTIONS}; do
for type in ${ARCHLIST}; do
# echo Checking $DEST/dists/$DIST/$section/$type
for area in `ls $DEST/dists/$DIST/$section/$type`; do
for file in `ls $DEST/dists/$DIST/$section/$type/$area`; do
if [ -L $DEST/dists/$DIST/$section/$type/$area/$file ]; then
if ! [ -e $DEST/dists/$DIST/$section/$type/$area/$file ]; then
for wantfile in `ls -l $DEST/dists/$DIST/$section/$type/$area/$file \
| tr --squeeze-repeats ' ' | cut -d ' ' -f 11`; do
DIRS="`dirname $wantfile` $DIRS"
echo $wantfile >> .filelist
done
fi
fi
done
done
done
done
done
# Get a list of all the files that are in the pool based on the Packages
# files that were already updated. Thanks to aj for the awk-fu.
# FIXME: this is needed for non-potato dists
#for file in `find $DEST -name Packages.gz | \
# xargs -r zgrep -i "^Filename:" | \
# cut -d ' ' -f 2 | grep ^pool/` \
# `find $DEST -name Sources.gz | xargs -r zcat | \
# awk '/^Directory:/ {D=$2} /Files:/,/^$/ { \
# if ($1 != "Files:" && $0 != "") print D "/" $3; \
# }' | grep ^pool/`
#do
# DIRS="`dirname $file` $DIRS"
# echo $file >> .filelist
#done
# Remove leading "pool" from all files in the file list.
# The "./" we change it to is there so the file names
# exactly match in the delete step and the files that get downloaded
# are not deleted.
sed 's!^../../../../../pool/!./!' .filelist | sort -u > .filelist.new
mv -f .filelist.new .filelist
# The dirs all have a leading ../../../../../ that needs stripping.
# Put one dir per line first, so that sort -u can drop duplicates.
echo $DIRS | tr ' ' '\n' > .dirlist
sed 's!^../../../../../!!' .dirlist | sort -u > .dirlist.new
mv -f .dirlist.new .dirlist
DIRS=`cat .dirlist`
# Use the last version as template for the new file
# Hopefully files don't change that much from release to release
#FIXME: this is needed for non-potato dists
#cat .filelist | while read file; do
# if [ ! -e $file ]; then
# dir=`dirname $file`
# name=`basename $file`
# case $file in
# *.deb)
# arch=`basename $name .deb | cut -d"_" -f3`
# old=`ls $dir/\`echo $name | cut -d"_" -f1\`_*_$arch.deb 2>/dev/null | tail --lines 1`
# if [ "x$old" = x ]; then
# old=`ls ../dists/*/*/binary-$arch/*/\`echo $name | cut -d"_" -f1\`_*.deb 2>/dev/null | tail --lines 1`
# if [ "x$old" = x ]; then
# old=`ls ../non-US/dists/*/non-US/*/binary-$arch/*/\`echo $name | cut -d"_" -f1\`_*.deb 2>/dev/null | tail --lines 1`
# if [ "x$old" = x ]; then
# old=""
# fi
# fi
# fi
# ;;
# *.dsc)
# old=`ls $dir/\`echo $name | cut -d"_" -f1\`_*.dsc 2>/dev/null | tail --lines 1`
# if [ "x$old" = x ]; then
# old=`ls ../dists/*/*/source/*/\`echo $name | cut -d"_" -f1\`_*.dsc 2>/dev/null | tail --lines 1`
# if [ "x$old" = x ]; then
# old=`ls ../non-US/dists/*/non-US/*/source/*/\`echo $name | cut -d"_" -f1\`_*.dsc 2>/dev/null | tail --lines 1`
# if [ "x$old" = x ]; then
# old=""
# fi
# fi
# fi
# ;;
# *.diff.gz)
# old=`ls $dir/\`echo $name | cut -d"_" -f1\`_*.diff.gz 2>/dev/null | tail --lines 1`
# if [ "x$old" = x ]; then
# old=`ls ../dists/*/*/source/*/\`echo $name | cut -d"_" -f1\`_*.diff.gz 2>/dev/null | tail --lines 1`
# if [ "x$old" = x ]; then
# old=`ls ../non-US/dists/*/non-US/*/source/*/\`echo $name | cut -d"_" -f1\`_*.diff.gz 2>/dev/null | tail --lines 1`
# if [ "x$old" = x ]; then
# old=""
# fi
# fi
# fi
# ;;
# *.orig.tar.gz)
# old=`ls $dir/\`echo $name | cut -d"_" -f1\`_*.orig.tar.gz 2>/dev/null | tail --lines 1`
# if [ "x$old" = x ]; then
# old=`ls ../dists/*/*/source/*/\`echo $name | cut -d"_" -f1\`_*.orig.tar.gz 2>/dev/null | tail --lines 1`
# if [ "x$old" = x ]; then
# old=`ls ../non-US/dists/*/non-US/*/source/*/\`echo $name | cut -d"_" -f1\`_*.orig.tar.gz 2>/dev/null | tail --lines 1`
# if [ "x$old" = x ]; then
# old=""
# fi
# fi
# fi
# ;;
# *.tar.gz)
# old=`ls $dir/\`echo $name | cut -d"_" -f1\`_*.tar.gz 2>/dev/null | tail --lines 1`
# if [ "x$old" = x ]; then
# old=`ls ../dists/*/*/source/*/\`echo $name | cut -d"_" -f1\`_*.tar.gz 2>/dev/null | tail --lines 1`
# if [ "x$old" = x ]; then
# old=`ls ../non-US/dists/*/non-US/*/source/*/\`echo $name | cut -d"_" -f1\`_*.tar.gz 2>/dev/null | tail --lines 1`
# if [ "x$old" = x ]; then
# old=""
# fi
# fi
# fi
# ;;
# *)
# old=""
# ;;
# esac
# if [ "x$old" != "x" ]; then
# if [ -e $old ]; then
# echo found old file $old for $file
# cp $old $file
# else
# echo cant find old file $old for $file
# fi
# else
# echo cant find old file for $file
# fi
# fi
#done
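# Create the pool directories from $DEST, since each entry in DIRS
# starts with pool/. Skip this when DIRS is empty: mkdir -p with no
# arguments fails, which would abort the script under sh -e.
if [ -n "$DIRS" ]; then
(cd .. && mkdir -p $DIRS)
fi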
# Tell rsync to download only the files in the list. The exclude is there
# so that the recursion does not fetch anything else.
# TODO: main pool needs to be downloaded from too, once there is one.
echo Syncing non-US/pool
rsync $FLAGS_POOL \
$HOST::debian/non-US/pool/ --include-from .filelist \
--exclude '*' $DEST/non-US/pool/
echo Syncing pool
rsync $FLAGS_POOL \
$HOST::debian/pool/ --include-from .filelist --exclude '*' $DEST/pool/
# Delete all files that are not in the list, then any empty directories.
# This also kills the filelist.
find -type f | fgrep -vxf .filelist | xargs -r rm -f
find -type d -empty | xargs -r rmdir -p --ignore-fail-on-non-empty
# End of package pool update.
# Update symlinks (I like to have a link to every .deb in one directory).
if [ "$SYMLINK_FARM" = yes ]; then
install -d $DEST/all
cd $DEST/all || exit 1
find -name \*.deb | xargs -r rm -f
find .. -name "*.deb" -type f | grep -v ^../all | \
xargs -r -i ln -sf {} .
fi
# Waste bandwidth. Put a partial mirror on your laptop today!
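The broken-link scan in the script finds each link target by cutting field 11 out of ls -l output, which depends on ls's column layout and locale. A possibly sturdier sketch, assuming a find that supports -L (GNU and BSD find both do) and a readlink command, lets find pick out the dangling links directly: with -L, find follows symlinks, so only links whose target is missing still test as -type l. The function name and the apt filename in the demo are made up.

```shell
#!/bin/sh
# Hypothetical helper: print the target of every dangling symlink
# below directory $1, one per line. With -L, find follows links, so
# "-type l" is only still true for links whose target is missing.
list_missing_targets() {
    find -L "$1" -type l -exec readlink {} \;
}

# Demo: a made-up potato-style link from dists/ into a pool/ that has
# not been downloaded yet.
tmp=$(mktemp -d)
mkdir -p "$tmp/dists/potato/main/binary-arm/admin"
ln -s ../../../../../pool/main/a/apt/apt_0.3.19_arm.deb \
    "$tmp/dists/potato/main/binary-arm/admin/apt_0.3.19_arm.deb"
list_missing_targets "$tmp/dists"
# prints: ../../../../../pool/main/a/apt/apt_0.3.19_arm.deb
rm -rf "$tmp"
```

Its output keeps the same ../../../../../pool/ prefix as the original loop's, so the sed step that strips that prefix could consume it unchanged.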
Wookey
--
Aleph One Ltd, Bottisham, CAMBRIDGE, CB5 9BA, UK Tel (00 44) 1223 811679
work: http://www.aleph1.co.uk/ play: http://www.chaos.org.uk/~wookey/