
Re: How to partial mirror with pools?



On Tue 03 Apr, James Troup wrote:
> Wookey <wookey@aleph1.co.uk> writes:
> 
> > It creates a specific rsync include filelist to get the updates you
> > need. It does this by looking through the packages.gz/sources.gz
> > files to generate a list of all the ones with /pool in their
> > path. Well, for current potato there aren't any of those - only the
> > link's (conventional) position is recorded.
> 
> Yes.  For the purposes of backwards compatibility, principle of least
> surprise etc. yada yada, potato's Packages files have their Filename:
> entries munged so that they only reference the symlinks in dists/ and
> never pool/ directly.
> 
> The script was probably intended to work with !stable distributions
> where the Filename: references are unmunged.

Aha - grokked. Well, I've done a new version, the important bits of which are
below, so that it works with potato instead of newer versions. (It
should of course be made to work with both, but I haven't done that yet.)
However, whilst this nicely generates a list of files to download, the rsync
bit at the end fails completely.

I've spent hours and hours working through the options here and I can't for
the life of me get rsync to work the way it says it should. So far as I can
tell, if the list of rsync includes/excludes (whether redirected via a file or
not) has --exclude '*' on the end then rsync always finds no matching
files and downloads nothing.

I have successfully used exclude lists with '+ ' include lines in them
before, so I can't understand what's going wrong. I tried a single
file after an --include option; I tried a leading **/, a leading ./, and
nothing in front of the filename; I tried putting everything in an
exclude-from file with '+ ' prepended. In every case rsync just downloaded
one file, './'. I note that in its debug output rsync always reports
'add_exclude(<line from include file>)', which is a bit confusing - you
can't tell which are excludes and which are includes from this log.
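
One thing that may be worth ruling out (an assumption, not a diagnosis of
the difference between the two servers): the rsync manual for later releases
says that a file can only be included if none of its parent directories is
excluded, so a bare list of files followed by an exclude of '*' can stop the
recursion at the top level. Below is a minimal sketch that expands a file
list into rules naming every parent directory as well. The function name
expand_includes is made up for illustration.

```shell
#!/bin/sh
# Expand a list of pool-relative file paths (one per line, optional
# leading "./") into rsync filter rules that also include every parent
# directory, and finish with a catch-all exclude.
# NOTE: expand_includes is a hypothetical helper, not part of rsync.
expand_includes() {
    sed 's!^\./!!' | awk -F/ '
        NF == 0 { next }              # skip blank lines
        {
            path = ""
            # Emit an include rule for each parent directory, once only.
            for (i = 1; i < NF; i++) {
                path = path "/" $i
                if (!(path in seen)) {
                    seen[path] = 1
                    print "+ " path "/"
                }
            }
            # Then the file itself, anchored at the root of the transfer.
            print "+ /" $0
        }
        END { print "- *" }
    '
}
```

The output could be saved to a file and passed with --include-from in place
of both .filelist and the separate --exclude '*', since the generated list
already ends with its own '- *' rule.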

Anyway, having come back to this after a few days to regain my sanity,
ftp.uk.debian.org is dead so I tried ftp.de.debian.org and suddenly
everything works. The only obvious difference is that ftp.uk.debian.org is
running a version that reports 'remote version=24', whilst my version and
ftp.de.debian.org report 'local_version=21' and 'remote_version=21'
respectively. Might this matter? My local rsync --version gives v2.3.2.

So the problem was that doing this:
rsync -alPz -vvv --timeout=600 --include-from=.filelist --exclude '*'
ftp://ftp.uk.debian.org::debian/pool/ /mirror/debian/pool/

with a '.filelist' like this:
./contrib/l/lookup/lookup_1.08b-1.diff.gz
./contrib/l/lookup/lookup_1.08b-1.dsc
./contrib/l/lookup/lookup_1.08b.orig.tar.gz
./contrib/m/mancala/mancala_1.0.0-1.diff.gz
./contrib/m/mancala/mancala_1.0.0-1.dsc
./contrib/m/mancala/mancala_1.0.0.orig.tar.gz
./contrib/p/pgp4pine/pgp4pine_1.71b-5.diff.gz
./contrib/p/pgp4pine/pgp4pine_1.71b-5.dsc
./contrib/p/pgp4pine/pgp4pine_1.71b.orig.tar.gz
...

doesn't work with ftp.uk.debian.org. I can't check right now as it seems
to be down. Maybe there is a better place to report this - but I also wonder
if I'm just going quietly mad.

Has anyone else seen this? Is there an rsync list I perhaps ought to report
it to?

I've also been in discussion with Otto Wyss about his perl mirror script,
making it work on the potato-vintage version of perl. It's not quite there
yet, but it should give us a significantly faster and more comprehensible
partial mirror tool. More when we have it working.

-----------
Anyway - here's my version of Goswin's script, which works for potato. It is
set up for uk.debian, where the non-US module is called 'debian', so to use
it on other servers (like de.debian) you need to change this to
'debian-non-US' throughout. Hope this is useful.

#!/bin/sh -e
# Anon rsync partial mirror of Debian with package pool support.
# Copyright 1999, 2000 by Joey Hess <joeyh@debian.org>, GPL'd.
# Add ons by Goswin Brederlow <goswin.brederlow@student.uni-tuebingen.de>
# This Potato version by wookey <wookey@debian.org>

# Update potato/woody files and Packages.gz, or use the old ones? If you
# already have new enough ones say yes. This is for cases when you
# restart a scan after the modem died.
# No is the safe answer here, but wastes bandwidth when resuming.
HAVE_PACKAGE_FILES=yes

# Should a Contents file be kept updated? Saying no won't delete old
# Contents files, so when resuming you might want to say no here
# temporarily.
CONTENTS=no

# Flags to pass to rsync. More can be specified on the command line.
# These flags are always passed to rsync:
FLAGS="$@ -rHlptz --partial -vv --timeout=600 --progress"
# These flags are not passed in when we are getting files from pools.
# In particular, --delete is a horrid idea at that point, but good here.
FLAGS_NOPOOL="$FLAGS --exclude Packages --delete"
# And these flags are passed in only when we are getting files from pools.
# Remember, do _not_ include --delete.
FLAGS_POOL="$FLAGS"
# The host to connect to. Currently must carry both non-us and main
# and support anon rsync, which limits the options somewhat.
HOST=ftp.uk.debian.org
# Where to put the mirror (absolute path, please):
DEST=/cdimages/mirror/debian
# The distribution to mirror:
#DISTS="sid potato woody"
DISTS="potato"
# Architecture to mirror:
ARCHS="arm"
# Should source be mirrored too?
SOURCE=yes
# The sections to mirror (main, non-free, etc):
SECTIONS="main contrib non-free"
# Should symlinks be generated to every deb, in an "all" directory?
# I find this is very handy to ease looking up deb filenames.
SYMLINK_FARM=yes

###############################################################################

echo "Make sure we have dists and pool directories" 
mkdir -p $DEST/dists $DEST/pool

# Snarf the contents file.
 
if [ "$CONTENTS" = yes ]; then
        echo "Get contents files."
        for DIST in ${DISTS}; do
            for ARCH in ${ARCHS}; do
                echo Syncing  $DEST/dists/${DIST}/Contents-${ARCH}.gz
                rsync $FLAGS_NOPOOL \
                        $HOST::debian/dists/$DIST/Contents-${ARCH}.gz \
                        $DEST/dists/${DIST}/
                echo Syncing  $DEST/non-US/dists/${DIST}/non-US/Contents-${ARCH}.gz
                rsync $FLAGS_NOPOOL \
                        $HOST::debian/dists/$DIST/non-US/Contents-${ARCH}.gz \
                        $DEST/non-US/dists/${DIST}/non-US/
            done
        done
fi

# Generate list of archs to download
ARCHLIST="binary-all"
DISKS_ARCHLIST=""
NONUS_ARCHLIST="binary-all"

for ARCH in ${ARCHS}; do
    ARCHLIST="${ARCHLIST} binary-${ARCH}"
    DISKS_ARCHLIST="${DISKS_ARCHLIST} disks-${ARCH}"
    NONUS_ARCHLIST="${NONUS_ARCHLIST} binary-${ARCH}"
done

if [ "$SOURCE" = yes ]; then
        ARCHLIST="${ARCHLIST} source"
        NONUS_ARCHLIST="${NONUS_ARCHLIST} source"
fi

# Download packages files (and .debs and sources too, until we move fully
# to pools).

if [ x$HAVE_PACKAGE_FILES != xyes ]; then
echo "Download packages files."
for DIST in ${DISTS}; do
    for section in $SECTIONS; do
        for type in ${ARCHLIST}; do
            echo Syncing  $DEST/dists/$DIST/$section/$type
            mkdir -p $DEST/dists/$DIST/$section/$type
            rsync $FLAGS_NOPOOL \
                $HOST::debian/dists/$DIST/$section/$type \
                $DEST/dists/$DIST/$section/
        done
        if [ $section = "main" ]; then
          if [ $DIST != "sid" ]; then
            for type in ${DISKS_ARCHLIST}; do
                echo Syncing  $DEST/dists/$DIST/$section/$type
                mkdir -p $DEST/dists/$DIST/$section/$type
                rsync $FLAGS_NOPOOL \
                    $HOST::debian/dists/$DIST/$section/$type \
                    $DEST/dists/$DIST/$section/
            done
          fi
        fi
    done
done

for DIST in ${DISTS}; do
    for section in $SECTIONS; do
        for type in ${NONUS_ARCHLIST}; do
            echo Syncing  $DEST/non-US/dists/$DIST/non-US/$section/$type
            mkdir -p $DEST/non-US/dists/$DIST/non-US/$section/$type
            rsync $FLAGS_NOPOOL \
                $HOST::debian/dists/$DIST/non-US/$section/$type \
                $DEST/non-US/dists/$DIST/non-US/$section/
        done
    done
done
fi

# Update the package pool.
# Note that the same pool is used for non-us as everything else.
# TODO: probably needs to be optimized, we'll see as time goes by..
cd $DEST/pool || exit 1
rm -f .filelist
touch .filelist

#Check for broken links between dist and pools (for potato)
echo "Generate list of missing pool links"
for DIST in ${DISTS}; do
    for section in ${SECTIONS}; do
	for type in ${ARCHLIST}; do
#            echo Checking  $DEST/dists/$DIST/$section/$type
            for area in `ls $DEST/dists/$DIST/$section/$type`; do
		for file in `ls $DEST/dists/$DIST/$section/$type/$area`; do
	    	    if [ -L $DEST/dists/$DIST/$section/$type/$area/$file ]; then
		      if ! [ -e $DEST/dists/$DIST/$section/$type/$area/$file ]; then
		        for wantfile in `ls -l $DEST/dists/$DIST/$section/$type/$area/$file \
			| tr --squeeze-repeats ' ' | cut -d ' ' -f 11`; do
			    DIRS="`dirname $wantfile` $DIRS"
		    	    echo $wantfile >> .filelist
			done
		      fi
		    fi
		done
	    done  
        done
    done
done
# Get a list of all the files that are in the pool based on the Packages
# files that were already updated. Thanks to aj for the awk-fu.
# FIXME: this is needed for non-potato dists
#for file in `find $DEST -name Packages.gz | \
#                xargs -r zgrep -i "^Filename:" | \
#                cut -d ' ' -f 2 | grep ^pool/` \
#            `find $DEST -name Sources.gz | xargs -r zcat | \
#                    awk '/^Directory:/ {D=$2} /Files:/,/^$/ { \
#                        if ($1 != "Files:" && $0 != "") print D "/" $3; \
#                }' | grep ^pool/`
#do
#        DIRS="`dirname $file` $DIRS"
#        echo $file >> .filelist
#done


# Remove leading "pool" from all files in the file list.
# The "./" we change it to is there so the file names
# exactly match in the delete step and the files that get downloaded
# are not deleted.
sed 's!^../../../../../pool/!./!' .filelist | sort -u > .filelist.new
mv -f .filelist.new .filelist
# dirs all have a preceding ../../../../../ - need to strip these
# (all on one line, so a trailing global replace is needed)
echo $DIRS > .dirlist
sed 's!../../../../../!!g' .dirlist | sort -u > .dirlist.new
mv -f .dirlist.new .dirlist
DIRS=`cat .dirlist`

# Use the last version as template for the new file
# Hopefully files don't change that much from release to release
#FIXME: this is needed for non-potato dists
#cat .filelist | while read file; do
#  if [ ! -e $file ]; then
#    dir=`dirname $file`
#    name=`basename $file`
#    case $file in
#      *.deb)
#         arch=`basename $name .deb | cut -d"_" -f3`
#         old=`ls $dir/\`echo $name | cut -d"_" -f1\`_*_$arch.deb 2>/dev/null | tail --lines 1`
#         if [ "x$old" = x ]; then
#           old=`ls ../dists/*/*/binary-$arch/*/\`echo $name | cut -d"_" -f1\`_*.deb 2>/dev/null | tail --lines 1`
#           if [ "x$old" = x ]; then
#             old=`ls ../non-US/dists/*/non-US/*/binary-$arch/*/\`echo $name | cut -d"_" -f1\`_*.deb 2>/dev/null | tail --lines 1`
#             if [ "x$old" = x ]; then
#               old=""
#             fi
#           fi
#         fi
#         ;;
#      *.dsc)
#         old=`ls $dir/\`echo $name | cut -d"_" -f1\`_*.dsc 2>/dev/null | tail --lines 1`
#         if [ "x$old" = x ]; then
#           old=`ls ../dists/*/*/source/*/\`echo $name | cut -d"_" -f1\`_*.dsc 2>/dev/null | tail --lines 1`
#           if [ "x$old" = x ]; then
#             old=`ls ../non-US/dists/*/non-US/*/source/*/\`echo $name | cut -d"_" -f1\`_*.dsc 2>/dev/null | tail --lines 1`
#             if [ "x$old" = x ]; then
#               old=""
#             fi
#           fi
#         fi
#         ;;
#      *.diff.gz)
#         old=`ls $dir/\`echo $name | cut -d"_" -f1\`_*.diff.gz 2>/dev/null | tail --lines 1`
#         if [ "x$old" = x ]; then
#           old=`ls ../dists/*/*/source/*/\`echo $name | cut -d"_" -f1\`_*.diff.gz 2>/dev/null | tail --lines 1`
#           if [ "x$old" = x ]; then
#             old=`ls ../non-US/dists/*/non-US/*/source/*/\`echo $name | cut -d"_" -f1\`_*.diff.gz 2>/dev/null | tail --lines 1`
#             if [ "x$old" = x ]; then
#               old=""
#             fi
#           fi
#         fi
#         ;;
#      *.orig.tar.gz)
#         old=`ls $dir/\`echo $name | cut -d"_" -f1\`_*.orig.tar.gz 2>/dev/null | tail --lines 1`
#         if [ "x$old" = x ]; then
#           old=`ls ../dists/*/*/source/*/\`echo $name | cut -d"_" -f1\`_*.orig.tar.gz 2>/dev/null | tail --lines 1`
#           if [ "x$old" = x ]; then
#             old=`ls ../non-US/dists/*/non-US/*/source/*/\`echo $name | cut -d"_" -f1\`_*.orig.tar.gz 2>/dev/null | tail --lines 1`
#             if [ "x$old" = x ]; then
#               old=""
#             fi
#           fi
#         fi
#         ;;
#      *.tar.gz)
#         old=`ls $dir/\`echo $name | cut -d"_" -f1\`_*.tar.gz 2>/dev/null | tail --lines 1`
#         if [ "x$old" = x ]; then
#           old=`ls ../dists/*/*/source/*/\`echo $name | cut -d"_" -f1\`_*.tar.gz 2>/dev/null | tail --lines 1`
#           if [ "x$old" = x ]; then
#             old=`ls ../non-US/dists/*/non-US/*/source/*/\`echo $name | cut -d"_" -f1\`_*.tar.gz 2>/dev/null | tail --lines 1`
#             if [ "x$old" = x ]; then
#               old=""
#             fi
#           fi
#         fi
#         ;;
#      *)
#         old=""
#         ;;
#    esac
#    if [ "x$old" != "x" ]; then
#      if [ -e $old ]; then
#        echo found old file $old for $file
#          cp $old $file
#      else
#        echo cant find old file $old for $file
#      fi
#    else
#      echo cant find old file for $file
#    fi
#  fi
#done


(cd .. && mkdir -p $DIRS)
# Tell rsync to download only the files in the list. The exclude is here 
# to make the recursion not get anything else.
# TODO: main pool needs to be downloaded from too, once there is one.
echo Syncing  non-US/pool
rsync $FLAGS_POOL \
        $HOST::debian/non-US/pool/ --include-from .filelist \
        --exclude '*' $DEST/non-US/pool/
echo Syncing  pool      
rsync $FLAGS_POOL \
        $HOST::debian/pool/ --include-from .filelist --exclude '*' $DEST/pool/
# Delete all files that are not in the list, then any empty directories.
# This also kills the filelist.
find -type f | fgrep -vxf .filelist | xargs -r rm -f
find -type d -empty | xargs -r rmdir -p --ignore-fail-on-non-empty
# End of package pool update.

# Update symlinks (I like to have a link to every .deb in one directory).
if [ "$SYMLINK_FARM" = yes ]; then
        install -d  $DEST/all
        cd $DEST/all || exit 1
        find -name \*.deb | xargs -r rm -f
        find .. -name "*.deb" -type f | grep -v ^../all | \
                xargs -r -i ln -sf {} .
fi

# Waste bandwidth. Put a partial mirror on your laptop today!
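
The dangling-link scan in the script takes the link target from field 11 of
'ls -l', which depends on the exact column layout of ls. A possible
alternative, assuming a readlink utility is installed (it may not be on a
stock potato system), is to let find locate the symlinks and ask readlink
for the target. The function name list_dangling_targets is hypothetical.

```shell
#!/bin/sh
# Print the target of every dangling symlink below the given directory.
# ASSUMPTION: a readlink command is available on the system.
list_dangling_targets() {
    find "$1" -type l | while IFS= read -r link; do
        # -e follows the link, so the test fails when the target is missing.
        if [ ! -e "$link" ]; then
            readlink "$link"
        fi
    done
}
```

Called as 'list_dangling_targets $DEST/dists/potato >> .filelist' it should
produce the same ../../../../../pool/... targets that the sed step then
rewrites, though note that it scans everything below that directory rather
than only the selected sections and architectures.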



Wookey
-- 
Aleph One Ltd, Bottisham, CAMBRIDGE, CB5 9BA, UK  Tel (00 44) 1223 811679
work: http://www.aleph1.co.uk/     play: http://www.chaos.org.uk/~wookey/


