
rsync'ing pools (Was: Re: DEBIAN IS LOOSING PACKAGES AND NOBODY CARES!!!)



>>>>> " " == Tinguaro Barreno Delgado <tbarreno@debian.org> writes:

     > Hello again.

     > On Sun, Dec 31, 2000 at 02:22:45PM +0000, Miquel van
     > Smoorenburg wrote:
    >>  Yes. The structure of the archive has changed because of
    >> 'package pools'.  You need to mirror 'pool' as well.
    >> 
    >> Also, "woody" is no longer "unstable". "sid" is. "woody" is
    >> "testing".
    >> 
    >> Mike.
    >> 

     > Ok. Thanks to Peter Palfrader too. Then, there is a more
     > complicated issue for those who have a partial mirror (only i386
     > for me), but I think that is possible with rsync options.
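Miquel's point about mirroring 'pool' is easier to see with the layout spelled out. Below is a minimal sketch. It assumes the usual pool convention `pool/<section>/<prefix>/<source>/`, where `<prefix>` is the first letter of the source package name, or "lib" plus the next letter for lib* packages. `pool_dir` is a hypothetical helper and is not part of the script below.

```shell
#!/bin/sh
# Hypothetical helper: print the pool directory that is assumed to hold
# a source package, pool/<section>/<prefix>/<source>, where <prefix> is
# the first letter of the name, or "lib" plus the next letter for
# lib* names.
pool_dir() {
    section=$1
    src=$2
    case $src in
        lib?*) prefix=`echo "$src" | cut -c1-4` ;;
        *)     prefix=`echo "$src" | cut -c1` ;;
    esac
    echo "pool/$section/$prefix/$src"
}

pool_dir main hello     # pool/main/h/hello
pool_dir main libpng    # pool/main/libp/libpng
```

This is also why the script builds its file list from the `Filename:` and `Directory:` fields instead of walking dists/. The Packages files in dists/ point into pool/.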

There was a script posted here to do partial rsync mirrors.

I used that script and added several features to it. What's missing is
support for the debian-installer in sid, but I'm working on that.

Changes:
- multiple architectures
- keep links from woody -> potato
- mirror binary-all
- mirror US and non-US pools
- use the last version as a template for new files
- mirror the boot disks (disks-ARCH)
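The "last version as a template" change relies on rsync's delta transfer. If a file with the new name already exists locally, rsync uses it as a basis and sends only the differences. The sketch below shows how the script derives the glob for an older version from the new file name: `package_version_arch.deb` for binaries, `package_version.*` for sources. `template_glob` is a hypothetical helper. The script itself inlines this logic with `cut -d_`.

```shell
#!/bin/sh
# Hypothetical helper: given the name of a file about to be downloaded,
# print the glob used to look for an older version of the same package.
# Prints an empty line for file types the script does not template.
template_glob() {
    name=$1
    pkg=`echo "$name" | cut -d_ -f1`
    case $name in
        *.deb)
            # package_version_arch.deb: keep the architecture fixed
            arch=`basename "$name" .deb | cut -d_ -f3`
            echo "${pkg}_*_${arch}.deb" ;;
        *.dsc)         echo "${pkg}_*.dsc" ;;
        *.diff.gz)     echo "${pkg}_*.diff.gz" ;;
        *.orig.tar.gz) echo "${pkg}_*.orig.tar.gz" ;;
        *.tar.gz)      echo "${pkg}_*.tar.gz" ;;
        *)             echo "" ;;
    esac
}

template_glob hello_1.3-16_i386.deb   # hello_*_i386.deb
template_glob hello_1.3-16.dsc        # hello_*.dsc
```

The script tries this glob in the pool directory first, then under ../dists and the non-US dists. It copies whatever `ls ... | tail --lines 1` returns to the new name before running rsync. `ls` sorts by name rather than by Debian version, so "last" is only approximate.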

People interested in only one architecture and only woody/sid should remove
binary-all and resolve the links.
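For a single-architecture mirror, the list-building loop in the script just has to start empty instead of with binary-all. Below is a sketch that assumes this is what "remove binary-all" means. `build_archlist` is a hypothetical wrapper around that loop.

```shell
#!/bin/sh
# Hypothetical wrapper around the script's ARCHLIST loop, without the
# leading binary-all. First argument: "yes" to add source; the rest
# are architectures.
build_archlist() {
    want_source=$1; shift
    list=""
    for arch in "$@"; do
        # ${list:+$list } adds a separating space only when needed
        list="${list:+$list }binary-$arch"
    done
    if [ "$want_source" = yes ]; then
        list="${list:+$list }source"
    fi
    echo "$list"
}

build_archlist no i386          # binary-i386
build_archlist yes i386 alpha   # binary-i386 binary-alpha source
```

For "resolve the links", one option is to replace rsync's `-l` with `-L` (`--copy-links`) in FLAGS. This is an assumption and has not been tested against the archive. With `-L`, the woody -> potato symlinks arrive as real files instead of links into a potato tree you do not mirror.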

Joey, can you put this where the original came from, or next to the
original script? Have you made any changes to the script on your side?

So here's the script for all who care:

----------------------------------------------------------------------
#!/bin/sh -e
# Anon rsync partial mirror of Debian with package pool support.
# Copyright 1999, 2000 by Joey Hess <joeyh@debian.org>, GPL'd.
# Add ons by Goswin Brederlow <goswin.brederlow@student.uni-tuebingen.de>

# Do you already have new enough potato/woody files and Packages.gz?
# Say yes to reuse them instead of updating them. This is for cases
# when you restart a scan after the modem died.
# No is the safe answer here, but it wastes bandwidth when resuming.
HAVE_PACKAGE_FILES=no

# Should the Contents files be kept updated? Saying no won't delete old
# Contents files, so when resuming you might want to say no here
# temporarily.
CONTENTS=yes

# Flags to pass to rsync. More can be specified on the command line.
# These flags are always passed to rsync:
FLAGS="$* -rlpt --partial -v --progress"
# These flags are not passed in when we are getting files from pools.
# In particular, --delete is a horrid idea at that point, but good here.
FLAGS_NOPOOL="$FLAGS --exclude Packages --delete"
# And these flags are passed in only when we are getting files from pools.
# Remember, do _not_ include --delete.
FLAGS_POOL="$FLAGS"
# The host to connect to. Currently must carry both non-us and main
# and support anon rsync, which limits the options somewhat.
HOST=ftp.de.debian.org
# Where to put the mirror (absolute path, please):
DEST=/mnt/raid/rsync-mirror/debian
# The distribution to mirror:
DISTS="sid potato woody"
# Architecture to mirror:
ARCHS="i386 alpha m68k"
# Should source be mirrored too?
SOURCE=yes
# The sections to mirror (main, non-free, etc):
SECTIONS="main contrib non-free"
# Should symlinks be generated to every deb, in an "all" directory?
# I find this is very handy to ease looking up deb filenames.
SYMLINK_FARM=no

###############################################################################

mkdir -p $DEST/dists $DEST/pool

# Snarf the contents file.
if [ "$CONTENTS" = yes ]; then
	for DIST in ${DISTS}; do
	    for ARCH in ${ARCHS}; do
		echo Syncing  $DEST/dists/${DIST}/Contents-${ARCH}.gz
		mkdir -p $DEST/dists/${DIST}
		rsync $FLAGS_NOPOOL \
			$HOST::debian/dists/$DIST/Contents-${ARCH}.gz \
			$DEST/dists/${DIST}/
		echo Syncing  $DEST/non-US/dists/${DIST}/non-US/Contents-${ARCH}.gz
		mkdir -p $DEST/non-US/dists/${DIST}/non-US
		rsync $FLAGS_NOPOOL \
			$HOST::debian-non-US/dists/$DIST/non-US/Contents-${ARCH}.gz \
			$DEST/non-US/dists/${DIST}/non-US/
	    done
	done
fi

# Generate list of archs to download
ARCHLIST="binary-all"
DISKS_ARCHLIST=""
NONUS_ARCHLIST="binary-all"

for ARCH in ${ARCHS}; do
    ARCHLIST="${ARCHLIST} binary-${ARCH}"
    DISKS_ARCHLIST="${DISKS_ARCHLIST} disks-${ARCH}"
    NONUS_ARCHLIST="${NONUS_ARCHLIST} binary-${ARCH}"
done

if [ "$SOURCE" = yes ]; then
        ARCHLIST="${ARCHLIST} source"
        NONUS_ARCHLIST="${NONUS_ARCHLIST} source"
fi

# Download the Packages files (and .debs and sources too, until we move
# fully to pools).

if [ x$HAVE_PACKAGE_FILES != xyes ]; then
for DIST in ${DISTS}; do
    for section in $SECTIONS; do
	for type in ${ARCHLIST}; do
	    echo Syncing  $DEST/dists/$DIST/$section/$type
            mkdir -p $DEST/dists/$DIST/$section/$type
            rsync $FLAGS_NOPOOL \
		$HOST::debian/dists/$DIST/$section/$type \
                $DEST/dists/$DIST/$section/
        done
	if [ $section = "main" ]; then
          if [ $DIST != "sid" ]; then
	    for type in ${DISKS_ARCHLIST}; do
		echo Syncing  $DEST/dists/$DIST/$section/$type
		mkdir -p $DEST/dists/$DIST/$section/$type
		rsync $FLAGS_NOPOOL \
		    $HOST::debian/dists/$DIST/$section/$type \
		    $DEST/dists/$DIST/$section/
	    done
          fi
	fi
    done
done

for DIST in ${DISTS}; do
    for section in $SECTIONS; do
	for type in ${NONUS_ARCHLIST}; do
	    echo Syncing  $DEST/non-US/dists/$DIST/non-US/$section/$type
            mkdir -p $DEST/non-US/dists/$DIST/non-US/$section/$type
            rsync $FLAGS_NOPOOL \
		$HOST::debian-non-US/dists/$DIST/non-US/$section/$type \
                $DEST/non-US/dists/$DIST/non-US/$section/
        done
    done
done
fi

# Update the package pool.
# Note that the same pool is used for non-us as everything else.
# TODO: probably needs to be optimized, we'll see as time goes by..
cd $DEST/pool || exit 1
rm -f .filelist
touch .filelist

# Get a list of all the files that are in the pool based on the Packages
# files that were already updated. Thanks to aj for the awk-fu.
for file in `find $DEST -name Packages.gz | \
                xargs -r zgrep -i "^Filename:" | \
		cut -d ' ' -f 2 | grep ^pool/` \
            `find $DEST -name Sources.gz | xargs -r zcat | \
                    awk '/^Directory:/ {D=$2} /Files:/,/^$/ { \
                        if ($1 != "Files:" && $0 != "") print D "/" $3; \
                }' | grep ^pool/`
do
        DIRS="`dirname $file` $DIRS"
        echo $file >> .filelist
done

# Replace the leading "pool/" of every entry in the file list with "./".
# The "./" makes the names exactly match the output of find in the
# delete step, so the files that get downloaded are not deleted.
sed 's!^pool/!./!' .filelist | sort -u > .filelist.new
mv -f .filelist.new .filelist

# Use the last version as a template for the new file, so rsync only
# has to transfer the differences. Hopefully files don't change that
# much from release to release.
cat .filelist | while read file; do
  if [ ! -e $file ]; then
    dir=`dirname $file`
    name=`basename $file`
    case $file in
      *.deb)
	 arch=`basename $name .deb | cut -d"_" -f3`
	 old=`ls $dir/\`echo $name | cut -d"_" -f1\`_*_$arch.deb 2>/dev/null | tail --lines 1`
	 if [ "x$old" = x ]; then
	   old=`ls ../dists/*/*/binary-$arch/*/\`echo $name | cut -d"_" -f1\`_*.deb 2>/dev/null | tail --lines 1`
	   if [ "x$old" = x ]; then
	     old=`ls ../non-US/dists/*/non-US/*/binary-$arch/*/\`echo $name | cut -d"_" -f1\`_*.deb 2>/dev/null | tail --lines 1`
	     if [ "x$old" = x ]; then
	       old=""
	     fi
	   fi
	 fi
	 ;;
      *.dsc)
	 old=`ls $dir/\`echo $name | cut -d"_" -f1\`_*.dsc 2>/dev/null | tail --lines 1`
	 if [ "x$old" = x ]; then
	   old=`ls ../dists/*/*/source/*/\`echo $name | cut -d"_" -f1\`_*.dsc 2>/dev/null | tail --lines 1`
	   if [ "x$old" = x ]; then
	     old=`ls ../non-US/dists/*/non-US/*/source/*/\`echo $name | cut -d"_" -f1\`_*.dsc 2>/dev/null | tail --lines 1`
	     if [ "x$old" = x ]; then
	       old=""
	     fi
	   fi
	 fi
	 ;;
      *.diff.gz)
	 old=`ls $dir/\`echo $name | cut -d"_" -f1\`_*.diff.gz 2>/dev/null | tail --lines 1`
	 if [ "x$old" = x ]; then
	   old=`ls ../dists/*/*/source/*/\`echo $name | cut -d"_" -f1\`_*.diff.gz 2>/dev/null | tail --lines 1`
	   if [ "x$old" = x ]; then
	     old=`ls ../non-US/dists/*/non-US/*/source/*/\`echo $name | cut -d"_" -f1\`_*.diff.gz 2>/dev/null | tail --lines 1`
	     if [ "x$old" = x ]; then
	       old=""
	     fi
	   fi
	 fi
	 ;;
      *.orig.tar.gz)
	 old=`ls $dir/\`echo $name | cut -d"_" -f1\`_*.orig.tar.gz 2>/dev/null | tail --lines 1`
	 if [ "x$old" = x ]; then
	   old=`ls ../dists/*/*/source/*/\`echo $name | cut -d"_" -f1\`_*.orig.tar.gz 2>/dev/null | tail --lines 1`
	   if [ "x$old" = x ]; then
	     old=`ls ../non-US/dists/*/non-US/*/source/*/\`echo $name | cut -d"_" -f1\`_*.orig.tar.gz 2>/dev/null | tail --lines 1`
	     if [ "x$old" = x ]; then
	       old=""
	     fi
	   fi
	 fi
	 ;;
      *.tar.gz)
	 old=`ls $dir/\`echo $name | cut -d"_" -f1\`_*.tar.gz 2>/dev/null | tail --lines 1`
	 if [ "x$old" = x ]; then
	   old=`ls ../dists/*/*/source/*/\`echo $name | cut -d"_" -f1\`_*.tar.gz 2>/dev/null | tail --lines 1`
	   if [ "x$old" = x ]; then
	     old=`ls ../non-US/dists/*/non-US/*/source/*/\`echo $name | cut -d"_" -f1\`_*.tar.gz 2>/dev/null | tail --lines 1`
	     if [ "x$old" = x ]; then
	       old=""
	     fi
	   fi
	 fi
	 ;;
      *)
	 old=""
	 ;;
    esac
    if [ "x$old" != "x" ]; then
      if [ -e $old ]; then
	echo found old file $old for $file
	  # The pool directory may not exist yet on a first run.
	  mkdir -p $dir
	  cp $old $file
      else
	echo cant find old file $old for $file
      fi
    else
      echo cant find old file for $file
    fi
  fi
done

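# Create the pool directory of every listed file. Use xargs, since a
# single mkdir command line holding all of them can get too long.
sed 's!/[^/]*$!!' .filelist | sort -u | xargs -r mkdir -p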
# Tell rsync to download only the files in the list. The exclude keeps
# the recursion from fetching anything else.
echo Syncing  non-US/pool
rsync $FLAGS_POOL \
        $HOST::debian-non-US/pool/ --include-from .filelist \
	--exclude '*' $DEST/pool/
echo Syncing  pool	
rsync $FLAGS_POOL \
        $HOST::debian/pool/ --include-from .filelist --exclude '*' $DEST/pool/
# Delete all files that are not in the list, then any empty directories.
# This also kills the filelist.
find -type f | fgrep -vxf .filelist | xargs -r rm -f
find -type d -empty | xargs -r rmdir -p --ignore-fail-on-non-empty
# End of package pool update.

# Update symlinks (I like to have a link to every .deb in one directory).
if [ "$SYMLINK_FARM" = yes ]; then
        install -d  $DEST/all
        cd $DEST/all || exit 1
        find -name \*.deb | xargs -r rm -f
        find .. -name "*.deb" -type f | grep -v ^../all | \
                xargs -r -i ln -sf {} .
fi

# Waste bandwidth. Put a partial mirror on your laptop today!
----------------------------------------------------------------------
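The pool step is the least obvious part of the script above. The sketch below shows how it builds .filelist, run on tiny made-up Packages and Sources stanzas. Pool paths come from the `Filename:` field of Packages files and from `Directory:` plus the `Files:` entries of Sources files. They are then rewritten from `pool/...` to `./...` and de-duplicated. For simplicity, plain files are read here, whereas the script runs zgrep and zcat on the .gz files. `pool_filelist` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: print the pool files named in a Packages file ($1)
# and a Sources file ($2), with "pool/" rewritten to "./", de-duplicated.
pool_filelist() {
    {
        # Binary packages: the path after "Filename:", pool entries only.
        grep -i '^Filename:' "$1" | cut -d ' ' -f 2 | grep '^pool/'
        # Sources: Directory: plus the file name (third field) of every
        # line in the Files: block, which ends at the blank line.
        awk '/^Directory:/ {D=$2} /Files:/,/^$/ {
            if ($1 != "Files:" && $0 != "") print D "/" $3
        }' "$2" | grep '^pool/'
    } | sed 's!^pool/!./!' | sort -u
}

# Made-up sample data: one pool .deb, one old-style dists/ .deb that
# must be skipped, and one source package with two files.
tmp=`mktemp -d`
cat > "$tmp/Packages" <<'EOF'
Package: hello
Filename: pool/main/h/hello/hello_1.3-16_i386.deb

Package: oldpkg
Filename: dists/potato/main/binary-i386/misc/oldpkg_1.0-1_i386.deb
EOF
cat > "$tmp/Sources" <<'EOF'
Package: hello
Directory: pool/main/h/hello
Files:
 0123456789abcdef 600 hello_1.3-16.dsc
 fedcba9876543210 9000 hello_1.3.orig.tar.gz

EOF

result=`pool_filelist "$tmp/Packages" "$tmp/Sources"`
echo "$result"
rm -rf "$tmp"
```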

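The clean-up step depends on the "./" prefix. `find .` prints paths as `./main/...`, so an exact-line `grep -Fvx` against .filelist keeps exactly the listed files. .filelist itself is not listed, so it is removed as well. The sketch below runs this in a scratch directory with made-up names. It uses `grep -F` instead of the older `fgrep` spelling. It also prunes empty directories with `find -depth ... -exec rmdir` rather than the script's GNU `rmdir -p --ignore-fail-on-non-empty`.

```shell
#!/bin/sh
# Scratch pool with one wanted and one stale package (made-up names).
tmp=`mktemp -d`
cd "$tmp" || exit 1
mkdir -p main/h/hello main/o/oldpkg
touch main/h/hello/hello_1.3-16_i386.deb main/o/oldpkg/oldpkg_0.9-1_i386.deb
echo ./main/h/hello/hello_1.3-16_i386.deb > .filelist

# Delete every regular file not listed verbatim in .filelist
# (this includes .filelist itself).
find . -type f | grep -Fvxf .filelist | xargs rm -f
# Then remove directories that are now empty, deepest first.
find . -depth -type d -empty -exec rmdir {} \;

remaining=`find . -type f`
if [ -d main/o ]; then pruned=no; else pruned=yes; fi
echo "remaining: $remaining (empty dirs pruned: $pruned)"
cd / && rm -rf "$tmp"
```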

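The SYMLINK_FARM option in the script works because `find ..` is run from inside all/. The resulting `../pool/...` paths are therefore valid relative link targets from that directory. The sketch below uses a scratch tree and a made-up file name. It uses `xargs -I {}`, the POSIX spelling of the script's GNU `-i` option.

```shell
#!/bin/sh
# Scratch tree: one .deb in the pool and an empty all/ directory.
tmp=`mktemp -d`
mkdir -p "$tmp/pool/main/h/hello" "$tmp/all"
touch "$tmp/pool/main/h/hello/hello_1.3-16_i386.deb"

cd "$tmp/all" || exit 1
# Link every .deb found outside all/ into all/, using relative paths.
find .. -name '*.deb' -type f | grep -v '^\.\./all/' | xargs -I {} ln -sf {} .

# -L: it is a symlink; -f: it resolves to a regular file.
if [ -L hello_1.3-16_i386.deb ] && [ -f hello_1.3-16_i386.deb ]; then
    link_ok=yes
else
    link_ok=no
fi
echo "link resolves: $link_ok"
cd / && rm -rf "$tmp"
```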
