
Re: jigdo-file: Does not report package rejections because checksum mismatch



Hi,

I wrote:
> > us.cdimage.debian.org is a quick one.

Nicholas Geovanis wrote:
> From which location? Germany?

Yes.


How about the following final message for the case where files are missing
and mismatching downloads were detected?
(The mismatches shown are fake and were recorded twice to get more than 2.)

========================================================== begin of example
-----------------------------------------------------------------
Aaargh - 1 files remain missing. This should not happen!
4 download attempts yielded files with mismatching MD5 checksums:

   http://archive.debian.org/debian/pool/main/p/partman-reiserfs/partman-reiserfs_50_all.udeb
   http://archive.debian.org/debian/pool/main/p/partman-reiserfs/partman-reiserfs_50_all.udeb
   http://us.cdimage.debian.org/cdimage/snapshot/Debian/pool/main/p/partman-reiserfs/partman-reiserfs_50_all.udeb
   ... the WARNING messages of this run report more ...

After a retry with the same mirror consider to search the web
for the names of remaining mismatching packages. As mirror name
use the found URL up to the "/" before the directory name "pool".

Press Return to retry downloading the missing files.
Press Ctrl-C to abort. (If you re-run jigdo-lite later, it will
resume from here, the downloaded data is not lost if you press
Ctrl-C now.)
:
========================================================== end of example

Is the advice understandable?
I propose one retry with the same mirror to get rid of any warnings from the
user-defined mirror which were resolved by the fallback mirror. In the second
run, only messages about the problematic files should appear, which hopefully
makes the checksum warnings easier to spot.
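
To illustrate the mirror-name advice in the message: the following is only
a minimal sketch, not part of the patch. The function name mirrorFromUrl is
made up for this example. It assumes that the found URL contains a
directory named "pool" and cuts off everything after the "/" before it.

```shell
# Hypothetical helper, not part of jigdo-lite:
# print the URL up to and including the "/" before the directory "pool".
# A URL without "/pool/" is printed unchanged.
mirrorFromUrl() {
  echo "$1" | sed -e 's,/pool/.*$,/,'
}

mirrorFromUrl "http://archive.debian.org/debian/pool/main/p/partman-reiserfs/partman-reiserfs_50_all.udeb"
# prints: http://archive.debian.org/debian/
```

The printed string would be what the user enters as the mirror name.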

The list of URLs must be restricted to 3 and the usual messages must be
omitted in order to keep the text below 24 lines if the URLs are longer
than two lines.
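
The capping of the URL list can be sketched in isolation like this. It is a
simplified stand-alone illustration, not the code from the patch: the names
maxRec, count, list, recordMismatch and printMismatches are invented here,
and the list is kept newline-separated so that no sed reformatting is
needed when printing.

```shell
# Hypothetical stand-alone sketch of the capped mismatch list.
maxRec=3     # at most this many URLs are remembered for the final message
count=0      # all mismatches are counted, even those not remembered
list=""      # remembered URLs, one per line, indented by 3 blanks

recordMismatch() {
  count=`expr "$count" + 1`
  if test "$count" -le "$maxRec"; then
    list="$list
   $1"
  fi
}

printMismatches() {
  echo "$count download attempts yielded files with mismatching MD5 checksums:"
  # $list begins with a newline, which yields the empty line before the URLs
  echo "$list"
  if test "$count" -gt "$maxRec"; then
    echo "   ... the WARNING messages of this run report more ..."
  fi
}
```

With four recorded mismatches this prints the headline, an empty line,
three URLs, and the hint line about the WARNING messages.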

The usual message appears if no mismatches were detected but files are
still missing.

I changed the old message
  Aaargh - 1 files could not be downloaded. This should not
  happen! ...
in both cases to
  Aaargh - 1 files remain missing. This should not happen!
because it better matches the range of possible causes.

============================================================================
--- /usr/bin/jigdo-lite.sid	2017-12-28 14:20:23.882643023 +0100
+++ /home/thomas/projekte/jigdo_dir/jigdo-lite.sid.with_md5_check	2017-12-29 20:09:34.055048360 +0100
@@ -29,6 +29,13 @@ else
   windows=false
   nl='\n'
 fi
+
+# Counter of MD5 mismatches after download
+fileMismatchCount=0
+# Short list of URLs which yielded mismatch
+fileMismatchList=""
+# Very few mismatching URLs shall be shown at the end. They can be long.
+fileMismatchMaxRec=3
 #______________________________________________________________________
 
 # read with readline, only if running bash >=2.03 (-e gives error on POSIX)
@@ -75,10 +82,84 @@ fetch() {
 }
 #______________________________________________________________________
 
-# Given URLs, fetch them into $imageTmp, then merge them into image
+# DEVELOPMENT TEST:
+# Set to a package name of the ISO to get simulated MD5 mismatch
+# simulateMD5Mismatch="partman-reiserfs_50_all.udeb"
+simulateMD5Mismatch="NOT_A_PACKAGE_NAME"
+
+# Given URLs and MD5s, fetch them into $imageTmp and verify,
+# then merge them into image
 fetchAndMerge() {
+
+  # The other arguments are URLs in the same sequence as the words in md5List
+  md5List="$1"
+  shift 1
+
   if test "$#" -eq 0; then return 0; fi
   fetch --force-directories --directory-prefix="$imageTmp" -- "$@"
+
+  # Try to verify downloaded files
+  for md5 in $md5List
+  do
+    url="$1"
+    shift 1
+    test "$md5" = ".no.MD5.known." && continue
+
+    # Simulated MD5 mismatch
+    if echo "$url" | grep '/'"$simulateMD5Mismatch"'$' >/dev/null 2>&1
+    then
+      echo "DEVELOPMENT TEST: Faking a checksum mismatch with package $simulateMD5Mismatch" >&2
+      md5="*INVALIDATED*CHECKSUM*"
+    fi
+
+    # Alternative proposals by Philip Hands:
+    # localPath="$imageTmp"/`echo "$url" | sed -e 's,^\(https\?\|ftp\|file\)://,,i'`
+    # localPath="$imageTmp/${url#[[:alpha:]]*://}"
+
+    localPath="$imageTmp"/`echo "$url" | \
+                            sed -e 's/^[hH][tT][tT][pP]:\/\///' \
+                                -e 's/^[hH][tT][tT][pP][sS]:\/\///' \
+                                -e 's/^[fF][tT][pP]:\/\///' \
+                                -e 's/^[fF][iI][lL][eE]:\/\///'`
+
+    if test ! -e "$localPath"
+    then
+      # Maybe the file was downloaded but above guess was wrong
+      baseName=`basename "$url"`
+      localPath=`find "$imageTmp" -name "$baseName" | head -1`
+    fi
+
+    if test -n "$localPath" -a -e "$localPath"
+    then
+      fileMD5=`$jigdoFile md5sum --report=quiet "$localPath" | sed 's/ .*$//'`
+      if test "$md5" != "$fileMD5"
+      then
+        echo >&2
+        echo "WARNING: Downloaded file does not match expected MD5:" >&2
+        echo "         $url" >&2
+        echo "         $localPath" >&2
+        echo "         expected: $md5 | $fileMD5 :downloaded" >&2
+        echo >&2
+
+        # Record info for the Aaargh message
+        fileMismatchCount=`expr "$fileMismatchCount" + 1`
+        if test "$fileMismatchCount" -le "$fileMismatchMaxRec"
+        then
+          fileMismatchList="$fileMismatchList , $url"
+        fi
+
+        # If the mismatch is simulated: play along and prevent merging
+        if test "$md5" = "*INVALIDATED*CHECKSUM*"
+        then
+          echo "DEVELOPMENT TEST: Removing file to complete mismatch simulation" >&2
+          echo >&2
+          rm "$localPath"
+        fi
+
+      fi
+    fi
+  done
+
   # Merge into the image
   $jigdoFile $jigdoOpts --no-cache make-image --image="$image" \
     --jigdo="$jigdoF" --template="$template" "$imageTmp"
@@ -574,6 +655,10 @@ imageDownload() {
     fetchTemplate || return 1
     hrule
 
+    # A new cycle of possible aaarghing begins
+    fileMismatchCount=0
+    fileMismatchList=""
+
     # If a "file:" URI was given instead of a server URL, try to merge
     # any files into the image.
     echo "Merging parts from \`file:' URIs, if any..."
@@ -596,31 +681,55 @@ imageDownload() {
     for pass in x xx xxx xxxx xxxxx xxxxxx xxxxxxx xxxxxxxx; do
       $jigdoFile print-missing-all --image="$image" --jigdo="$jigdoF" \
         --template="$template" $jigdoOpts $uriOpts \
-      | egrep -i '^(http:|ftp:|$)' >"$list"
+       >"$list"
       missingCount=`egrep '^$' <"$list" | wc -l | sed -e 's/ *//g'`
       # Accumulate URLs in $@, pass them to fetchAndMerge in batches
       shift "$#" # Solaris /bin/sh doesn't understand "set --"
       count=""
       exec 3<"$list"
+      useUrl="."
+      md5=".no.MD5.known."
+      md5List=""
       while $readLine url <&3; do
         count="x$count"
-        if strEmpty "$url"; then count=""; continue; fi
-        if test "$count" != "$pass"; then continue; fi
-        if $noMorePasses; then
-          hrule
-          echo "$missingCount files not found in previous pass, trying"
-          echo "alternative download locations:"
-          echo
-        fi
-        noMorePasses=false
-        set -- "$@" "$url"
-        if test "$#" -ge "$filesPerFetch"; then
-          if fetchAndMerge "$@"; then true; else exec 3<&-; return 1; fi
-          shift "$#" # Solaris /bin/sh doesn't understand "set --"
+        if strEmpty "$url"
+        then
+          if test "$useUrl" != '.' 
+          then
+            set -- "$@" "$useUrl"
+            md5List="$md5List $md5"
+          fi
+          if test "$#" -ge "$filesPerFetch"; then
+            if fetchAndMerge "$md5List" "$@"
+            then
+              true
+            else
+              exec 3<&-
+              return 1
+            fi
+            shift "$#" # Solaris /bin/sh doesn't understand "set --"
+            md5List=""
+          fi
+          count=""
+          useUrl="."
+          md5=".no.MD5.known."
+        elif echo " $url" | egrep '^ MD5Sum:' >/dev/null 2>&1
+        then
+          md5=`echo " $url" | sed -e 's/ MD5Sum://'`
+        elif test "$count" = "$pass"
+        then
+          useUrl="$url"
+          if $noMorePasses; then
+            hrule
+            echo "$missingCount files not found in previous pass, trying"
+            echo "alternative download locations:"
+            echo
+          fi
+          noMorePasses=false
         fi
       done
       exec 3<&-
-      if test "$#" -ge 1; then fetchAndMerge "$@" || return 1; fi
+      if test "$#" -ge 1; then fetchAndMerge "$md5List" "$@" || return 1; fi
       if $noMorePasses; then break; fi
       if test -r "$image"; then break; fi
       noMorePasses=true
@@ -630,21 +739,37 @@ imageDownload() {
     if test -r "$image"; then break; fi
 
     hrule
-    echo "Aaargh - $missingCount files could not be downloaded. This should not"
-    echo "happen! Depending on the problem, it may help to retry downloading"
-    echo "the missing files."
-    if $batch; then return 1; fi
-    if $usesDebian || $usesNonus; then
-    echo "Also, you could try changing to another Debian or Non-US server,"
-    echo "in case the one you used is out of sync."
-    fi
-    echo
-    echo "However, if all the files downloaded without errors and you"
-    echo "still get this message, it means that the files changed on the"
-    echo "server, so the image cannot be generated."
-    if $usesDebian || $usesNonus; then
-    echo "As a last resort, you could try to complete the CD image download"
-    echo "by fetching the remaining data with rsync."
+    echo "Aaargh - $missingCount files remain missing. This should not happen!"
+    if test "$fileMismatchCount" -gt 0
+    then
+      echo "$fileMismatchCount download attempts yielded files with mismatching MD5 checksums:"
+      echo
+      echo "$fileMismatchList" | sed -e 's/ , /   /' -e 's/ , /\n   /g'
+      if test "$fileMismatchCount" -gt "$fileMismatchMaxRec"
+      then
+        echo "   ... the WARNING messages of this run report more ..."
+      fi
+      if $batch; then return 1; fi
+      echo
+      echo "After a retry with the same mirror consider to search the web"
+      echo "for the names of remaining mismatching packages. As mirror name"
+      echo 'use the found URL up to the "/" before the directory name "pool".'
+    else
+      echo "Depending on the problem, it may help to retry downloading"
+      echo "the missing files."
+      if $batch; then return 1; fi
+      if $usesDebian || $usesNonus; then
+      echo "Also, you could try changing to another Debian or Non-US server,"
+      echo "in case the one you used is out of sync."
+      fi
+      echo
+      echo "However, if all the files downloaded without errors and you"
+      echo "still get this message, it means that the files changed on the"
+      echo "server, so the image cannot be generated."
+      if $usesDebian || $usesNonus; then
+      echo "As a last resort, you could try to complete the CD image download"
+      echo "by fetching the remaining data with rsync."
+      fi
     fi
     echo
     echo "Press Return to retry downloading the missing files."
============================================================================


Have a nice day :)

Thomas

