[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: jigdo-file: Does not report package rejections because checksum mismatch



Hi,

i Cc: debian-cd with this follow-up to bug 884526 in the hope to get
some review for the endeavor to detect damaged downloaded package files
during a run of jigdo-lite.

Some disputable aspects remain (plus a possible bug in current jigdo-lite,
which will vanish by my proposal). I put them behind my changeset, which is
based on Sid of today.

============================================================================
How it works:

Currently and in my proposal jigdo-lite asks jigdo-file for a list of files
yet missing in the emerging image. It downloads them in groups of 10 and
submits each group to jigdo-file for grafting them into the image.
The list consists of text blocks of the form:
  URL of package
  Alternative URL of same package
  ... more alternative URLs ...
  MD5Sum:...
  <empty line>
One of the URLs from each block gets picked by a loop with variable "pass",
but currently jigdo-file ignores the MD5Sum lines.

My proposal collects the MD5s and submits them to function fetchAndMerge
which currently only gets the 10 URLs.
I had to re-arrange the entrails of the list reading loop in imageDownload.
It currently ends when the 10th URL is found and thus would not have yet
reached the 10th checksum line. My proposal uses the empty line as trigger
to append the picked URL and the MD5 to the argument list.

Tested with Jessie's jigdo-file and these three input lines:

  http://cdimage.debian.org/mirror/cdimage/archive/6.0.7/amd64/jigdo-cd/debian-6.0.7-amd64-businesscard.jigdo
  
  http://us.cdimage.debian.org/cdimage/snapshot/Debian/

--- /usr/bin/jigdo-lite.sid	2017-12-28 14:20:23.882643023 +0100
+++ /home/thomas/projekte/jigdo_dir/jigdo-lite.sid.with_md5_check	2017-12-28 15:12:37.738654856 +0100
@@ -75,10 +75,61 @@ fetch() {
 }
 #______________________________________________________________________
 
-# Given URLs, fetch them into $imageTmp, then merge them into image
+# Simulated MD5 mismatch:
+simulateMD5Mismatch="partman-reiserfs_50_all.udeb"
+# simulateMD5Mismatch="NOT_A_PACKAGE_NAME"
+
+# Given URLs and MD5s, fetch them into $imageTmp and verify,
+# then merge them into image
 fetchAndMerge() {
+
+  # The other arguments are URLs in the same sequence as the words in md5List
+  md5List="$1"
+  shift 1
+
   if test "$#" -eq 0; then return 0; fi
   fetch --force-directories --directory-prefix="$imageTmp" -- "$@"
+
+  # Try to verify downloaded files
+  for md5 in $md5List
+  do
+    url="$1"
+    shift 1
+    test "$md5" = ".no.MD5.known." && continue
+
+    # Simulated MD5 mismatch
+    if echo "$url" | grep '/'"$simulateMD5Mismatch"'$' >/dev/null 2>&1
+    then
+      echo "ATTENTION : Faking a checksum mismatch with package $simulateMD5Mismatch" >&2
+      md5="*INVALIDATED*CHECKSUM*"
+    fi
+
+    localPath="$imageTmp"/`echo "$url" | \
+                            sed -e 's/^[hH][tT][tT][pP]:\/\///' \
+                                -e 's/^[hH][tT][tT][pP][sS]:\/\///' \
+                                -e 's/^[fF][tT][pP]:\/\///' \
+                                -e 's/^[fF][iI][lL][eE]:\/\///'`
+    if test -e "$localPath"
+    then
+      fileMD5=`$jigdoFile md5 "$localPath" 2>/dev/null | awk '{print $1}'`
+      if test "$md5" != "$fileMD5"
+      then
+        echo >&2
+        echo "WARNING: Downloaded file does not match expected MD5:" >&2
+        echo "         $url" >&2
+        echo "         $localPath" >&2
+        echo "         expected: $md5 | $fileMD5 :downloaded" >&2
+        echo >&2
+      fi
+    else
+      echo >&2
+      echo "WARNING: File not fot found after download:" >&2
+      echo "         $url" >&2
+      echo "         $localPath" >&2
+      echo >&2
+    fi
+  done
+
   # Merge into the image
   $jigdoFile $jigdoOpts --no-cache make-image --image="$image" \
     --jigdo="$jigdoF" --template="$template" "$imageTmp"
@@ -596,31 +647,56 @@ imageDownload() {
     for pass in x xx xxx xxxx xxxxx xxxxxx xxxxxxx xxxxxxxx; do
       $jigdoFile print-missing-all --image="$image" --jigdo="$jigdoF" \
         --template="$template" $jigdoOpts $uriOpts \
-      | egrep -i '^(http:|ftp:|$)' >"$list"
+       >"$list"
+      # This counts the empty lines
       missingCount=`egrep '^$' <"$list" | wc -l | sed -e 's/ *//g'`
       # Accumulate URLs in $@, pass them to fetchAndMerge in batches
       shift "$#" # Solaris /bin/sh doesn't understand "set --"
       count=""
       exec 3<"$list"
+      useUrl="."
+      md5=".no.MD5.known."
+      md5List=""
       while $readLine url <&3; do
         count="x$count"
-        if strEmpty "$url"; then count=""; continue; fi
-        if test "$count" != "$pass"; then continue; fi
-        if $noMorePasses; then
-          hrule
-          echo "$missingCount files not found in previous pass, trying"
-          echo "alternative download locations:"
-          echo
-        fi
-        noMorePasses=false
-        set -- "$@" "$url"
-        if test "$#" -ge "$filesPerFetch"; then
-          if fetchAndMerge "$@"; then true; else exec 3<&-; return 1; fi
-          shift "$#" # Solaris /bin/sh doesn't understand "set --"
+        if strEmpty "$url"
+        then
+          if test "$useUrl" != '.' 
+          then
+            set -- "$@" "$useUrl"
+            md5List="$md5List $md5"
+          fi
+          if test "$#" -ge "$filesPerFetch"; then
+            if fetchAndMerge "$md5List" "$@"
+            then
+              true
+            else
+              exec 3<&-
+              return 1
+            fi
+            shift "$#" # Solaris /bin/sh doesn't understand "set --"
+            md5List=""
+          fi
+          count=""
+          useUrl="."
+          md5=".no.MD5.known."
+        elif echo " $url" | egrep '^ MD5Sum:' >/dev/null 2>&1
+        then
+          md5=`echo " $url" | sed -e 's/ MD5Sum://'`
+        elif test "$count" = "$pass"
+        then
+          useUrl="$url"
+          if $noMorePasses; then
+            hrule
+            echo "$missingCount files not found in previous pass, trying"
+            echo "alternative download locations:"
+            echo
+          fi
+          noMorePasses=false
         fi
       done
       exec 3<&-
-      if test "$#" -ge 1; then fetchAndMerge "$@" || return 1; fi
+      if test "$#" -ge 1; then fetchAndMerge "$md5List" "$@" || return 1; fi
       if $noMorePasses; then break; fi
       if test -r "$image"; then break; fi
       noMorePasses=true
============================================================================

Especially in need of attention:

- I could not find a clear description how "wget" determines the local
  file paths from URLs and option --directory-prefix="$imageTmp".
  My current conversion from URL to path is purely heuristic therefore:

    localPath="$imageTmp"/`echo "$url" | \
                           sed -e 's/^[hH][tT][tT][pP]:\/\///' \
                               -e 's/^[hH][tT][tT][pP][sS]:\/\///' \
                               -e 's/^[fF][tT][pP]:\/\///' \
                               -e 's/^[fF][iI][lL][eE]:\/\///'`

- I introduced a dependency on "awk", which was not used in jigdo-lite
  before. The task is to obtain the first word of jigdo-file's output:

    fileMD5=`$jigdoFile md5 "$localPath" 2>/dev/null | awk '{print $1}'`

- The script does not prevent files with unexpected MD5 from being handed
  over to jigdo-file for grafting into the ISO.
  This has the advantage of not hampering success in case of false alarm.
  But it may also burry the new warning messages in the avalanche of
  jigdo-lite messages.

- There is no fix yet for the potentially wrong message

    echo "Aaargh - $missingCount files could not be downloaded. [...]"

  The term "downloaded" is wrong if just the MD5 of a downloaded package 
  did not match any desired file. 
  A lame improvement would be

    "could not be downloaded or did not match the expected checksums."

  But better would be a list of experienced problems:
  Not downloaded, MD5 mismatch after download, not accepted into emerging
  image.

- There is still mismatch simulating code in my changeset.
    #  Simulated MD5 mismatch:
    simulateMD5Mismatch="partman-reiserfs_50_all.udeb"
    # simulateMD5Mismatch="NOT_A_PACKAGE_NAME"
  It should probably be removed for a production version of jigdo-lite.
  It watches for a particular package name and defaces the expected MD5
  string. This yields on stderr:

ATTENTION : Faking a checksum mismatch with package partman-reiserfs_50_all.udeb

WARNING: Downloaded file does not match expected MD5:
         http://us.cdimage.debian.org/cdimage/snapshot/Debian/pool/main/p/partman-reiserfs/partman-reiserfs_50_all.udeb
         debian-6.0.7-amd64-businesscard.iso.tmpdir/us.cdimage.debian.org/cdimage/snapshot/Debian/pool/main/p/partman-reiserfs/partman-reiserfs_50_all.udeb
         expected: *INVALIDATED*CHECKSUM* | _2QsTFP7LjB6uXQZPYSeeA :downloaded

- I assume that this line, which i removed to see the MD5s, was forgotten with
  03.jigdo-lite-https.patch :

      | egrep -i '^(http:|ftp:|$)' >"$list"

  In my own URL converting expressions, i took the current function isURI()
  as guideline.


Have a nice day :)

Thomas


Reply to: