
Bug#500491: Invalid character sequences in Packages files




There are too many Packages files, and they change constantly.
I suppose a more systematic check than occasional bug reports
would be preferable, somewhere in the APT code that parses or
extracts package control files...

For now there is only a quick script here (see below). In the etch
main binary-i386 Packages file alone it found these problems:

$ cat /var/lib/apt/lists/ftp.debian.org_debian_dists_etch_main_binary\
-i386_Packages | ./utf8check > utf8check.out ;  echo $?
W: Suspicious for bad byte sequence at line: 26071, package: cadubi
W: Suspicious for bad byte sequence at line: 52962, package: doc-linux-html-pt
W: Suspicious for bad byte sequence at line: 52965, package: doc-linux-html-pt
W: Suspicious for bad byte sequence at line: 52967, package: doc-linux-html-pt
W: Suspicious for bad byte sequence at line: 53220, package: doc-linux-text-pt
W: Suspicious for bad byte sequence at line: 53223, package: doc-linux-text-pt
W: Suspicious for bad byte sequence at line: 53225, package: doc-linux-text-pt
W: Suspicious for bad byte sequence at line: 60240, package: elmo
W: Suspicious for bad byte sequence at line: 67095, package: fcmp
W: Suspicious for bad byte sequence at line: 86287, package: glade-perl
W: Suspicious for bad byte sequence at line: 119429, package: itcl3
W: Suspicious for bad byte sequence at line: 119458, package: itcl3-dev
W: Suspicious for bad byte sequence at line: 119487, package: itcl3-doc
W: Suspicious for bad byte sequence at line: 192329, package: libglade-perl
W: Suspicious for bad byte sequence at line: 345701, package: pyca

$ cat utf8check.out | uniq
cadubi
doc-linux-html-pt
doc-linux-text-pt
elmo
fcmp
glade-perl
itcl3
itcl3-dev
itcl3-doc
libglade-perl
pyca
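
Since there are many Packages files, the same check could be run over
all of them in one go. The following is only a sketch under stated
assumptions: scan_lists is a hypothetical helper name, and the checker
is assumed to read a Packages file on stdin, print package names on
stdout, and exit 0 (clean), 1 (bad bytes found) or 2 (some packages not
identified), as the utf8check script in this message does.

```bash
#!/bin/bash

# Hypothetical wrapper: run a checker command over every file given.
# Assumes the checker reads stdin, prints package names on stdout and
# exits 0 (clean), 1 (bad bytes found) or 2 (unidentified packages).
scan_lists() {
  local checker="$1" ; shift
  local file pkg status worst=0
  for file in "$@" ; do
    [ -f "$file" ] || continue
    # Prefix each reported package with the file it was found in.
    "$checker" < "$file" 2> /dev/null | uniq |
      while IFS= read -r pkg ; do
        echo "${file##*/}: $pkg"
      done
    # Exit status of the checker itself, not of uniq or the loop.
    status=${PIPESTATUS[0]}
    (( status > worst )) && worst=$status
  done
  return "$worst"
}
```

It might be called as
"scan_lists ./utf8check /var/lib/apt/lists/*_Packages"; the return
value is the worst exit status seen across all files.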



#!/bin/bash

  # Scans a file in APT "Packages" format for invalid UTF-8 byte sequences
  #   and tries to determine the name of the package they belong to.
  # Warning: it may take five minutes or more to scan a large "Packages"
  #   file (such as that of the Debian main repository).

  declare package
  declare prev_package="<no previous>"
  declare -i linenum=0
  declare unclear
  declare -i count=0

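  # IFS= and -r keep leading whitespace and backslashes in each line intact.
  while IFS= read -r line ;  do
    (( ++ linenum ))
    # Name the input encoding explicitly; otherwise iconv uses the locale's.
    iconv -f UTF-8 -t UTF-32 <<< "$line" &> /dev/null
    [ 0 -eq $? ]  &&  {
      if [ -z "$line" ] ;  then
        prev_package="$package"
        package=
      # The regex is left unquoted: bash 3.2 and later match a quoted
      # pattern as a literal string.
      elif [[ "$line" =~ ^Package:\ (.*) ]] ;  then
        package="${BASH_REMATCH[1]}"
      fi
      continue
    }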
    (( ++ count ))
    echo -n "W: Suspicious for bad byte sequence at line: $linenum, " >&2
    echo "package: ${package:-goes after $prev_package}" >&2
    if [ -z "$package" ] ;  then
      unclear="true"
    else
      echo "$package"
    fi
  done
  [ -n "$unclear" ]  &&  {
    # Send this to stderr so it does not mix with the package list on stdout.
    echo "Some problem packages were not identified, see stderr messages" >&2
    exit 2
  }
  [ 0 -eq $count ]  &&  exit 0
  exit 1
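
The slowness mentioned in the warning comes from starting one iconv
process per line. A possible faster variant, given here only as a sketch
under stated assumptions, runs iconv once over the whole file and compares
its output with the original line by line. It assumes an iconv that
supports -c (drop input that cannot be converted), as GNU iconv does;
find_bad_utf8_packages is a hypothetical helper name. Newline bytes are
always valid, so both streams have the same number of lines.

```bash
#!/bin/bash

# Sketch: print the package name for every line that is not valid UTF-8.
# Assumption: iconv supports -c (discard unconvertible input), as GNU
# iconv does. find_bad_utf8_packages is a hypothetical helper name.
find_bad_utf8_packages() {
  local file="$1" raw clean package=
  local LC_ALL=C    # compare the lines as plain bytes
  # fd 3: the original lines; fd 4: the same lines with invalid sequences
  # dropped. A line that differs between the two contained bad bytes.
  while IFS= read -r raw <&3 && IFS= read -r clean <&4 ; do
    if [[ "$raw" != "$clean" ]] ; then
      echo "${package:-<unknown>}"
    elif [[ -z "$raw" ]] ; then
      package=
    elif [[ "$raw" =~ ^Package:\ (.*) ]] ; then
      package="${BASH_REMATCH[1]}"
    fi
  done 3< "$file" 4< <(iconv -c -f UTF-8 -t UTF-8 < "$file" 2> /dev/null)
}
```

As with the script above, a package is printed once per bad line, so the
output can be piped through uniq. Line numbers and exit codes are left
out of this sketch.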



  / sergio
