Bug#500491: Invalid character sequences in Packages files
There are too many Packages files, and they change constantly.
I suppose some more systematic check than occasional bug reports
would be preferable, somewhere in the APT code that parses/extracts
package control files...
For now there is only a simple script (included below). In the etch
main binary-i386 Packages file alone it found these problems:
$ cat /var/lib/apt/lists/ftp.debian.org_debian_dists_etch_main_binary\
-i386_Packages | ./utf8check > utf8check.out ; echo $?
W: Suspicious for bad byte sequence at line: 26071, package: cadubi
W: Suspicious for bad byte sequence at line: 52962, package: doc-linux-html-pt
W: Suspicious for bad byte sequence at line: 52965, package: doc-linux-html-pt
W: Suspicious for bad byte sequence at line: 52967, package: doc-linux-html-pt
W: Suspicious for bad byte sequence at line: 53220, package: doc-linux-text-pt
W: Suspicious for bad byte sequence at line: 53223, package: doc-linux-text-pt
W: Suspicious for bad byte sequence at line: 53225, package: doc-linux-text-pt
W: Suspicious for bad byte sequence at line: 60240, package: elmo
W: Suspicious for bad byte sequence at line: 67095, package: fcmp
W: Suspicious for bad byte sequence at line: 86287, package: glade-perl
W: Suspicious for bad byte sequence at line: 119429, package: itcl3
W: Suspicious for bad byte sequence at line: 119458, package: itcl3-dev
W: Suspicious for bad byte sequence at line: 119487, package: itcl3-doc
W: Suspicious for bad byte sequence at line: 192329, package: libglade-perl
W: Suspicious for bad byte sequence at line: 345701, package: pyca
$ cat utf8check.out | uniq
cadubi
doc-linux-html-pt
doc-linux-text-pt
elmo
fcmp
glade-perl
itcl3
itcl3-dev
itcl3-doc
libglade-perl
pyca
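The package list above hides how many lines were flagged in each package. A minimal sketch of a per-package count follows, assuming the warning format shown in the transcript ("W: ..., package: NAME"). The function name summarize_warnings is hypothetical and not part of the original script.

```shell
#!/bin/bash
# Hypothetical helper: reads "W: ..." warning lines on stdin and prints
# one "COUNT NAME" line per package, sorted by package name.
summarize_warnings() {
    # Split each warning at ", package: " so that $2 is the package name.
    awk -F', package: ' '
        /^W: / && NF == 2 { n[$2]++ }
        END { for (p in n) print n[p], p }
    ' | LC_ALL=C sort -k2
}
```

Since the warnings go to stderr, they could be fed in with something like `./utf8check < Packages 2>&1 >/dev/null | summarize_warnings`. Warnings for unidentified packages would show up under their "goes after ..." text.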
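#!/bin/bash
# Scans an APT "Packages"-formatted file on stdin for invalid UTF-8 byte
# sequences and tries to determine the name of the package they belong to.
# Warning: it may take five minutes or more to scan a large "Packages"
# file (such as that of the Debian main repository).

declare package=
declare prev_package="<no previous>"
declare -i linenum=0
declare unclear=
declare -i count=0
# Keep the pattern in a variable: since bash 3.2, a quoted right-hand side
# of =~ is matched as a literal string rather than as a regex.
declare -r package_re='^Package: (.*)'

# IFS= and -r keep leading whitespace and backslashes in each line intact.
while IFS= read -r line ; do
    (( ++linenum ))
    # Name the source encoding explicitly so that the check does not
    # depend on the current locale.
    if iconv -f UTF-8 -t UTF-32 <<< "$line" &> /dev/null ; then
        if [ -z "$line" ] ; then
            # A blank line ends the current stanza.
            prev_package="$package"
            package=
        elif [[ "$line" =~ $package_re ]] ; then
            package="${BASH_REMATCH[1]}"
        fi
        continue
    fi
    (( ++count ))
    echo -n "W: Suspicious for bad byte sequence at line: $linenum, " >&2
    echo "package: ${package:-goes after $prev_package}" >&2
    if [ -z "$package" ] ; then
        unclear="true"
    else
        echo "$package"
    fi
done

# Exit status: 0 = clean, 1 = problems found, 2 = some packages unidentified.
if [ -n "$unclear" ] ; then
    echo "Some problem packages were not identified, see stderr messages"
    exit 2
fi
[ "$count" -eq 0 ] && exit 0
exit 1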
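The script above starts one iconv process per input line, which is probably where the five minutes go. A possible single-pass variant is sketched below. It swaps iconv for a byte-level regular expression that describes well-formed UTF-8, run in awk under LC_ALL=C. It assumes an awk that treats input as bytes in the C locale and accepts octal escapes in strings (gawk and recent mawk do). The name scan_packages is hypothetical, and lines containing a NUL byte would be reported as bad.

```shell
#!/bin/bash
# Sketch only: prints "LINE<TAB>PACKAGE" for every line that is not
# well-formed UTF-8, and returns 1 if any such line was found.
scan_packages() {
    LC_ALL=C awk '
        BEGIN {
            bad = 0
            c = "[\200-\277]"                     # continuation byte
            seq =     "[\001-\177]"               # ASCII (NUL excluded)
            seq = seq "|[\302-\337]" c            # 2-byte sequences
            seq = seq "|\340[\240-\277]" c        # 3-byte, no overlong forms
            seq = seq "|[\341-\354\356\357]" c c  # 3-byte, general case
            seq = seq "|\355[\200-\237]" c        # 3-byte, no surrogates
            seq = seq "|\360[\220-\277]" c c      # 4-byte, no overlong forms
            seq = seq "|[\361-\363]" c c c        # 4-byte, general case
            seq = seq "|\364[\200-\217]" c c      # 4-byte, up to U+10FFFF
            valid = "^(" seq ")*$"
        }
        # A blank line ends the current stanza.
        $0 == ""     { pkg = ""; next }
        # Report invalid lines with the package seen so far, or "?".
        $0 !~ valid  { printf "%d\t%s\n", NR, (pkg != "" ? pkg : "?"); bad = 1; next }
        /^Package: / { pkg = substr($0, 10) }
        END          { exit bad }
    ' "$@"
}
```

It could be tried as `scan_packages /var/lib/apt/lists/ftp.debian.org_debian_dists_etch_main_binary-i386_Packages | cut -f2 | sort -u`. The results should be compared against the iconv-based script before relying on it, since the two checks are implemented differently.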
/ sergio