Re: lintian UTF-8 changelog checks
On Thu, Feb 12, 2004 at 04:37:33PM +0100, Jeroen van Wolffelaar wrote:
[...]
> I've just tagged you bug #175318 moreinfo and removed the 'patch' tag,
> because there are some problems with it. What's your opinion on the
> bug, and the issues that are with it? Do you think you have the time and
> willingness to look at it and improve the patch, or would you rather not
> spend your time on it?
[...]
Here are two other approaches; choose whichever you prefer.
I removed the reference to policy from changelog-file.desc, because the
status of this recommendation is not clear to me.
Denis
Index: changelog-file
===================================================================
--- changelog-file (revision 31)
+++ changelog-file (working copy)
@@ -188,6 +188,10 @@
}
}
+# check that changelog is UTF-8 encoded
+system("iconv -f utf8 -t utf8 changelog >/dev/null 2>&1") == 0 or
+ print "E: $pkg $type: debian-changelog-file-uses-obsolete-national-charset\n";
+
# read the changelog itself
#
# emacs only looks at the last "local variables:" in a file, and only at
Index: changelog-file.desc
===================================================================
--- changelog-file.desc (revision 31)
+++ changelog-file.desc (working copy)
@@ -89,3 +89,15 @@
files. Instead, put something like this in your ~/.emacs:
.
(setq debian-changelog-mailing-address "userid@debian.org")
+
+Tag: debian-changelog-file-uses-obsolete-national-charset
+Type: error
+Info: The Debian changelog file must be valid UTF-8, an encoding of
+ the Unicode character set.
+ .
+ There are many ways to convert a changelog from an obsolete charset
+ like ISO-8859-1; you may for example use "iconv" like:
+ .
+ $ iconv -f ISO-8859-1 -t UTF-8 changelog > changelog.new
+ .
+ $ mv changelog.new changelog
Index: changelog-file
===================================================================
--- changelog-file (revision 31)
+++ changelog-file (working copy)
@@ -194,9 +194,30 @@
# one within 3000 chars of EOF and on the last page (^L), but that's a bit
# pesky to replicate. Demanding a match of $prefix and $suffix ought to
# be enough to avoid false positives.
+#
+# check that changelog is UTF-8 encoded.
open IN, "changelog" or fail("cannot find changelog for $type package $pkg");
my ($prefix, $suffix);
+my $hasTextIconv = 0;
+my $converter;
+eval q{ use Text::Iconv };
+if ($@) {
+ print "N: The Text::Iconv perl module is not installed, so lintian\n";
+ print "N: cannot check whether the Debian changelog file is valid UTF-8.\n";
+} else {
+ $hasTextIconv = 1;
+ $converter = Text::Iconv->new("UTF-8", "UCS-4");
+}
+sub check_utf8 {
+ return 1 unless $hasTextIconv;
+ return defined($converter->convert(shift));
+}
+
while (<IN>) {
+ if (!check_utf8($_)) {
+ print "E: $pkg $type: debian-changelog-file-uses-obsolete-national-charset\n";
+ $hasTextIconv = 0;
+ }
if (/^(.*)Local variables:(.*)$/i) {
$prefix = $1;
$suffix = $2;
Index: changelog-file.desc
===================================================================
--- changelog-file.desc (revision 31)
+++ changelog-file.desc (working copy)
@@ -89,3 +89,15 @@
files. Instead, put something like this in your ~/.emacs:
.
(setq debian-changelog-mailing-address "userid@debian.org")
+
+Tag: debian-changelog-file-uses-obsolete-national-charset
+Type: error
+Info: The Debian changelog file must be valid UTF-8, an encoding of
+ the Unicode character set.
+ .
+ There are many ways to convert a changelog from an obsolete charset
+ like ISO-8859-1; you may for example use "iconv" like:
+ .
+ $ iconv -f ISO-8859-1 -t UTF-8 changelog > changelog.new
+ .
+ $ mv changelog.new changelog
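
To try the check by hand before applying either patch, the same iconv round-trip can be run from a shell. This is only a rough sketch: the function names is_utf8 and to_utf8 are made up for illustration and are not part of lintian, the conversion assumes the old encoding really is ISO-8859-1, and it relies on iconv exiting non-zero on invalid input (as the glibc iconv does).

```shell
#!/bin/sh
# Sketch only: is_utf8 and to_utf8 are hypothetical helper names,
# not anything provided by lintian.

# Succeeds (exit status 0) when the file decodes cleanly as UTF-8.
# Same idea as the system("iconv ...") call in the first patch.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Re-encode a file to UTF-8 in place.
# ASSUMPTION: the file is currently ISO-8859-1; adjust -f otherwise.
to_utf8() {
    iconv -f ISO-8859-1 -t UTF-8 "$1" > "$1.new" && mv "$1.new" "$1"
}

# Example use (path is illustrative):
#   is_utf8 debian/changelog || to_utf8 debian/changelog
```

Writing to a temporary ".new" file first means the original changelog is left untouched if iconv fails part-way through.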