
Re: lintian UTF-8 changelog checks



On Thu, Feb 12, 2004 at 04:37:33PM +0100, Jeroen van Wolffelaar wrote:
[...]
> I've just tagged you bug #175318 moreinfo and removed the 'patch' tag,
> because there are some problems with it.  What's your opinion on the
> bug, and the issues that are with it? Do you think you have the time and
> willingness to look at it and improve the patch, or would you rather not
> spend your time on it?
[...]

Here are two other approaches; choose whichever you prefer.
I removed the reference to policy from changelog-file.desc, because the
status of this recommendation is not clear to me.
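
For reference, the first approach boils down to relying on iconv's exit
status. Below is a minimal standalone sketch of that idea, assuming an
iconv that exits non-zero on invalid input (glibc's does); the helper
name is_valid_utf8 and the sample files are purely illustrative and are
not part of lintian.

```shell
#!/bin/sh
# Hypothetical helper (not part of lintian): succeeds only if the file
# given as $1 converts cleanly from UTF-8 to UTF-8, i.e. is valid UTF-8.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Illustrative sample data: the same word in two encodings.
tmpdir=$(mktemp -d)
printf 'caf\303\251\n' > "$tmpdir/utf8"     # e-acute encoded as UTF-8
printf 'caf\351\n'     > "$tmpdir/latin1"   # e-acute encoded as ISO-8859-1

for f in "$tmpdir/utf8" "$tmpdir/latin1"; do
    if is_valid_utf8 "$f"; then
        echo "${f##*/}: valid UTF-8"
    else
        echo "${f##*/}: not valid UTF-8"
    fi
done

rm -rf "$tmpdir"
```

The second approach performs the same test line by line from within
Perl, which avoids spawning a process but depends on Text::Iconv being
installed.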

Denis
Index: changelog-file
===================================================================
--- changelog-file	(revision 31)
+++ changelog-file	(working copy)
@@ -188,6 +188,10 @@
     }
 }
 
+# check that changelog is UTF-8 encoded
+system("iconv -f UTF-8 -t UTF-8 changelog >/dev/null 2>&1") == 0 or
+    print "E: $pkg $type: debian-changelog-file-uses-obsolete-national-charset\n";
+
 # read the changelog itself
 #
 # emacs only looks at the last "local variables:" in a file, and only at
Index: changelog-file.desc
===================================================================
--- changelog-file.desc	(revision 31)
+++ changelog-file.desc	(working copy)
@@ -89,3 +89,15 @@
  files.  Instead, put something like this in your ~/.emacs:
  .
  (setq debian-changelog-mailing-address "userid@debian.org")
+
+Tag: debian-changelog-file-uses-obsolete-national-charset
+Type: error
+Info: The Debian changelog file must be valid UTF-8, an encoding of
+ the Unicode character set.
+ .
+ There are many ways to convert a changelog from an obsolete charset
+ like ISO-8859-1; for example, you can use "iconv" like this:
+ .
+ $ iconv -f ISO-8859-1 -t UTF-8 changelog > changelog.new
+ .
+ $ mv changelog.new changelog
Index: changelog-file
===================================================================
--- changelog-file	(revision 31)
+++ changelog-file	(working copy)
@@ -194,9 +194,30 @@
 # one within 3000 chars of EOF and on the last page (^L), but that's a bit
 # pesky to replicate.  Demanding a match of $prefix and $suffix ought to
 # be enough to avoid false positives.
+#
+# check that changelog is UTF-8 encoded.
 open IN, "changelog" or fail("cannot find changelog for $type package $pkg");
 my ($prefix, $suffix);
+my $hasTextIconv = 0;
+my $converter;
+eval q{ use Text::Iconv };
+if ($@) {
+    print "N: The Text::Iconv perl module is not installed, so lintian\n";
+    print "N: cannot check whether the Debian changelog file is valid UTF-8.\n";
+} else {
+    $hasTextIconv = 1;
+    $converter = Text::Iconv->new("UTF-8", "UCS-4");
+}
+sub check_utf8 {
+    return 1 unless $hasTextIconv;
+    return defined($converter->convert(shift));
+}
+
 while (<IN>) {
+    if (!check_utf8($_)) {
+        print "E: $pkg $type: debian-changelog-file-uses-obsolete-national-charset\n";
+        $hasTextIconv = 0;
+    }
     if (/^(.*)Local variables:(.*)$/i) {
 	$prefix = $1;
 	$suffix = $2;
Index: changelog-file.desc
===================================================================
--- changelog-file.desc	(revision 31)
+++ changelog-file.desc	(working copy)
@@ -89,3 +89,15 @@
  files.  Instead, put something like this in your ~/.emacs:
  .
  (setq debian-changelog-mailing-address "userid@debian.org")
+
+Tag: debian-changelog-file-uses-obsolete-national-charset
+Type: error
+Info: The Debian changelog file must be valid UTF-8, an encoding of
+ the Unicode character set.
+ .
+ There are many ways to convert a changelog from an obsolete charset
+ like ISO-8859-1; for example, you can use "iconv" like this:
+ .
+ $ iconv -f ISO-8859-1 -t UTF-8 changelog > changelog.new
+ .
+ $ mv changelog.new changelog
