Bug#61342: your mail

To: Don Armstrong <don@debian.org>, 61342@bugs.debian.org
Subject: Bug#61342: your mail
From: Colin Watson <cjwatson@debian.org>
Date: Wed, 16 Mar 2005 20:24:56 +0000
Message-id: <[🔎] 20050316202456.GC11333@riva.ucam.org>
Reply-to: Colin Watson <cjwatson@debian.org>, 61342@bugs.debian.org
In-reply-to: <20050111165243.GA2664@xieana.donarmstrong.org>
References: <20041231123706.GA23309@xieana.donarmstrong.org> <20041231124851.GB23309@xieana.donarmstrong.org> <20050105043035.GI9370@archimedes.ucr.edu> <20050105210606.GC32680@archimedes.ucr.edu> <20050111165243.GA2664@xieana.donarmstrong.org>

On Tue, Jan 11, 2005 at 08:52:43AM -0800, Don Armstrong wrote:
> tbm brought to my attention another bug that should be fixed, namely
> the maintainer e-mail address that is sent out when someone closes the
> bug:
> 
> http://bugs.donarmstrong.com/cgi/bugreport.cgi?bug=18&msg=12
> 
> The patch now has moved the de_rfc1522 code into Debbugs::MIME since
> this slice of code needs to be used in both the cgi scripts and the
> process script.
> 
> I've also taken the liberty of cleaning up the way that Exporter is
> used, and replaced munging with @ISA with a straightforward use base
> 'Exporter';

OK, here's a very belated review. Humble apologies for the delay. In
general I've applied all the changes I've suggested in my local copy
already, so there's no need to send another patch containing them;
however, I have not yet written the metadata migration script I suggest
below, nor have I dealt with the RFC1522 encoding on output.

I've checked the Debbugs::MIME parts of your patch with my modifications
into CVS HEAD so that you can see what I've done, together with a
debian/control patch to add the appropriate dependency.

In general, I think this is a very good start. Thanks, and thanks for
pestering me about it until I got round to it!

> Index: Debbugs/MIME.pm
> ===================================================================
> RCS file: /cvs/debbugs/source/Debbugs/MIME.pm,v
> retrieving revision 1.1
> diff -u -r1.1 MIME.pm
> --- Debbugs/MIME.pm	3 Aug 2003 09:46:30 -0000	1.1
> +++ Debbugs/MIME.pm	11 Jan 2005 16:43:49 -0000
> @@ -2,19 +2,23 @@
>  
>  use strict;
>  
> -use File::Path;
> -use MIME::Parser;
> -use Exporter ();
> -use vars qw($VERSION @ISA @EXPORT_OK);
> +use base qw(Exporter);
> +use vars qw($VERSION @EXPORT_OK @EXPORT);
>  
>  BEGIN {
>      $VERSION = 1.00;
>  
> -    @ISA = qw(Exporter);
> -    @EXPORT_OK = qw(parse);
> +    @EXPORT_OK = qw(parse de_rfc1522 getmailbody);
> +    @EXPORT = qw(getmailbody);
>  }
>  
> -sub getmailbody ($);

Hm, not sure I like using @EXPORT rather than the deliberate forward
declaration; plus, getmailbody is not really meant to be exported,
because it's an internal function (despite the fact that
scripts/process.in and scripts/service.in currently use another version
of it, since Debbugs::MIME represents an incomplete modularisation
effort). The rest of the Exporter cleanup looks OK, though.

> @@ -95,5 +99,33 @@
>  
>      return { header => [@headerlines], body => [@bodylines]};
>  }
> +
> +=head2 de_rfc1522
> +
> +     de_rfc1522('=?iso-8859-1?Q?D=F6n_Armstr=F3ng?= <don@donarmstrong.com>')
> +
> +Turn RFC-1522 names into the UTF-8 equivalent. [#61342 et al]

Bug reference should probably live in a comment rather than being
exported to the documentation etc.

> +
> +=cut
> +
> +# Set up the default rfc1522 decoder, which turns all charsets that
> +# are supported into the appropriate UTF-8 charset.
> +MIME::WordDecoder->default(new MIME::WordDecoder(['*' => sub {my ($data,$charset) = @_;
> +							      $charset =~ s/^(UTF)\-(\d+)/$1$2/i;
> +							      return $data unless utf8_supported_charset($charset);
> +							      return to_utf8({-string  => $data,
> +									      -charset => $charset,
> +									     });
> +							 },
> +						 ],));

This should be in a BEGIN block, I think. I've also reformatted to 80
columns.

> @@ -181,7 +181,7 @@
>  $indexentry .= htmlpackagelinks($status{package}, 0);
>  
>  $indexentry .= "Reported by: <a href=\"" . submitterurl($status{originator})
> -              . "\">" . htmlsanit($status{originator}) . "</a>;\n";
> +              . "\">" . htmlsanit(de_rfc1522($status{originator})) . "</a>;\n";
>  
>  $indexentry .= "Owned by: " . htmlsanit($status{owner}) . ";\n"
>                if length $status{owner};

I realise it's a database format change, but I'd really prefer to have
the metadata files be pure UTF-8, so that we don't have to process them
for display every time, and to make things like searching easier. We can
always write a migration script.

> @@ -369,7 +369,9 @@
>  		$normstate= 'go';
>  		push @mail, $_;
>  	} elsif ($normstate eq 'html') {
> -		$this .= $_;
> +	        # XXX This is broken. The BTS shouldn't be writing
> +	        # HTML into the log... but it does.
> +		$this .= de_rfc1522($_);
>  	} elsif ($normstate eq 'go') {
>  		s/^\030//;
>  		if (!$suppressnext && !$found_msgid &&

Not sure that qualifies for an XXX comment here. The right place to note
the design flaw is really in Debbugs/Log.pm, which I've done.
Debbugs::Log also provides a better basis for designing the migration to
a new, less broken-by-design record type.

Again, I think the html record should really be in UTF-8, and a
migration script written.

> @@ -409,7 +411,7 @@
>  			$thisheader = "<h2>Message sent:</h2>\n";
>  		} else {
>  			s/\04/, /g; s/\n$//;
> -			$thisheader = "<h2>Message sent to ".htmlsanit($_).":</h2>\n";
> +			$thisheader = "<h2>Message sent to ".htmlsanit(de_rfc1522($_)).":</h2>\n";
>  		}
>  		$this = "";
>  		$normstate= 'kill-body';

And again. Basically, I think mail text should be preserved as-is in the
log (since it means you can send out notifications by mail and preserve
the MIME structure as it was received), but everything else counts as
metadata and should be in a canonical form.

> Index: scripts/process.in
> ===================================================================
> RCS file: /cvs/debbugs/source/scripts/process.in,v
> retrieving revision 1.86
> diff -u -r1.86 process.in
> --- scripts/process.in	5 Aug 2004 15:09:30 -0000	1.86
> +++ scripts/process.in	11 Jan 2005 16:43:49 -0000
> @@ -17,6 +17,9 @@
>  require "$lib_path/errorlib";
>  $ENV{'PATH'} = $lib_path.':'.$ENV{'PATH'};
>  
> +# for de_rfc1522
> +use Debbugs::MIME qw(de_rfc1522);
> +
>  chdir( "$gSpoolDir" ) || die "chdir spool: $!\n";
>  
>  #open(DEBUG,"> /tmp/debbugs.debug");

No need for the comment.

> @@ -252,6 +255,7 @@
>      }
>      $markedby= $header{'from'} eq $replyto ? $replyto :
>                 "$header{'from'} (reply to $replyto)";
> +    $markedby = de_rfc1522($markedby);
>      if ($codeletter eq 'F') {
>          (&appendlog,&finish) if length($data->{forwarded});
>          $receivedat= "forwarded\@$gEmailDomain";

I'd suggest instead simply:

-	$header{$v} = $_;
+	$header{$v} = de_rfc1522($_);

... and likewise in scripts/service.in. That way, all metadata trivially
gets decoded with very little effort, and is stored in canonical form as
suggested above.

> @@ -430,6 +434,7 @@
>  X-$gProject-PR-Package: $data->{package}
>  X-$gProject-PR-Keywords: $data->{keywords}
>  Reply-To: $ref\@$gEmailDomain
> +Content-Type: text/plain; charset="utf-8"
>  
>  This is an automatic notification regarding your $gBug report
>  #$ref: $data->{subject},

This gets more complicated, and is one of the reasons I'd been avoiding
this work. The right way to do this, and fix a few other bugs along the
way, is to attach the text of the original report using MIME rather than
simply appending it; that way, it can have its original content-type,
including character set. So, I'll avoid this part for the time being. We
won't be making anything significantly worse by doing this in two stages
anyway, since it's already broken.

Do we need to re-RFC1522-encode headers in outgoing messages? I think we
probably do. MIME::Words knows how to do that, and despite the warning
in its documentation I think the encode_mimewords() function is safe for
encoding from UTF-8.

Cheers,

-- 
Colin Watson                                       [cjwatson@debian.org]

Reply to:

Follow-Ups:
- Bug#61342: your mail
  - From: Colin Watson <cjwatson@debian.org>
- Bug#61342: your mail
  - From: Don Armstrong <don@debian.org>

Prev by Date: Bug#61342: Please install libunicode-maputf8-perl on spohr
Next by Date: Bug#61342: your mail
Previous by thread: Bug#61342: Please install libunicode-maputf8-perl on spohr
Next by thread: Bug#61342: your mail
Index(es):
- Date
- Thread