Bug#61342: your mail
On Wed, 16 Mar 2005, Colin Watson wrote:
> > Index: Debbugs/MIME.pm
> > ===================================================================
> > RCS file: /cvs/debbugs/source/Debbugs/MIME.pm,v
> > retrieving revision 1.1
> > diff -u -r1.1 MIME.pm
> > --- Debbugs/MIME.pm 3 Aug 2003 09:46:30 -0000 1.1
> > +++ Debbugs/MIME.pm 11 Jan 2005 16:43:49 -0000
> > @@ -2,19 +2,23 @@
> >
> > use strict;
> >
> > -use File::Path;
> > -use MIME::Parser;
> > -use Exporter ();
> > -use vars qw($VERSION @ISA @EXPORT_OK);
> > +use base qw(Exporter);
> > +use vars qw($VERSION @EXPORT_OK @EXPORT);
> >
> > BEGIN {
> > $VERSION = 1.00;
> >
> > - @ISA = qw(Exporter);
> > - @EXPORT_OK = qw(parse);
> > + @EXPORT_OK = qw(parse de_rfc1522 getmailbody);
> > + @EXPORT = qw(getmailbody);
> > }
> >
> > -sub getmailbody ($);
>
> Hm, not sure I like using @EXPORT rather than the deliberate forward
> declaration; plus, getmailbody is not really meant to be exported,
> because it's an internal function (despite the fact that
> scripts/process.in and scripts/service.in currently use another version
> of it, since Debbugs::MIME represents an incomplete modularisation
> effort). The rest of the Exporter cleanup looks OK, though.
Right. I think I missed that one. Yeah, just strip out the @EXPORT
part then. [It really shouldn't every be used anyway, since exporting
things by default is broken, IMO.]
> > +Turn RFC-1522 names into the UTF-8 equivalent. [#61342 et al]
>
> Bug reference should probably live in a comment rather than being
> exported to the documentation etc.
Yeah, that's just a personal thing. I usually stick more detailed
documentation into POD than most people do.
> > +
> > +=cut
> > +
> > +# Set up the default rfc1522 decoder, which turns all charsets that
> > +# are supported into the appropriate UTF-8 charset.
> > +MIME::WordDecoder->default(new MIME::WordDecoder(['*' => sub {my ($data,$charset) = @_;
> > + $charset =~ s/^(UTF)\-(\d+)/$1$2/i;
> > + return $data unless utf8_supported_charset($charset);
> > + return to_utf8({-string => $data,
> > + -charset => $charset,
> > + });
> > + },
> > + ],));
>
> This should be in a BEGIN block, I think. I've also reformatted to 80
> columns.
It doesn't matter in this case. I didn't make it a BEGIN block,
because nothing should be calling it until after all BEGIN blocks have
been run. [And I tend not to stick things in BEGIN blocks unless they
actually have to be there.]
> I realise it's a database format change, but I'd really prefer to
> have the metadata files be pure UTF-8, so that we don't have to
> process them for display every time, and to make things like
> searching easier. We can always write a migration script.
I think that's the optimal solution too. However, this patch at least
will work now, and we can move to pure UTF-8 later.
> > + # XXX This is broken. The BTS shouldn't be writing
> > + # HTML into the log... but it does.
> > + $this .= de_rfc1522($_);
>
> Not sure that qualifies for an XXX comment here. The right place to
> note the design flaw is really in Debbugs/Log.pm, which I've done.
> Debbugs::Log also provides a better basis for designing the
> migration to a new, less broken-by-design record type.
>
> Again, I think the html record should really be in UTF-8, and a
> migration script written.
Yeah, no argument here. That XXX comment was basically a warning for
anyone looking at that segment of code, because that string really
isn't just a rfc1522 header. [And I didn't want to make changes in
unrelated parts of the code if I could avoid it.]
> And again. Basically, I think mail text should be preserved as-is in
> the log (since it means you can send out notifications by mail and
> preserve the MIME structure as it was received), but everything else
> counts as metadata and should be in a canonical form.
Definetly.
> I'd suggest instead simply:
>
> - $header{$v} = $_;
> + $header{$v} = de_rfc1522($_);
>
> ... and likewise in scripts/service.in. That way, all metadata
> trivially gets decoded with very little effort, and is stored in
> canonical form as suggested above.
Then we'll also have to do the migration. This bit of code won't hurt
if the header has already been run through de_rfc1522, though.
> > @@ -430,6 +434,7 @@
> > X-$gProject-PR-Package: $data->{package}
> > X-$gProject-PR-Keywords: $data->{keywords}
> > Reply-To: $ref\@$gEmailDomain
> > +Content-Type: text/plain; charset="utf-8"
> >
> > This is an automatic notification regarding your $gBug report
> > #$ref: $data->{subject},
>
> This gets more complicated, and is one of the reasons I'd been
> avoiding this work. The right way to do this, and fix a few other
> bugs along the way, is to attach the text of the original report
> using MIME rather than simply appending it; that way, it can have
> its original content-type, including character set. So, I'll avoid
> this part for the time being. We won't be making anything
> significantly worse by doing this in two stages anyway, since it's
> already broken.
>
> Do we need to re-RFC1522-encode headers in outgoing messages?
Yes... which is one of the reasons why I didn't play with the RFC1522
encoding in the log.
Don Armstrong
--
I never until now realized that the primary job of any emoticon is to
say "excuse me, that didn't make any sense." ;-P
-- Cory Doctorow
http://www.donarmstrong.com http://rzlab.ucr.edu
Reply to: