Bug#61342: your mail
- To: Don Armstrong <don@debian.org>, 61342@bugs.debian.org
- Subject: Bug#61342: your mail
- From: Colin Watson <cjwatson@debian.org>
- Date: Sat, 9 Apr 2005 17:30:49 +0100
- Message-id: <20050409163049.GA28967@riva.ucam.org>
- Reply-to: Colin Watson <cjwatson@debian.org>, 61342@bugs.debian.org
- In-reply-to: <20050317005506.GD18304@archimedes.ucr.edu>
- References: <20041231123706.GA23309@xieana.donarmstrong.org> <20041231124851.GB23309@xieana.donarmstrong.org> <20050105043035.GI9370@archimedes.ucr.edu> <20050105210606.GC32680@archimedes.ucr.edu> <20050111165243.GA2664@xieana.donarmstrong.org> <20050316202456.GC11333@riva.ucam.org> <20050317005506.GD18304@archimedes.ucr.edu>
On Wed, Mar 16, 2005 at 04:55:06PM -0800, Don Armstrong wrote:
> On Wed, 16 Mar 2005, Colin Watson wrote:
> > I realise it's a database format change, but I'd really prefer to
> > have the metadata files be pure UTF-8, so that we don't have to
> > process them for display every time, and to make things like
> > searching easier. We can always write a migration script.
>
> I think that's the optimal solution too. However, this patch at least
> will work now, and we can move to pure UTF-8 later.
I've taken the approach of creating a new .summary format version; the
way the .summary file format works means that we can have
"Format-Version: 2" indicate RFC1522 metadata and "Format-Version: 3"
indicate UTF-8 metadata. I haven't yet made format version 3 the
default, but I will do in time.
This made the code a lot simpler, because metadata only needs to be
decoded/encoded in the two functions responsible for reading/writing
.summary files.
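To illustrate the idea, here is a minimal sketch in Python (not the actual debbugs code, which is Perl). It assumes a simplified .summary layout of "Key: value" lines; the names read_summary, write_summary and encode_rfc1522 are hypothetical, and the "Submitter" field is only an example. It uses the standard library's email.header module, which implements RFC 2047, the successor to RFC 1522.

```python
from email.header import Header, decode_header


def decode_rfc1522(value: str) -> str:
    """Decode RFC 1522/2047 encoded words in a metadata value to text."""
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            try:
                parts.append(chunk.decode(charset or "ascii", errors="replace"))
            except LookupError:
                # Unknown charset label: fall back rather than fail.
                parts.append(chunk.decode("ascii", errors="replace"))
        else:
            # No encoded words were present; the value is already text.
            parts.append(chunk)
    return "".join(parts)


def encode_rfc1522(value: str) -> str:
    """Encode a value as an RFC 2047 encoded word if it is not pure ASCII."""
    if value.isascii():
        return value
    # maxlinelen=0 disables folding, so the value stays on one line.
    return Header(value, "utf-8").encode(maxlinelen=0)


def read_summary(raw: bytes) -> dict:
    """Parse a (simplified) .summary file; all returned values are text."""
    fields = {}
    for line in raw.decode("utf-8", errors="replace").splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    # Version 2 stores RFC 1522 metadata; version 3 stores UTF-8 directly.
    if fields.get("Format-Version", "2") == "2":
        fields = {k: decode_rfc1522(v) for k, v in fields.items()}
    return fields


def write_summary(fields: dict, version: str = "3") -> bytes:
    """Serialise fields, encoding values only for format version 2."""
    lines = [f"Format-Version: {version}"]
    for key, value in fields.items():
        if key == "Format-Version":
            continue
        if version == "2":
            value = encode_rfc1522(value)
        lines.append(f"{key}: {value}")
    return ("\n".join(lines) + "\n").encode("utf-8")
```

In this sketch the rest of the code only ever sees decoded text, so switching the default to version 3 would only mean changing the default argument of the writer.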
I've checked this into CVS, along with some of the uses of
decode_rfc1522() from your patch and the changes to make bugreport.cgi
and pkgreport.cgi output UTF-8, and installed it on bugs.debian.org.
This means that at least maintainer and submitter addresses are now
displayed properly.
The .log metadata and mail character set fixes still need more work; I'm
almost inclined to introduce a new, more structured record type to
replace html at the same time, and have it encoded in UTF-8.
Cheers,
--
Colin Watson [cjwatson@debian.org]