
Bug#344304: qa.debian.org: no charset specified when browsing news



> The new mhonarc config should do charset conversion if possible,
> or just output the text as-is in the case charset of the mail is utf8
> or unknown.

It's not that simple. Leaving an unescaped '<' character in the output
can cause security issues, since the browser may treat it as markup.
Anyway, the relevant portion of the mhonarc manual is the
<CHARSETCONVERTERS> resource. Take a look at mhonarc::htmlize
versus MHonArc::CharEnt::str2sgml and possibly discuss this on the
upstream mailing list.

http://www.mhonarc.org/MHonArc/doc/resources/charsetconverters.html
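
To illustrate why "output the text as-is" is risky, here is a minimal
Python sketch of the general idea (decode, then escape markup). It is not
mhonarc's code; the function name to_safe_html and the latin-1 fallback
are my own assumptions for the example.

```python
import html
from typing import Optional


def to_safe_html(raw: bytes, charset: Optional[str]) -> str:
    """Decode a message body and escape HTML metacharacters.

    Hypothetical helper for illustration only. If the declared charset is
    missing, unknown, or wrong for the data, fall back to latin-1, which
    can decode any byte sequence.
    """
    try:
        text = raw.decode(charset or "utf-8")
    except (LookupError, UnicodeDecodeError):
        # LookupError: unknown charset name; UnicodeDecodeError: mislabelled data.
        text = raw.decode("latin-1")
    # Escape &, < and > so mail content can never be parsed as markup.
    return html.escape(text, quote=False)
```

The point is that escaping happens in every branch, including the
"unknown charset" one, so a raw '<' never reaches the archive page.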

However, I personally recommend that mhonarc be set to convert
everything to UTF-8, no exceptions. That simplifies a lot of things:
the use of mixed languages in a single message, mixed-language index
pages, easier linguistic analysis and data mining of the HTML, etc.
Bending over backwards for incorrectly labelled character sets on
inbound email seems more trouble than it is worth.
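
As a rough sketch of the "everything to UTF-8" policy, written in Python
rather than as mhonarc configuration: the helper name to_utf8 and the
choice to substitute U+FFFD for undecodable bytes are assumptions made
for this example, not something mhonarc is documented to do.

```python
def to_utf8(raw: bytes, declared: str = "") -> bytes:
    """Re-encode a message body as UTF-8 (hypothetical helper).

    Bytes that do not fit the declared charset become U+FFFD instead of
    being special-cased, and an unknown or missing charset is treated as
    ASCII.
    """
    try:
        text = raw.decode(declared or "ascii", errors="replace")
    except LookupError:
        # Charset name not known to Python: keep only what is plain ASCII.
        text = raw.decode("ascii", errors="replace")
    return text.encode("utf-8")


# Two parts in different legacy charsets can share one UTF-8 page.
page = to_utf8("caf\u00e9 ".encode("iso-8859-1"), "iso-8859-1") \
    + to_utf8("\u044f".encode("koi8-r"), "koi8-r")
```

Once every part has gone through the same conversion, mixed-language
messages and index pages need no per-charset handling downstream.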

Incidentally, I was probably put on CC: because I'm the mhonarc package
maintainer. But I should also mention that one of my other hats is
helping run mail-archive.com, which provides secondary archival service
for all Debian mailing lists, with permission of the (former) DPL. The service
is also available for any other Debian team or group currently wrestling
with mhonarc configuration. So that is a possible fallback if needed.

Cheers,
Jeff


