
[fanliu@pmo.ac.cn: gb2312-encoded message bodies are garbled in the Mailman archive]



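Hi all,

I administer a Mailman server (Mailman 2.1.9, Python 2.4, Debian lenny)
and have run into the following problem.

Some users send mail from clients such as Foxmail, which use the gb2312
encoding. When these messages are archived, the gb2312-encoded message
body comes out garbled. Trying different encodings in the browser (utf-8,
gb2312 and other Chinese encodings) makes no difference. An example:
http://sfig.pmo.ac.cn/pipermail/astrophysics/2008-April/000022.html

The replies I got on the Mailman-users list are pasted below; they
essentially rule out the Apache server and the browser encoding as the
cause. Mark Sapiro says that if Mailman's encoding for Chinese is changed
from utf-8 to gb2312 in Defaults.py, all the utf-8 templates also have to
be recoded to gb2312. His guess relates to my initial problem: I had
changed this line in Defaults.py: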
#add_language('zh_CN', _('Chinese (China)'),     'utf-8')
add_language('zh_CN', _('Chinese (China)'),     'gb2312')       
That is, I commented out the utf-8 line and added a gb2312 one.

Rebuilding the archive then fails with an error:
>arch astrophysics
#00000 <20080411094420.GG2585@amber.pmo.ac.cn>
Pickling archive state into /var/lib/mailman/archives/private/astrophysics/pipermail.pck
Traceback (most recent call last):
  File "/usr/lib/mailman/bin/arch", line 200, in ?
    main()
  File "/usr/lib/mailman/bin/arch", line 188, in main
    archiver.processUnixMailbox(fp, start, end)
  File "/usr/lib/mailman/Mailman/Archiver/pipermail.py", line 578, in processUnixMailbox
    a = self._makeArticle(m, self.sequence)
  File "/usr/lib/mailman/Mailman/Archiver/HyperArch.py", line 674, in _makeArticle
    mlist=self.maillist)
  File "/usr/lib/mailman/Mailman/Archiver/HyperArch.py", line 320, in __init__
    self.decode_headers()
  File "/usr/lib/mailman/Mailman/Archiver/HyperArch.py", line 415, in decode_headers
    atmark = unicode(_(' at '), Utils.GetCharSet(self._lang))
UnicodeDecodeError: 'gb2312' codec can't decode bytes in position 0-1: illegal multibyte sequence
The arch failure is probably caused by the template files still being
encoded as utf-8.
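
To illustrate the failure mode in the traceback, here is a minimal sketch
in modern Python 3 (the server in this thread ran Python 2.4, so the exact
error text differs). The sample string "在" is only an assumed stand-in for
whatever the zh_CN catalog returns for ' at '; try_decode is a
hypothetical helper, not part of Mailman.

```python
def try_decode(data, codec):
    """Return (text, None) on success or (None, error) on failure."""
    try:
        return data.decode(codec), None
    except UnicodeDecodeError as exc:
        return None, exc


# Assumed sample: a catalog string stored as utf-8 bytes.
utf8_bytes = "在".encode("utf-8")

# Reading those bytes with the codec they were written in works.
ok_text, ok_error = try_decode(utf8_bytes, "utf-8")

# Reading the same bytes as gb2312 fails, because the second byte is not
# a valid gb2312 trail byte.
bad_text, bad_error = try_decode(utf8_bytes, "gb2312")

if __name__ == "__main__":
    print("utf-8 decode:", ok_text)
    print("gb2312 decode error:", bad_error)
```

The point is only that bytes written as utf-8 cannot in general be read
back as gb2312, which would be consistent with the UnicodeDecodeError
above if the catalog is still utf-8.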

I am still not clear on how to do this recoding. Is it really necessary?
Also, can't utf-8 and gb2312 coexist? Does anyone know how the
debian-chinese-gb list handles this?

Thanks

Fan Liu

From the Mailman-users list:
Hi Mark,

On Thu, Apr 17, 2008 at 10:23 PM, Mark Sapiro <mark@msapiro.net> wrote:

> Fan Liu
> >
> >The suggested resolution was to install cjkcodecs, which is now
> >provided by python2.4, and add
> >add_language('zh_CN', _('Chinese (China)'), 'gb2312')
> >in Defaults.py (as described in an earlier post,
> >http://mail.python.org/pipermail/mailman-i18n/2003-September/000976.html).
> >
> >Then I run
> >arch --wipe mylist
> >
> >I encountered this error:
> >
> >"UnicodeDecodeError: 'gb2312' codec can't decode bytes in position 0-1:
> >illegal multibyte sequence"
> >
> >Any ideas why?  Thanks in advance.
>
>
> I'm only guessing, but I think it is probably because the
> templates/zh-CN/* templates are still encoded as utf-8. You can't


I cannot find any templates/zh-CN/* in my system. Where and what are they?

>
> change the character encoding for a language without also recoding all
> the templates and the message catalog.
>
> In any case, I question whether the 4+ year old information is even
> applicable in current Mailman. I suspect your original issue has to do
> with messages in the archive being encoded as utf-8 and the web server
> sending a content-type header with a charset other than utf-8. If so,
> this is a web server issue. See my comment at
> <http://sourceforge.net/tracker/index.php?func=detail&aid=1942206&group_id=103&atid=100103>
> for a bit more.
>
I'm afraid that is not the case since
1. I have
AddDefaultCharset Off
in my apache conf.
2. I tried various Chinese encodings in various browsers (Safari, Firefox
and even IE). UTF-8 and GB2312 give the same output, which gives me the
impression that the text has already been ruined :|
Here's an example:
http://sfig.pmo.ac.cn/pipermail/astrophysics/2008-April/000006.html
What's strange is that the subjects of the emails are always correct.

Regards,
Fan


> --
> Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
> San Francisco Bay Area, California    better use your sense - B. Dylan
>
>
>> I'm only guessing, but I think it is probably because the
>> templates/zh-CN/* templates are still encoded as utf-8. You can't
>
>
>I cannot find any templates/zh-CN/* in my system. Where and what are they?


Sorry. That should be templates/zh_CN/*. There is also the message
catalog at messages/zh_CN/LC_MESSAGES/mailman.po.

If you are going to change Mailman's encoding for zh_CN from the
default utf-8 to gb2312, you also have to recode all the templates and
recode the mailman.po message catalog and then rebuild the mailman.mo
file with mailman's bin/msgfmt.py or the standard GNU msgfmt.
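
The recoding step described above could be sketched roughly as follows, in
modern Python 3. recode_file and recode_tree are hypothetical helpers, not
Mailman tools, and the sketch assumes every file under the given directory
is utf-8 text. It does not rebuild mailman.mo, and it does not update the
charset that a .po catalog normally declares in its own header, which
would have to be changed separately.

```python
import os


def recode_file(path, src="utf-8", dst="gb2312"):
    """Rewrite one file in place, converting its text from src to dst."""
    with open(path, "rb") as fh:
        raw = fh.read()
    text = raw.decode(src)        # UnicodeDecodeError if the file is not src
    converted = text.encode(dst)  # UnicodeEncodeError if dst lacks a character
    # Only write once both conversions have succeeded.
    with open(path, "wb") as fh:
        fh.write(converted)


def recode_tree(root, src="utf-8", dst="gb2312"):
    """Recode every regular file below root; return the paths touched."""
    touched = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            recode_file(path, src, dst)
            touched.append(path)
    return touched
```

It would be run against a copy of the templates/zh_CN directory first,
since the conversion is in place and a character that gb2312 cannot
represent stops it with a UnicodeEncodeError.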


>> change the character encoding for a language without also recoding all
>> the templates and the message catalog.
>>
>> In any case, I question whether the 4+ year old information is even
>> applicable in current Mailman. I suspect your original issue has to do
>> with messages in the archive being encoded as utf-8 and the web server
>> sending a content-type header with a charset other than utf-8. If so,
>> this is a web server issue. See my comment at
>> <http://sourceforge.net/tracker/index.php?func=detail&aid=1942206&group_id=103&atid=100103>
>> for a bit more.
>>
>I'm afraid that is not the case since
>1. I have
>AddDefaultCharset Off
>in my apache conf.
>2. I tried various Chinese encodings in various browsers (Safari, Firefox
>and even IE). UTF-8 and GB2312 give the same output, which gives me the
>impression that the text has already been ruined :|
>Here's an example:
>http://sfig.pmo.ac.cn/pipermail/astrophysics/2008-April/000006.html
>What's strange is that the subjects of the emails are always correct.


That is because the subjects are properly encoded in a properly
identified charset in the original message, and Mailman is then able
to recode them into the charset of the archive. It is apparently not
able to do this with the message body. I can only guess that this
might be because the charset of the message body is not properly
identified in the original mail.

-- 
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan


-- 
Fan Liu
Ph.D Student of Star Formation in Galaxies Group,
Purple Mountain Observatory,
2# West Beijing Road, Nanjing, China
Email: fanliu@pmo.ac.cn
Homepage: http://sfig.pmo.ac.cn/~fliu/

