Re: html2text with utf8 support: please test
Eugene V. Lyubimkin wrote:
> The html2text utility, version 1.3.2a-6, with the "utf8" patch was just
> uploaded to experimental. The patch allows processing UTF-8 files
> when the '-utf8' option is supplied. Input should be in UTF-8 and output
> will be in UTF-8 too.
> Please test this functionality - I believe that UTF-8 support is a
> good feature, especially for processing non-English documents.
Mmm, the way it is done looks wrong to me: there is no reason why the
input and output charsets should be related at all. For the input,
html2text should recognize the charset declared in the meta http-equiv
tag. That should work for a lot of pages; otherwise an input-charset
option can be provided. For the output, the current locale's charset
should be used (as returned by nl_langinfo(CODESET) after calling
setlocale(LC_CTYPE,"")). That should work in almost all cases; otherwise
an output-charset option can be provided.
Yes, that means conversions. But without them you cannot put a
"utf-8 support" sticker on it, only "limited utf-8 support".
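
To make the output side concrete, here is a rough sketch in C of what I
mean. It is not taken from html2text or from the patch. The helper names
locale_charset() and convert_charset() are made up for illustration, and
the conversion assumes iconv(3) is available and that the target charset
is byte-oriented (so a single NUL byte terminates the result).

```c
#include <assert.h>
#include <iconv.h>
#include <langinfo.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: name of the charset of the user's locale,
   e.g. "UTF-8" or "ISO-8859-1". */
static const char *locale_charset(void)
{
    setlocale(LC_CTYPE, "");      /* adopt the locale from the environment */
    return nl_langinfo(CODESET);
}

/* Hypothetical helper: convert a NUL-terminated string from charset
   `from` to charset `to`. Returns a malloc'd string that the caller
   must free, or NULL on failure (unknown charset, invalid input,
   out of memory). */
static char *convert_charset(const char *in, const char *from, const char *to)
{
    assert(in != NULL && from != NULL && to != NULL);

    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return NULL;

    size_t in_left = strlen(in);
    size_t cap = in_left * 4 + 16;   /* generous bound for byte-oriented targets */
    char *out = malloc(cap);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *in_p = (char *)in;         /* iconv() takes a non-const pointer */
    char *out_p = out;
    size_t out_left = cap - 1;       /* keep one byte for the terminator */

    size_t rc = iconv(cd, &in_p, &in_left, &out_p, &out_left);
    if (rc != (size_t)-1)
        rc = iconv(cd, NULL, NULL, &out_p, &out_left);  /* flush shift state */
    iconv_close(cd);

    if (rc == (size_t)-1) {
        free(out);
        return NULL;
    }
    *out_p = '\0';
    return out;
}
```

With something like this, the input charset would come from the meta tag
or an input-charset option, and the text would be written out with
convert_charset(text, input_charset, locale_charset()) unless an
output-charset option overrides the locale.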