
Re: html2text with utf8 support: please test



Samuel Thibault wrote:
> Eugene V. Lyubimkin wrote:
>> The html2text utility, version 1.3.2a-6, with the "utf8" patch was just
>> uploaded to experimental.  The patch allows processing UTF-8 files
>> when the '-utf8' option is supplied. Input should be in UTF-8 and output
>> will be in UTF-8 too.
>>
>> Please test this functionality - I believe that UTF-8 support is a
>> good feature, especially for processing non-English documents.
> 
> Mmm, the way it is done looks wrong to me: there is no reason why the
> input and output charsets should be related at all.  For the input,
> html2text should recognize the meta http-equiv tag, that should work
> for a lot of pages, else an input-charset option can be provided.  For
> the output, the current locale's charset should be used (as returned by
> nl_langinfo(CODESET) after calling setlocale(LC_CTYPE,"")), that should
> work in almost all cases, else an output-charset option can be provided.
> 
> Yes, that means conversions.  But without that you cannot put a sticker
> "utf-8 support", only "limited utf-8 support".
> 
> Samuel
> 
OK, this would be good. You are welcome to file a minor/wishlist bug, and I will ask the
author to consider it. The author is not very active in html2text development, though.
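
For reference, the scheme Samuel describes above (input charset taken from the
meta http-equiv tag, output charset taken from the current locale, and a
conversion in between) could be sketched roughly as follows. This is only an
illustration, not html2text code: it is written in Python rather than C++ for
brevity, the function names (detect_input_charset, locale_output_charset,
transcode) and the fallback charsets are assumptions made up for the example,
and Python's locale module is used as a stand-in for the C calls
setlocale(LC_CTYPE, "") and nl_langinfo(CODESET).

```python
import codecs
import locale
import re

# Matches charset=... inside a <meta> tag, e.g.
# <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
_META_CHARSET = re.compile(
    rb"""<meta[^>]+charset\s*=\s*["']?([A-Za-z0-9._:-]+)""",
    re.IGNORECASE,
)


def _known(name):
    """Return True if Python has a codec registered under this name."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


def detect_input_charset(raw, fallback="iso-8859-1"):
    """Guess the input charset from a <meta> tag near the start of the page."""
    match = _META_CHARSET.search(raw[:4096])
    if match:
        name = match.group(1).decode("ascii")
        if _known(name):
            return name
    # Assumed default; a hypothetical input-charset option would override it.
    return fallback


def locale_output_charset(fallback="ascii"):
    """Charset of the current locale, as nl_langinfo(CODESET) reports it."""
    try:
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        return fallback  # the environment names a locale the system lacks
    if not hasattr(locale, "nl_langinfo"):
        return fallback  # nl_langinfo is not available on every platform
    name = locale.nl_langinfo(locale.CODESET)
    return name if name and _known(name) else fallback


def transcode(raw, input_charset=None, output_charset=None):
    """Convert raw bytes from the input charset to the output charset.

    Explicit arguments play the role of the input-charset and output-charset
    options; when omitted, the charsets are detected as described above.
    """
    src = input_charset or detect_input_charset(raw)
    dst = output_charset or locale_output_charset()
    # Characters that cannot be represented in dst are replaced with "?".
    return raw.decode(src, errors="replace").encode(dst, errors="replace")
```

With this split, a Latin-1 page read in a UTF-8 locale and a UTF-8 page read
in a KOI8-R locale both go through the same transcode() path, which is why the
input and output charsets do not need to be related.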

-- 
Eugene V. Lyubimkin aka JackYF, Ukrainian C++ developer.
