
Bug#461159: [Pkg-fonts-devel] Bug#461022: description doesn't render in aptitude



On Thu, Jan 17, 2008 at 10:21:11AM +0100, Tomas Pospisek <tpo@sourcepole.ch> was heard to say:
> On Thu, 17 Jan 2008, Christian Perrier wrote:
>
>> Quoting Tomas Pospisek (tpo@sourcepole.ch):
>>
>>>> The file *is* UTF-8 from what I see.
>>>
>>> I'm looking at it through konsole, which runs bash. Konsole's encoding
>>> is set to "Default". If I set it to UTF8, it still doesn't render.
>>
>> It does, in the exact same conditions on my system.
>
> I did this:
>
> $ apt-cache show ttf-ecolier-lignes-court > /tmp/k
> $ vim /tmp/k
>
> and clearly, the problem here is *not* the displaying/decoding/the fonts. 
> The problem is apt-cache, since if I look at the produced output in 
> /tmp/k it's still cut off at:
>
>  "Description: cursive roman font (with r"

  I can confirm that pkgRecords::Parser::ShortDesc() returns a truncated
string for this Description if I run it with LC_ALL=C.

  It looks like apt's description extraction routine attempts to
transcode it from UTF-8 to the current locale's character set without
checking for errors.  As a result, the string gets truncated at the
first character that can't be represented in the target character set.
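To make that failure mode concrete, here is a minimal sketch of a conversion that calls iconv(3) once and ignores its return value.  This is not apt's actual UTF8ToCodeset; the function name `naive_to_codeset` and its buffer handling are hypothetical, and the behaviour described assumes glibc's iconv.  Whatever iconv wrote before it stopped is handed back as if it were the whole result.

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch (not apt's code): convert a NUL-terminated UTF-8
 * string to `tocode`, ignoring iconv's return value.  Writes at most
 * outsize-1 bytes plus a terminating NUL into `out`; outsize must be >= 1.
 * Returns 0 after the iconv call, -1 if iconv_open fails. */
int naive_to_codeset(const char *in, const char *tocode,
                     char *out, size_t outsize)
{
    iconv_t cd = iconv_open(tocode, "UTF-8");
    if (cd == (iconv_t)-1)
        return -1;

    char *inbuf = (char *)in;       /* iconv takes char **, not const */
    size_t inleft = strlen(in);
    char *outbuf = out;
    size_t outleft = outsize - 1;   /* reserve room for the NUL */

    /* Deliberate bug: the (size_t)-1 return and errno are never checked,
     * so stopping at an unconvertible character looks like success. */
    iconv(cd, &inbuf, &inleft, &outbuf, &outleft);

    *outbuf = '\0';                 /* keeps only what was written */
    iconv_close(cd);
    return 0;
}
```

With `tocode` set to "ASCII" (roughly what LC_ALL=C selects on glibc), the input "caf\xc3\xa9 au lait" comes back as just "caf", the same kind of cut-off seen in the Description above.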

  This is a bit odd since iconv(3) and the glibc docs say that iconv
stops when the output buffer is full or when an invalid or incomplete
character is encountered in the input buffer.  Untranslatable characters
aren't mentioned.

  If I instrument UTF8ToCodeset to save the return value of iconv and
the value of errno afterwards (neither of which gdb lets me do, grr),
I can see that iconv returns -1 and sets errno to EILSEQ ("invalid byte
sequence").  This seems wrong, since the source codeset is hardcoded to
UTF-8, but maybe iconv is just reporting that as the closest analogue to
"I couldn't represent this code point in the output encoding".  Anyway,
the result of all this is that you get a truncated string.
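The instrumentation described above can be sketched as a small probe that records iconv's return value, errno, and how far into the input it got.  The names `probe_iconv` and `struct iconv_probe` are hypothetical, and the expected results in the note below assume glibc, which reports an unrepresentable character with EILSEQ.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* What a single iconv() call reported. */
struct iconv_probe {
    size_t ret;       /* iconv's return value; (size_t)-1 on failure */
    int err;          /* errno after the call, or 0 if it succeeded */
    size_t consumed;  /* input bytes converted before iconv stopped */
};

/* Hypothetical helper: run one UTF-8 -> tocode conversion into a scratch
 * buffer and capture the values that are awkward to inspect in gdb.
 * Inputs longer than the scratch buffer may stop early with E2BIG.
 * Returns 0 on success, -1 if iconv_open fails. */
int probe_iconv(const char *in, const char *tocode, struct iconv_probe *p)
{
    char out[256];
    iconv_t cd = iconv_open(tocode, "UTF-8");
    if (cd == (iconv_t)-1)
        return -1;

    char *inbuf = (char *)in;
    size_t inleft = strlen(in);
    char *outbuf = out;
    size_t outleft = sizeof out;

    errno = 0;
    p->ret = iconv(cd, &inbuf, &inleft, &outbuf, &outleft);
    p->err = (p->ret == (size_t)-1) ? errno : 0;
    p->consumed = (size_t)(inbuf - in);

    iconv_close(cd);
    return 0;
}
```

For "caf\xc3\xa9" converted to "ASCII", glibc should report ret == (size_t)-1, err == EILSEQ and consumed == 3: the input is perfectly valid UTF-8, yet the error code is the "invalid byte sequence" one.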

  I'd suggest taking a look at what aptitude does to handle errors
(basically inserting "?" characters at locations that can't be
converted).  Actually, I'd even go so far as to suggest that apt should
just return all strings in UTF-8 rather than trying to be clever and
guess what the client code wants, but it's probably way too late for
that :-/.
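A rough sketch of the "insert ? and keep going" approach follows.  It is not aptitude's actual code; `to_codeset_lossy` is a hypothetical name, and it assumes a stateless, ASCII-compatible target codeset so that '?' is a single byte and no final shift sequence needs flushing.  On EILSEQ or EINVAL it writes '?', skips the offending UTF-8 character (its lead byte plus any continuation bytes), and resumes converting.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: convert UTF-8 `in` to `tocode`, replacing each
 * character iconv rejects with '?' instead of stopping there.
 * Returns 0 on success, -1 if iconv_open fails or `out` is too small
 * (in which case `out` still holds the NUL-terminated partial result). */
int to_codeset_lossy(const char *in, const char *tocode,
                     char *out, size_t outsize)
{
    if (outsize == 0)
        return -1;
    iconv_t cd = iconv_open(tocode, "UTF-8");
    if (cd == (iconv_t)-1)
        return -1;

    char *inbuf = (char *)in;
    size_t inleft = strlen(in);
    char *outbuf = out;
    size_t outleft = outsize - 1;   /* room for the NUL */
    int rc = 0;

    while (inleft > 0) {
        if (iconv(cd, &inbuf, &inleft, &outbuf, &outleft) != (size_t)-1)
            break;                  /* the rest converted cleanly */
        if (errno == EILSEQ || errno == EINVAL) {
            /* Unrepresentable, invalid or truncated character. */
            if (outleft == 0) { rc = -1; break; }
            *outbuf++ = '?';
            outleft--;
            /* Skip the lead byte, then any 10xxxxxx continuation bytes. */
            inbuf++;
            inleft--;
            while (inleft > 0 && ((unsigned char)*inbuf & 0xC0) == 0x80) {
                inbuf++;
                inleft--;
            }
        } else {                    /* E2BIG: output buffer is full */
            rc = -1;
            break;
        }
    }

    *outbuf = '\0';
    iconv_close(cd);
    return rc;
}
```

Converting "caf\xc3\xa9 au lait" to "ASCII" this way should yield "caf? au lait": one lost character instead of a lost tail.  glibc also accepts "//TRANSLIT" and "//IGNORE" suffixes on the target codeset, which is another way to avoid stopping at the first unconvertible character.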

  Daniel


