Bug#612675: libkio5: KTar class have broken UTF-8 support(longlink)

To: Ibragimov Rinat <ibragimovrinat@mail.ru>, 612675@bugs.debian.org
Cc: Gerfried Fuchs <rhonda@deb.at>
Subject: Bug#612675: libkio5: KTar class have broken UTF-8 support(longlink)
From: Modestas Vainius <modax@debian.org>
Date: Mon, 9 May 2011 01:27:28 +0300
Message-id: <201105090127.28962.modax@debian.org>
Reply-to: Modestas Vainius <modax@debian.org>, 612675@bugs.debian.org
In-reply-to: <E1QHXdT-0002DQ-00.ibragimovrinat-mail-ru@f278.mail.ru>
References: <20110209221622.7254.23363.reportbug@acerone2> <20110504071200.GA1899@anguilla.debian.or.at> <E1QHXdT-0002DQ-00.ibragimovrinat-mail-ru@f278.mail.ru>

Hello,

On Wednesday 04 May 2011 11:40:43 Ibragimov Rinat wrote:
> > This, though, is not totally clear to me. On the major architectures,
> > char is signed, so I would assume that a checksum error in this area
> > should have hit a lot of people already? Given that int is signed by
> > default, I wonder whether this is the proper approach, or whether it
> > should rather be cast to signed char (the signedness of char varies
> > across architectures).
> 
> The error only occurs when a file name contains characters with codes of
> 128 or above. All ASCII characters have codes of 127 or below, so in that
> case there is no difference. UTF-8 uses the most significant bit as a flag,
> so every byte of a non-ASCII character has a code of 128 or above. I'll
> explain with an example:
> 
> int check = 32;
> check += buffer[j];
> 
> Assume buffer[j] == 128, i.e. 0x80. When a signed char 0x80 is added to an
> int, it is sign-extended to a signed integer and becomes 0xffffff80 (-128),
> not 0x80 as one might expect.
> 
> But if all file names are in English, no one will ever hit the bug.
> 
> > Out of curiosity, you filed this from an i386 system. Did you maybe
> > copy the backup from/to any architecture including arm, armel,
> > powerpc or s390? Were they somehow involved in your presumed checksum
> > error? The reason behind the question is: if we "fix" the
> > calculation in the direction that you propose, this would break backups
> > made now on the architectures that do have char signed by default,
> > because it would result in a different checksum.
> 
> No, unfortunately I don't have access to architectures other than amd64 and
> i386.

What I'm concerned about is that your patch may not be complete. There are
more similar "checks" in ktar.cpp. As I have absolutely no idea how tar works,
this will take time to handle properly (or hopefully upstream responds in the
meantime). Thanks for forwarding the bug.

-- 
Modestas Vainius <modax@debian.org>

Attachment: signature.asc
Description: This is a digitally signed message part.

Follow-Ups:
- Bug#612675: libkio5: KTar class have broken UTF-8 support(longlink)
  - From: Ibragimov Rinat <ibragimovrinat@mail.ru>

References:
- Bug#612675: libkio5: KTar class have broken UTF-8 support (longlink)
  - From: Gerfried Fuchs <rhonda@deb.at>
- Bug#612675: libkio5: KTar class have broken UTF-8 support(longlink)
  - From: Ibragimov Rinat <ibragimovrinat@mail.ru>
