
Bug#535702: Faulty encoding/decoding of filenames



Package: dolphin
Version: 4:4.2.2-1

I have a disk mounted that uses a few ISO8859-1-encoded filenames, while the
rest of my system uses UTF-8. In such a setup, Dolphin fails to handle files
whose names are not pure ASCII (the subset common to both encodings).

I ran into two problems:
1. I can't browse into a directory with umlauts in its name. Dolphin displays
the directory with a question mark in place of the umlaut, but doesn't let me
click on it to browse into it.
2. I can't even rename the directory. This is probably just a different aspect
of the same problem: Dolphin complains that the file can't be found.

The weird part is that the file it claims it can't find has very little in
common with the one on the disk.
Example: Mission_erfüllt.ogg is the file on disk; encoded with ISO8859-1, that
makes it "Mission_erf\xfcllt.ogg". When I try to rename the file, Dolphin
claims it can't find "Mission_erf�llt.ogg", which would be
"Mission_erf\xef\xbf\xbdllt.ogg". If I decode these three bytes as UTF-8, they
form the code point U+FFFD, which is a "replacement character"[1], probably
inserted because the filename couldn't be decoded according to the current
locale. What Dolphin must do is preserve the bytewise representation of the
filename. To display it, Dolphin can try to transcode it and do replacements
there, but for accessing the file, e.g. for renaming, it must not use a
filename resulting from this lossy encoding roundtrip.
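To make the difference concrete, here is a minimal Python sketch (not Dolphin
code, which is C++/Qt) of the lossy roundtrip versus a reversible decoding. The
helper names display_name and lossless_name are hypothetical. The reversible
variant uses Python's "surrogateescape" error handler (PEP 383) as one way to
keep undecodable bytes recoverable; it is not necessarily what Dolphin or KDE
would use.

```python
def display_name(raw: bytes) -> str:
    """Hypothetical helper, for display only: undecodable bytes become U+FFFD."""
    return raw.decode("utf-8", errors="replace")


def lossless_name(raw: bytes) -> str:
    """Hypothetical helper: undecodable bytes become lone surrogates (PEP 383),
    so encoding with the same handler restores the original bytes."""
    return raw.decode("utf-8", errors="surrogateescape")


# "Mission_erfüllt.ogg" as stored on disk in ISO8859-1 (\xfc is "ü").
raw = b"Mission_erf\xfcllt.ogg"

# The lossy roundtrip: decode with replacement, then re-encode as UTF-8.
roundtrip = display_name(raw).encode("utf-8")
print(roundtrip)  # b'Mission_erf\xef\xbf\xbdllt.ogg'
assert roundtrip != raw  # this name does not exist on disk

# The reversible decoding gets the original bytes back unchanged.
assert lossless_name(raw).encode("utf-8", errors="surrogateescape") == raw
```

The point is the separation: the replacement-character string is fine for the
label shown to the user, but any file operation should go through the original
bytes (on POSIX, functions such as os.rename accept bytes paths directly).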

Please don't suggest that I should fix my locale, mount the disk with a
different encoding or similar things. Those are good ideas (if it weren't for
pluggable media), but they are no excuse for Dolphin performing lossy
roundtrip conversions on data it doesn't understand.

Uli



[1] Quoting from http://www.unicode.org/charts/PDF/UFFF0.pdf:
FFFD REPLACEMENT CHARACTER
     * used to replace an incoming character whose value
       is unknown or unrepresentable in Unicode



