
Bug#500540: kdebase: automounting vfat (partialy) case sensitive due to utf8 is plain wrong and dangerous



Heinrich Langos <henrik-debian-bug@prak.org> writes:

> Now lets try again with more sane vfat options:
>
> # mount | grep vfat
> /dev/sda1 on /mnt type vfat (rw,nosuid,nodev,noatime,uhelper=hal,flush,uid=1000,shortname=lower,check=relaxed,codepage=850,iocharset=iso8859-1)

> As you see it is not perfect as the "TEST" file gets only created as
> "test". My guess is that "shortname=mixed" instead of
> "shortname=lower" should be used but don't take my word for it.

"shortname=mixed" works nicely with the "utf8" flag, and the command

    touch test Test teSt

touches the same file three times.

> And who came up with the idea to mount vfat with utf8 anyway? It was
> never designed to take short utf8 names. Those are strictly 8.3 and if
> you try to stick utf8 characters in there, you get all kinds of length
> checking problems. Long names on vfat are stored in unicode anyway. So
> whats the big gain here? For the sake of squeezing utf8 into places
> where it never was ment to be, we get messed up filesystems?

I admit that some of these ideas may have come from me. I described
one aspect of this issue in kernel bug #417324:

    http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=417324

I'm pretty confused by all these "iocharset", "codepage" and "utf8"
flags, but I'm certain that in Debian Etch (with its default UTF-8
locale settings and default kernel settings) filenames get converted
completely wrong.

Long filenames in a FAT filesystem are stored in the form we call
UTF-16 today. In a default Etch system, FAT's UTF-16 filenames get
converted to ISO-8859-1 if the filesystem is not mounted with the
"utf8" flag. In the other direction, Etch's UTF-8 filenames are assumed
to be in ISO-8859-1 and, since that is a single-byte encoding, every
byte (even inside a UTF-8 multibyte character) gets converted
separately to UTF-16. This produces complete garbage, of course.
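
To make the byte-by-byte misreading concrete, here is a minimal shell
sketch. It assumes "iconv" and "od" are installed, and the file name
"ä.txt" is only a made-up example. It imitates the wrong assumption in
user space; it is not the kernel's actual code path.

```shell
# Hypothetical example name "ä.txt", written as raw UTF-8 bytes (0xC3 0xA4).
name_utf8=$(printf '\303\244.txt')

# Imitate the wrong assumption: treat every byte as one ISO-8859-1 character.
# (Converted back to UTF-8 here only so that a UTF-8 terminal can display it.)
misread=$(printf '%s' "$name_utf8" | iconv -f ISO-8859-1 -t UTF-8)

printf 'intended:  %s\n' "$name_utf8"
printf 'stored as: %s\n' "$misread"

# The same misreading, shown as the UTF-16 code units that would be stored.
printf '%s' "$name_utf8" | iconv -f ISO-8859-1 -t UTF-16LE | od -An -tx1
```

On a UTF-8 terminal the "stored as" line should show two characters
("Ã¤") where a single "ä" was intended.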
 
KDE is nice enough to use the "utf8" flag, but someone reported that
GNOME does not (or at least did not) mount with this flag. Thus GNOME
produces filenames which are unreadable on other systems (including
MS Windows).

I guess a change in the kernel settings is what made you see this issue
after upgrading from Etch to Lenny: the option
CONFIG_FAT_DEFAULT_IOCHARSET was changed from "iso-8859-1" to "utf8".

> As far as I have seen in archives and related bug reports the blame
> for this problem gets shifted around from KDE to pmount to the kernel
> itself and all the way back. Everybody happily points fingers at the
> others.

This seems to be pretty complicated. We have to make

    VFAT/UTF-16  <-->  Debian/UTF-8

conversion work somehow, and in Etch it does not work (except when KDE
does the mounting). In Lenny the conversion currently works by default
with a plain "mount" without any options; this is because of the change
in the kernel settings.

But then there's the issue you reported... :-( In my experience
"shortname=mixed" works nicely without character case problems.

> -henrik
> (Using Debian since buzz.)

Wow, I'm from the Woody/Sarge generation. :-)


