
Re: 8bit characters in files in Debian packages



On Thu, Mar 31, 2011 at 07:05:10PM +0200, Raphael Hertzog wrote:
> On Thu, 31 Mar 2011, Bill Allombert wrote:
> > So this raises two issues:
> > 1) should non-7bit characters in filenames be allowed
> 
> Yes, I don't see a good reason to forbid them. In particular when we are
> in an international environment and we are targeting full localization.
> 
> > 2) if yes, should we require the filename to be in a correct UTF-8 encoding?
> 
> I think it would be good, yes. We have standardized on UTF-8 for almost
> everything and we should do the same for filenames.

I am not sure this is such a good idea.
First, this might force users to use a UTF-8 locale. While that is the default, it is not
mandatory in Debian. I know users who stay with ISO8859-1 because they have a lot of
text files in that encoding.
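
A minimal sketch of that concern, assuming Python 3 (the helper name
is_valid_utf8 is hypothetical, not part of any Debian tool): a name stored
as UTF-8 bytes is displayed incorrectly when read as ISO8859-1, and a name
stored as ISO8859-1 bytes fails a UTF-8 validity check.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


name = "fr-Alençon"
utf8_bytes = name.encode("utf-8")          # what a UTF-8 filename policy would ship
latin1_bytes = name.encode("iso-8859-1")   # what an ISO8859-1 user would type

# How the UTF-8 filename looks when interpreted in an ISO8859-1 locale.
shown_in_latin1_locale = utf8_bytes.decode("iso-8859-1")

print(is_valid_utf8(utf8_bytes))    # True
print(is_valid_utf8(latin1_bytes))  # False
print(shown_in_latin1_locale)       # fr-AlenÃ§on
```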

Until the C.UTF-8 proposal is implemented and mandated, a valid UTF-8 locale might not even
exist on the system.

Secondly, filenames inside a .deb are not localizable, and it might prove problematic for users to
deal with filenames in scripts they cannot display or read. Case in point: I do not have a Japanese
font installed, so I could not tell two such filenames apart.

To give an example, dvb-apps includes town names in filenames, like

/usr/share/dvb/dvb-t/fr-Albi
/usr/share/dvb/dvb-t/fr-Alençon
/usr/share/dvb/dvb-t/fr-Ales

While it could appear to make sense to use the correct encoding for the town
name, this is not done consistently: a lot of town names are mangled, so
mangling two more should not make any difference.
(Alès is mangled as Ales, Saint-Étienne as SaintEtienne, Αθήνα as Athens, 臺北 as
Taipei, etc.)
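
For illustration only, a rough sketch of mechanical mangling in Python 3.
This is an assumption about how such names could be flattened, not how
dvb-apps actually produced them, and mangle_to_ascii is a hypothetical helper.
It strips accents from Latin names but cannot derive "Athens" or "Taipei",
which are translations rather than mangled spellings.

```python
import unicodedata


def mangle_to_ascii(name: str) -> str:
    """Decompose accented letters, then drop every non-ASCII code point."""
    decomposed = unicodedata.normalize("NFKD", name)
    return decomposed.encode("ascii", "ignore").decode("ascii")


for town in ("Alès", "Alençon", "Saint-Étienne", "Αθήνα", "臺北"):
    print(repr(town), "->", repr(mangle_to_ascii(town)))
```

Latin names keep their base letters (and the hyphen, unlike SaintEtienne),
while the Greek and Chinese names come out empty, so they would need a
hand-picked ASCII name anyway.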

Cheers,
-- 
Bill. <ballombe@debian.org>

Imagine a large red swirl here. 

