
fixing international characters with samba and apache

Living in Sweden, I have finally configured my Debian server to 
handle the Swedish alphabet (with å, ä, ö) correctly. Below is 
how I did it, for anyone who's interested.


PROBLEM 1: Filenames on a Samba share with correct å/ä/ö in 
Windows do not display correctly in Debian.
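
To see why the names get garbled, here is a minimal Python sketch. It assumes the name is stored as UTF-8 bytes while the displaying side interprets them as ISO8859-1 (or the other way around); which direction applies depends on your setup, and the file name is made up for illustration.

```python
# Hypothetical file name containing all three Swedish letters.
name = "räksmörgås.txt"


def misread(text: str, stored_as: str, shown_as: str) -> str:
    """Encode text in one charset and decode the same bytes in another."""
    return text.encode(stored_as).decode(shown_as, errors="replace")


# UTF-8 bytes shown by an ISO8859-1 system: each letter turns into two characters.
utf8_as_latin1 = misread(name, "utf-8", "iso-8859-1")

# ISO8859-1 bytes shown by a UTF-8 system: each letter is an invalid sequence
# and is replaced by the U+FFFD replacement character.
latin1_as_utf8 = misread(name, "iso-8859-1", "utf-8")

print(utf8_as_latin1)
print(latin1_as_utf8)
```

When both sides agree on the encoding, the name survives the round trip unchanged, which is the point of moving everything to UTF-8.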

Solution: Switch to UTF-8 encoding in Debian (more and more 
tools use this as the default for filenames, including e.g. Gnome 

Set UTF-8 by assigning LANG to a UTF-8-based locale for all 
processes, e.g.:
        export LANG=sv_SE.utf8

If you want to keep English as the language in Debian (rather 
than switching to your local language), also add this:
        export LC_MESSAGES=POSIX

(If needed, create /etc/environment and source it from 

If a suitable UTF-8 locale is not available on your system then 
add it. Check available locales:

# locale -a

Add the locale to /etc/locale.gen:
        sv_SE.UTF-8 UTF-8

Then run:

# locale-gen
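
As a quick way to confirm that the locale can actually be used after generating it, the following Python sketch tries to activate it and reports the result. The helper name `locale_available` is hypothetical; it relies only on the standard `locale` module, and the outcome naturally depends on what is installed on the machine where it runs.

```python
import locale


def locale_available(name: str) -> bool:
    """Return True if the C library accepts this locale name for LC_CTYPE."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, name)
        return True
    except locale.Error:
        return False
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)  # restore the previous setting


if __name__ == "__main__":
    # sv_SE.utf8 is the locale name used with LANG earlier in this post.
    for candidate in ("C", "sv_SE.utf8"):
        print(candidate, "available:", locale_available(candidate))
```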


PROBLEM 2: Apache's automatic file listings (autoindex) display 
å/ä/ö incorrectly.

Solution: Let Apache use UTF-8 as default encoding.

Set default encoding:
        AddDefaultCharset UTF-8
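
To check the effect of the directive, one can look at the charset that the server declares in the Content-Type response header. The sketch below is an illustration only: the function names are hypothetical and the URL in the usage note is a placeholder for one of your own autoindex pages.

```python
from email.message import Message
from urllib.request import urlopen


def charset_from_content_type(value: str):
    """Extract the charset parameter from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset()  # lower-cased, or None if absent


def served_charset(url: str):
    """Fetch a URL and return the charset declared by the server, if any."""
    with urlopen(url) as response:
        return response.headers.get_content_charset()


print(charset_from_content_type("text/html; charset=UTF-8"))
print(charset_from_content_type("text/html"))
```

Calling `served_charset("http://localhost/some-listing/")` against your own server (placeholder URL) should report `utf-8` once the directive is active, assuming nothing else overrides the charset for that location.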


PROBLEM 3: Some clients send request URLs incompatible with 
UTF-8. An interesting (and confusing) example is the combination
of Internet Explorer (IE6) and Adobe Reader when opening a PDF
file. First, Apache receives a GET request with a correctly
formed UTF-8 URL, but after that there is a second GET request 
with raw (not URL-encoded!) 8-bit characters in the ISO8859-1 
encoding. The latter request of course fails.
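
The difference between the two requests can be made concrete with a small Python sketch. The path is made up, and the function names are hypothetical; the last function shows the conversion that is needed to turn the second kind of request into the first.

```python
from urllib.parse import quote


def utf8_url_form(path: str) -> str:
    """The well-formed request: UTF-8 bytes, percent-encoded."""
    return quote(path, safe="/")


def latin1_raw_form(path: str) -> bytes:
    """The problematic request: raw ISO8859-1 bytes, not URL-encoded."""
    return path.encode("iso-8859-1")


def raw_to_utf8_url(raw: bytes) -> str:
    """Convert a raw ISO8859-1 path into percent-encoded UTF-8."""
    return quote(raw.decode("iso-8859-1"), safe="/")


path = "/docs/å.pdf"  # hypothetical file on the server
print(utf8_url_form(path))
print(latin1_raw_form(path))
print(raw_to_utf8_url(latin1_raw_form(path)))
```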

Solution: Use Apache mod_rewrite to convert illegal characters 
to valid URL-encoded UTF-8 (which is the convention to use for 

Enable mod_rewrite:

# cd /etc/apache2/mods-enabled/
# ln -s ../mods-available/rewrite.load

For lowercase å/ä/ö add these rewrites:
        RewriteEngine On
        RewriteRule (.*)å(.*) $1%C3%A5$2
        RewriteRule (.*)ä(.*) $1%C3%A4$2
        RewriteRule (.*)ö(.*) $1%C3%B6$2
(for some reason I haven't been able to get uppercase Å/Ä/Ö

Make sure that å/ä/ö in the rules are saved in the ISO8859-1 
encoding as this needs to match exactly what arrives in the 
request. You can check this with octal dump:

# od -t c /etc/apache2/httpd.conf
0001120   R   u   l   e       (   .   *   ) 345   (   .   *   )       $
0001140   1   %   C   3   %   A   5   $   2  \n   R   e   w   r   i   t
0001160   e   R   u   l   e       (   .   *   ) 344   (   .   *   )
0001200   $   1   %   C   3   %   A   4   $   2  \n   R   e   w   r   i
0001220   t   e   R   u   l   e       (   .   *   ) 366   (   .   *   )

(notice the "eight-bit" characters 345/344/366)
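
The same check can be sketched in Python, for those who find the od output hard to read. The helper names are hypothetical. The first function shows which octal byte values to expect for each encoding, and the second classifies a line of the config file by how its bytes decode.

```python
def octal_bytes(text: str, encoding: str) -> list:
    """Return the bytes of text in the given encoding as octal strings."""
    return [format(b, "o") for b in text.encode(encoding)]


def line_encoding(line: bytes) -> str:
    """Classify a line as 'ascii', 'utf-8' or '8-bit' (e.g. ISO8859-1)."""
    try:
        line.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        line.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "8-bit"


def report_rules(conf_path: str) -> None:
    """Print the detected encoding of every RewriteRule line in a file."""
    with open(conf_path, "rb") as conf:
        for line in conf:
            if b"RewriteRule" in line:
                print(line_encoding(line), line.rstrip())


print(octal_bytes("åäö", "iso-8859-1"))  # single bytes, as in the dump above
print(octal_bytes("å", "utf-8"))         # two bytes if saved as UTF-8 by mistake
```

Running `report_rules("/etc/apache2/httpd.conf")` should then show `8-bit` for the three rules; a result of `utf-8` would mean the editor saved the file in the wrong encoding.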


Good luck with your own configuration!
Mike Wilson
View this message in context: http://www.nabble.com/fixing-international-characters-with-samba-and-apache-tf3870912.html#a10966956
Sent from the Debian User mailing list archive at Nabble.com.
