Fixing international characters with Samba and Apache
Living in Sweden, I have finally configured my Debian server to
handle the Swedish alphabet (with å, ä, ö) correctly. Below is
how I did it, for anyone who's interested.
PROBLEM 1: Filenames on a Samba share that show å/ä/ö correctly
in Windows do not display correctly in Debian.
Solution: Switch to UTF-8 encoding in Debian (more and more
tools use it as the default for filenames, for example GNOME).
Set UTF-8 by assigning a UTF-8-based locale to LANG for all
processes.
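A minimal sketch of such an assignment, assuming the Swedish locale is called sv_SE.UTF-8 on your system (check with locale -a) and that the setting goes in /etc/environment. This is an assumed example, not necessarily the exact line used here:

```shell
# /etc/environment (assumed location): plain KEY=value lines, read at login
LANG=sv_SE.UTF-8
```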
If you want to keep English as the language in Debian (rather
than switching to your local language), also add a setting for that.
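One possible way to do this, assuming the en_US.UTF-8 locale has also been generated: keep LANG Swedish for character handling and override only the message language. The choice of LC_MESSAGES here is an assumption; other combinations of locale variables would work too.

```shell
# /etc/environment (assumed): Swedish UTF-8 locale overall,
# but program messages in English (LC_MESSAGES overrides LANG
# for messages unless LC_ALL is set)
LANG=sv_SE.UTF-8
LC_MESSAGES=en_US.UTF-8
```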
(If needed, create /etc/environment and source it from
If a suitable UTF-8 locale is not available on your system, add
it. Check the available locales:
# locale -a
Add the locale to the "gen" file, /etc/locale.gen on Debian, and
regenerate the locales.
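For example, assuming you want the Swedish UTF-8 locale plus en_US.UTF-8 (needed only if you keep English messages), the relevant /etc/locale.gen entries would look like this; each line is "<locale> <charset>":

```shell
# /etc/locale.gen (excerpt): one "<locale> <charset>" entry per line.
# After editing, run locale-gen as root to build the listed locales.
sv_SE.UTF-8 UTF-8
en_US.UTF-8 UTF-8
```

Alternatively, dpkg-reconfigure locales lets you pick the locales from a menu and regenerates them in one step.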
PROBLEM 2: Apache's automatic file listings (autoindex) display
filenames with å/ä/ö incorrectly.
Solution: Let Apache use UTF-8 as the default encoding.
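A minimal sketch of the directives involved, assuming Apache 2.x with mod_autoindex. On Debian of this era a likely place is /etc/apache2/conf.d/charset (an assumption; any file Apache reads will do):

```apache
# Send "charset=UTF-8" with text/plain and text/html responses
AddDefaultCharset UTF-8
# mod_autoindex: declare UTF-8 in generated directory listings;
# the "+" adds to existing IndexOptions instead of replacing them
# (the Charset option requires a reasonably recent Apache)
IndexOptions +Charset=UTF-8
```

Then reload Apache, for example with /etc/init.d/apache2 reload.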
PROBLEM 3: Some clients send request URLs that are incompatible
with UTF-8. An interesting (and confusing) example is the
combination of Internet Explorer (IE6) and Adobe Reader when
opening a PDF file. First, Apache receives a GET request with a
correctly formed UTF-8 URL, but after that comes a GET request
with raw (not URL-encoded!) 8-bit characters in ISO8859-1
encoding. The latter request, of course, fails.
Solution: Use Apache mod_rewrite to convert the illegal characters
to valid URL-encoded UTF-8 (which is the convention to use for URLs).
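Enable mod_rewrite:
# cd /etc/apache2/mods-enabled/
# ln -s ../mods-available/rewrite.load
(On Debian, "a2enmod rewrite" creates the same link. Reload or
restart Apache afterwards.)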
For lowercase å/ä/ö add these rewrites:
RewriteEngine On
RewriteRule (.*)å(.*) $1%C3%A5$2
RewriteRule (.*)ä(.*) $1%C3%A4$2
RewriteRule (.*)ö(.*) $1%C3%B6$2
(For some reason I haven't been able to get uppercase Å/Ä/Ö to work.)
Make sure that å/ä/ö in the rules are saved in the ISO8859-1
encoding, as they need to match exactly what arrives in the
request. You can check this with an octal dump:
# od -t c /etc/apache2/httpd.conf
0001120 R u l e ( . * ) 345 ( . * ) $
0001140 1 % C 3 % A 5 $ 2 \n R e w r i t
0001160 e R u l e ( . * ) 344 ( . * )
0001200 $ 1 % C 3 % A 4 $ 2 \n R e w r i
0001220 t e R u l e ( . * ) 366 ( . * )
(notice the "eight-bit" characters 345/344/366)
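To see where the replacement strings come from: the octal bytes 345/344/366 are the ISO8859-1 codes for å/ä/ö, which equal the Unicode code points U+00E5, U+00E4 and U+00F6. UTF-8 encodes any code point c between 0x80 and 0x7FF as two bytes (110yyyyy 10xxxxxx), which gives exactly the %C3%.. sequences used in the rules:

```latex
\begin{aligned}
b_1 &= \mathtt{C0}_{16} + \lfloor c / 64 \rfloor, \qquad
b_2 = \mathtt{80}_{16} + (c \bmod 64) \\[4pt]
\text{\aa}: \; c &= 345_8 = 229 \;\Rightarrow\; b_1 = \mathtt{C0}_{16} + 3 = \mathtt{C3}_{16},\; b_2 = \mathtt{80}_{16} + 37 = \mathtt{A5}_{16} \;\to\; \texttt{\%C3\%A5} \\
\text{\"a}: \; c &= 344_8 = 228 \;\Rightarrow\; b_1 = \mathtt{C0}_{16} + 3 = \mathtt{C3}_{16},\; b_2 = \mathtt{80}_{16} + 36 = \mathtt{A4}_{16} \;\to\; \texttt{\%C3\%A4} \\
\text{\"o}: \; c &= 366_8 = 246 \;\Rightarrow\; b_1 = \mathtt{C0}_{16} + 3 = \mathtt{C3}_{16},\; b_2 = \mathtt{80}_{16} + 54 = \mathtt{B6}_{16} \;\to\; \texttt{\%C3\%B6}
\end{aligned}
```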
Good luck with your own configuration!
Sent from the Debian User mailing list archive at Nabble.com.