fixing international characters with samba and apache



Living in Sweden, I have finally configured my Debian server to 
handle the Swedish alphabet (with å, ä, ö) correctly. Below is 
how I did it, for anyone who's interested.

***

PROBLEM 1: Filenames on a Samba share that show å/ä/ö correctly 
in Windows do not display correctly in Debian.

Solution: Switch to UTF-8 encoding in Debian (more and more 
tools use this as the default for filenames, e.g. Gnome 
Nautilus).

Set UTF-8 by assigning LANG to a UTF-8-based locale for all 
processes, e.g.:
    /etc/environment:
        export LANG=sv_SE.utf8

If you want to keep English as the language in Debian (rather 
than switching to your local language), also add this:
        export LC_MESSAGES=POSIX

(If needed, create /etc/environment and source it from 
/etc/profile.)

If a suitable UTF-8 locale is not available on your system then 
add it. Check available locales:

# locale -a

Add the locale to the "gen" file:
    /etc/locale.gen:
        ...
        sv_SE.UTF-8 UTF-8

Then run:

# locale-gen
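
To confirm that the locale really was generated, the output of 
"locale -a" can be checked from a script. The sketch below is only 
an illustration: the helper name has_locale is made up here, and it 
assumes that "locale -a" may spell the codeset either "utf8" or 
"UTF-8", so both spellings are normalised before comparing.

```shell
# has_locale NAME: succeed if NAME is listed by `locale -a`.
# Case and the dash in "UTF-8" are ignored, so "sv_SE.UTF-8"
# matches an entry listed as "sv_SE.utf8".
# (has_locale is a hypothetical helper, not a standard tool.)
has_locale() {
    want=$(printf '%s' "$1" | tr 'A-Z' 'a-z' | tr -d '-')
    locale -a | tr 'A-Z' 'a-z' | tr -d '-' | grep -qxF "$want"
}

if has_locale sv_SE.UTF-8; then
    echo "sv_SE.UTF-8 is available"
else
    echo "sv_SE.UTF-8 is missing; add it to /etc/locale.gen"
fi
```

After logging in again, "locale charmap" should print UTF-8 if LANG 
was picked up.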

***

PROBLEM 2: Apache's automatic file listings (autoindex) display 
å/ä/ö incorrectly.

Solution: Let Apache use UTF-8 as default encoding.

Set default encoding:
    /etc/apache2/apache2.conf:
        ...
        AddDefaultCharset UTF-8
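
One way to see whether the setting took effect (after reloading 
Apache) is to look at the Content-Type header of a response. The 
sketch below assumes curl is installed; the helper name charset_of 
and the URL in the usage comment are made up for illustration.

```shell
# charset_of: read HTTP response headers on stdin and print the
# charset declared in the Content-Type header, if there is one.
# (charset_of is a hypothetical helper name.)
charset_of() {
    tr -d '\r' |
    sed -n 's/^[Cc]ontent-[Tt]ype:.*[Cc]harset=\([^; ]*\).*$/\1/p'
}

# Usage against a running server (the URL is an assumption):
#   curl -sI http://localhost/some/listing/ | charset_of
```

If this prints UTF-8 for an autoindex page, Apache is announcing 
the encoding that the filenames are stored in.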

***

PROBLEM 3: Some clients send request URLs incompatible with 
UTF-8. An interesting (and confusing) example is the combination
of Internet Explorer (IE6) and Adobe Reader when opening a PDF
file. First, Apache receives a GET request with a correctly
formed UTF-8 URL, but after that there is a second GET request
with raw (not URL-encoded!) 8-bit characters in ISO8859-1 
encoding. The latter request of course fails.

Solution: Use Apache mod_rewrite to convert illegal characters 
to valid URL-encoded UTF-8 (which is the convention to use for 
URLs).

Enable mod_rewrite:

# cd /etc/apache2/mods-enabled/
# ln -s ../mods-available/rewrite.load

For lowercase å/ä/ö add these rewrites:
    /etc/apache2/httpd.conf:
        RewriteEngine On
        RewriteRule (.*)å(.*) $1%C3%A5$2
        RewriteRule (.*)ä(.*) $1%C3%A4$2
        RewriteRule (.*)ö(.*) $1%C3%B6$2
(for some reason I haven't been able to get uppercase Å/Ä/Ö
working...)
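
The %C3%A5-style values in the rules are simply the UTF-8 bytes of 
each letter, written as URL escapes. As a sketch of how they can be 
derived (assuming iconv and od are available; the function name 
latin1_to_pct is made up for this example):

```shell
# latin1_to_pct ESC: print the URL-encoded UTF-8 form of a single
# ISO8859-1 byte. ESC is a printf octal escape such as '\345'.
# (latin1_to_pct is a hypothetical helper name.)
latin1_to_pct() {
    hex=$(printf "$1" |
          iconv -f ISO-8859-1 -t UTF-8 |
          od -An -v -t x1 |
          tr -d ' \n' |
          tr 'a-f' 'A-F')
    # Put a percent sign in front of every pair of hex digits.
    printf '%s\n' "$hex" | sed 's/../%&/g'
}

latin1_to_pct '\345'   # lowercase å
latin1_to_pct '\344'   # lowercase ä
latin1_to_pct '\366'   # lowercase ö
```

The same function gives the escapes for the uppercase letters 
(octal 305, 304 and 326 in ISO8859-1), although that alone does not 
explain why the uppercase rules did not work here.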

Make sure that å/ä/ö in the rules are saved in the ISO8859-1 
encoding as this needs to match exactly what arrives in the 
request. You can check this with octal dump:

# od -t c /etc/apache2/httpd.conf
...
0001120   R   u   l   e       (   .   *   ) 345   (   .   *   )       $
0001140   1   %   C   3   %   A   5   $   2  \n   R   e   w   r   i   t
0001160   e   R   u   l   e       (   .   *   ) 344   (   .   *   )
0001200   $   1   %   C   3   %   A   4   $   2  \n   R   e   w   r   i
0001220   t   e   R   u   l   e       (   .   *   ) 366   (   .   *   )

(notice the "eight-bit" characters 345/344/366)
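
If the editor or terminal is already set to UTF-8, typing å/ä/ö 
directly will store two bytes per letter instead of one. A possible 
way around this is to let printf write the raw bytes. This is only 
a sketch: the file name rewrite-latin1.conf is an assumption, and 
the lines would still have to be copied or included into the real 
Apache configuration.

```shell
# Hypothetical file name, used only for this sketch.
rules=rewrite-latin1.conf

# \345, \344 and \366 are printf octal escapes for the single bytes
# that ISO8859-1 uses for å, ä and ö. %% prints a literal percent
# sign, and the single quotes keep $1 and $2 from being expanded.
{
    printf 'RewriteRule (.*)\345(.*) $1%%C3%%A5$2\n'
    printf 'RewriteRule (.*)\344(.*) $1%%C3%%A4$2\n'
    printf 'RewriteRule (.*)\366(.*) $1%%C3%%B6$2\n'
} > "$rules"

# Each letter should appear as one octal byte (345, 344, 366).
od -c "$rules"
```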

***

Good luck with your own configuration!
Mike Wilson
-- 
View this message in context: http://www.nabble.com/fixing-international-characters-with-samba-and-apache-tf3870912.html#a10966956
Sent from the Debian User mailing list archive at Nabble.com.