On Fri, 2006-03-03 08:16:05 +0100, Juergen.Leibner@t-online.de <Juergen.Leibner@t-online.de> wrote:
> -----Original Message-----
> > Date: Fri, 3 Mar 2006 07:48:58 +0100
> > Subject: Umlaut problems in filenames when going from samba2 to samba3
> > From: Klaus Ade Johnstad
> > To: user@skolelinux.de
> > My problem is that the filenames do not have the German umlauts (öäü)
> > or the Norwegian special characters (øæå). I have about 8000 such
> > files; the teachers say that they see square signs, underscores and
> > other "strange" stuff instead of umlauts, and especially instead of
> > the Norwegian characters.

Yeah, the filenames on disk are stored in one encoding (e.g. UTF-8) while
Samba interprets them as another (e.g. ISO-8859-something), so every
umlaut reaches the clients as the wrong character(s).

> I'm running samba on a debian system here at work.
> Windows and Linux are using the same files.
> Samba is configured:
> # unix charset = UTF-8
> # display charset = UTF-8
>
> debian is configured:
> LANG=de_DE.UTF-8@euro
> LC_CTYPE="de_DE.UTF-8@euro"
> LC_NUMERIC="de_DE.UTF-8@euro"
> LC_TIME="de_DE.UTF-8@euro"
> LC_COLLATE="de_DE.UTF-8@euro"
> LC_MONETARY="de_DE.UTF-8@euro"
> LC_MESSAGES="de_DE.UTF-8@euro"
> LC_PAPER="de_DE.UTF-8@euro"
> LC_NAME="de_DE.UTF-8@euro"
> LC_ADDRESS="de_DE.UTF-8@euro"
> LC_TELEPHONE="de_DE.UTF-8@euro"
> LC_MEASUREMENT="de_DE.UTF-8@euro"
> LC_IDENTIFICATION="de_DE.UTF-8@euro"
> LC_ALL=

That's a well-working configuration. It'll just work for anybody,
allowing any kind of umlauts, even if some pupil tries to give his
Russian homework a Cyrillic filename.

> -rwxrwx---+ 1 root domänen-benutzer 0 2006-03-03 07:51 ÜÄÖßüäö.txt

Pah!

> > 1. What should I use in smb.conf for the values
> > unix charset =
> > DOS charset =

UTF-8 for the unix charset. The DOS charset isn't all that important
anymore, since it's only used for clients that don't speak Unicode.
Maybe cp850 or something like that is a good choice, but newer Windows
versions talk Unicode to the server and shouldn't need it anymore.

> > 2. What should actually the LOCALES be?
> >
> > 3. I've found a program that supposedly will help me,
> > http://j3e.de/linux/convmv/
> > I've tried different combinations of
> > convmv -f cp850 -t iso8859-1
> > convmv -f cp850 -t utf8
> > But even if the umlauts are again visible from linux, they look
> > strange on windows. Anyone having used this program before?

I haven't used it, but wrote little shell scripts and ran a 'find'
command back in those days. Most important is that you've got a real
plan: what to convert, from which exactly named source encoding to which
exactly named target encoding.

So first decide on the locale settings (I'd choose some UTF-8 encoding
these days). Then create a filename containing some umlauts (e.g.
cut'n'paste them from the UTF-8 test files containing lots of
umlauts :-) and look at it byte by byte, e.g. with

    ls | xxd

Verify that the hex dump contains the correct sequence for the chosen
encoding; for 'ä', for example, that's c3 a4 in UTF-8, e4 in ISO-8859-1
and 84 in cp850.

Then continue with configuring Samba: UTF-8 for the unix charset,
something for the DOS charset (as I wrote, there's probably no client
using it anymore, unless you insist on using DOS/LanMan or things like
that).

Then go to a Windows box and create a filename containing umlauts. It
should look correct afterwards on the creating Windows box as well as on
a different one. Then go back to the Linux box and verify that the
filename reads okay. (If not, Samba hasn't picked up the new
configuration yet...)

You'd better do these things *fast*. You don't want the guys to create
new files in the meantime, because you'd end up with a not-so-nice mix
of differently encoded filenames!

Finally, fix the pre-existing filenames.

> > Oh, another problem is that they are using this system very heavily 24
> > hours a day (lots of vpn connections), so I can't just restart Samba
> > whenever I like to...
>
> It should IMHO not be necessary to restart samba.
You'd need to restart the sessions, but that's not much of a problem
either: it is normal behavior that a Samba server (instance) quits after
some time of inactivity; the client will reinstate the connection on its
own when it gets busy again. That's actually a nice thing: just kill all
the fork()ed per-client smbd processes (letting the parent survive!) so
all clients will claim a fresh connection, with a fresh server reading
the new config file :-)

> I think your scenario is similar to the configuration I described above.
> But IMHO, to convert old files coming in from an old samba version to the
> current version of samba, only changing settings in the smb.conf won't work.
> I think you have to do both. First configure your system and samba for
> proper work with all newly created files and folders, and then put the old
> data on the shares and convert the files to your needs.

ACK. You need to convert the filenames. In a company I worked for, we
even had the nice effect that filenames containing (now improperly
encoded) umlauts weren't visible any longer from the Windows clients.

For a start, something like this should do the recoding:

----------- recoding-script.sh -------------
#!/usr/bin/env sh
SRC_ENCODING=ISO-8859-1
DEST_ENCODING=UTF-8

# Recode only the last path component; find -depth renames the parent
# directories later, after their contents.
DIR=$(dirname -- "${1}")
OLD_NAME=$(basename -- "${1}")
NEW_NAME=$(printf '%s' "${OLD_NAME}" | iconv --from-code="${SRC_ENCODING}" --to-code="${DEST_ENCODING}") || exit 1

# Leave plain-ASCII names alone and never overwrite an existing file.
[ "${NEW_NAME}" = "${OLD_NAME}" ] && exit 0
[ -e "${DIR}/${NEW_NAME}" ] && { echo "exists: ${DIR}/${NEW_NAME}" >&2; exit 1; }
mv -- "${1}" "${DIR}/${NEW_NAME}"
---------------------------------------------

...and then call it on all the names, deepest first (-depth), so that a
directory is only renamed after everything inside it has been:

    find /path/to/share -depth -exec /path/to/recoding-script.sh {} \;

Notice that SRC_ENCODING is the encoding that was written by the Samba
server (prior to the charset reconfiguration), so you'd check that using
the xxd trick with some encoding tables. DEST_ENCODING is the Linux
encoding you'd like to use afterwards, which you'll also need to
configure in Samba.

Best regards,
JBG

-- 
Jan-Benedict Glaw  jbglaw@lug-owl.de
+49-172-7608481
"A free opinion in a free mind, for a free state full of free citizens"
Against censorship on the Internet! | Against war in Iraq!
ret = do_actions((curr | FREE_SPEECH) & ~(NEW_COPYRIGHT_LAW | DRM | TCPA));
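The "xxd trick with some encoding tables" needs a table to compare against. A minimal sketch that prints, for each candidate encoding, the bytes of a reference string of German and Norwegian letters, so they can be held against the hex dump of a real filename. It assumes the script file itself is saved as UTF-8, that `iconv` knows the names UTF-8, ISO-8859-1 and CP850 (glibc's does), and it uses POSIX `od` because `xxd` ships with vim and may be missing; `show_encodings` is a name made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print the byte sequence of a reference string in
# each candidate charset, for comparison with `ls | xxd` (or `ls | od`).
# Assumes this file is saved as UTF-8, so REF starts out as UTF-8 bytes.
REF='äöüßøæå'

show_encodings() {
    for enc in UTF-8 ISO-8859-1 CP850; do
        printf '%-10s' "$enc"
        # od -An -tx1: plain hex bytes without offsets; tr joins od's lines
        printf '%s' "$REF" | iconv -f UTF-8 -t "$enc" | od -An -tx1 | tr -s ' \n' '  '
        echo
    done
}

show_encodings
```

UTF-8 umlauts always show up as two-byte pairs starting with c3, while single bytes in the 0x80 to 0x9f range point to CP850, since ISO-8859-1 has only control codes there.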
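Running the ISO-8859-1 to UTF-8 recoding twice, or over a share that already holds some UTF-8 names, would encode those names a second time and turn an ä into Ã¤. A minimal guard, sketched under the assumption that the target is UTF-8 and that `iconv -f UTF-8 -t UTF-8` exits non-zero on invalid input (GNU iconv does): only names that are not yet valid UTF-8 are listed. `needs_recoding` and `list_unconverted` are hypothetical names made up for this sketch.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if "$1" is NOT valid UTF-8, i.e. it still
# needs recoding. Plain ASCII and already-converted names fail the test.
needs_recoding() {
    ! printf '%s' "$1" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# Hypothetical helper: print every path below directory "$1" whose own
# name (last component) still needs recoding, deepest entries first.
list_unconverted() {
    find "$1" -mindepth 1 -depth | while IFS= read -r path; do
        if needs_recoding "$(basename -- "$path")"; then
            printf '%s\n' "$path"
        fi
    done
}
```

Calling `list_unconverted /path/to/share` shows what a rename run would touch. Check that list before renaming anything: a legacy name can occasionally be valid UTF-8 by accident (the ISO-8859-1 string "Ã¤" is byte for byte a UTF-8 "ä") and would then be skipped.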