Re: content negotiation for language in web pages
On Thu, 29 Jan 1998, James A.Treacy wrote:
> After looking into this some more, it looks like we have a few options
> in organizing translated web pages.
> 1. Have a separate directory for each language. The main pages would
> have cross-links to the other languages.
This is the widespread solution, and it's ugly!!!
> 2. Put all the pages together, but append the language to each file,
> e.g. about.html.en, about.html.de, etc. Each page references
> just the base page name (about.html) and we rely on content negotiation
> to let the server decide which language should be served.
The only drawback this has is indexing. It's a pain to index things like
this. I used to use glimpse to index a server set up this way, and I had
two options: 1) modify glimpse and make it content-language aware (not
done) or 2) fool glimpse using a method that serves a second purpose.
Once upon a time I did 2). I don't have the scripts anymore, which is sad,
because they would have worked great in this case. I remember how it
worked, though. Let me explain. I had:
some_file.html -> ../../[...]/some_file.html.lang-1
In this way, glimpse could create indexes for every language. Obviously
this poses a problem: how to build the .lang dirs... The script that
builds them has to understand how content negotiation works, i.e., if
there's no .lang-1 file, fall back to .lang-2, and so on.
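Since I no longer have the original scripts, here is a minimal sketch of what such a builder could look like, written in Python purely for illustration. The function names, the htdocs/htdocs.de directory names, and the ["de", "en"] fallback order are all my assumptions, and it assumes the per-language tree lives outside the DocumentRoot so the walk doesn't descend into its own output.

```python
import os


def pick_variant(base, available, preferences):
    """Return the first existing 'base.lang' in preference order, or None."""
    for lang in preferences:
        candidate = f"{base}.{lang}"
        if candidate in available:
            return candidate
    return None


def build_lang_tree(docroot, lang_root, preferences):
    """Fill lang_root with one symlink per document found under docroot.

    Each link is named after the base page (about.html) and points at the
    best available variant (about.html.de, else about.html.en, ...).
    """
    known = set(preferences)
    for dirpath, _dirnames, filenames in os.walk(docroot):
        rel_dir = os.path.relpath(dirpath, docroot)
        out_dir = os.path.normpath(os.path.join(lang_root, rel_dir))
        os.makedirs(out_dir, exist_ok=True)
        available = set(filenames)
        # A base name is whatever precedes a known language suffix.
        bases = set()
        for name in filenames:
            base, dot, suffix = name.rpartition(".")
            if dot and base and suffix in known:
                bases.add(base)
        for base in sorted(bases):
            variant = pick_variant(base, available, preferences)
            link = os.path.join(out_dir, base)
            if os.path.lexists(link):
                os.remove(link)  # refresh a link left by an earlier run
            # Relative targets keep the tree valid if both dirs move together.
            target = os.path.relpath(os.path.join(dirpath, variant), out_dir)
            os.symlink(target, link)
```

Calling build_lang_tree("htdocs", "htdocs.de", ["de", "en"]) would give the indexer a tree where every page resolves to German when a translation exists and to English otherwise.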
> This works great as long as the server supports content negotiation.
Ooh, yes, I remember, there's this other thing that's not Apache... ;-)
> Mirrors that don't support content negotiation would be stuck serving
> in one language (the pages would be set up to default to English).
Not necessarily true. I mean, the English part. For example, if the
German mirror doesn't support negotiation, under the previous scheme, it
can mirror only the German directories (flattening symlinks, of course).
> It has the benefit of supporting partial translations. If a
Yes, that's why I used this in the first place. It works great.
> Also, if a browser doesn't know about content negotiation or the user
> hasn't configured it to use their preferred language (and the default
> is usually English), the user will get English docs.
This again may not always be true. If the browser doesn't support content
negotiation, the server has an internal list (at least Apache does). It
knows what language to serve by default.
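To make that fallback concrete, here is a rough sketch of the decision in Python. It is a simplification and not Apache's actual negotiation algorithm; the function names and the server_priority parameter are my own, standing in for the server's internal default list.

```python
def parse_accept_language(header):
    """Return the language tags from an Accept-Language header, best first."""
    weighted = []
    for position, item in enumerate(header.split(",")):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        quality = 1.0  # no q-value means full preference
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed q-value as "not wanted"
        if quality > 0:
            weighted.append((-quality, position, tag.strip().lower()))
    # Sort by descending quality, then by order of appearance.
    return [tag for _, _, tag in sorted(weighted)]


def choose_language(available, accept_header, server_priority):
    """Pick a language: browser preference first, server default list second."""
    for tag in parse_accept_language(accept_header or ""):
        if tag in available:
            return tag
    for tag in server_priority:
        if tag in available:
            return tag
    return None
```

With this sketch, a browser that sends no header at all still gets Spanish from a server whose priority list starts with "es", provided a Spanish variant exists.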
> 3. Similar to 2, but each language references the pages in its language,
> e.g. index.html.de would reference vendors.html.de . At the main
Ugly, ugly, ugly. It's a nightmare to maintain. Plus, the server has to be
reconfigured to understand that html.en is text/html, and that is not
always possible because of the "extra" dot.
> page the user would get a language (either by content negotiation
> or by explicitly choosing the language by using one of the cross-links)
> and all links followed after that would be in that language.
> Someone jumping into a different page would have no idea other languages
With the setup I presented, this can be solved in this way:
http://www.debian.org/lang-1 reads DocumentRoot.lang-1 and it DOESN'T do
content negotiation. The other languages are treated in the same way.
http://www.debian.org/ reads DocumentRoot and it DOES content negotiation.
Drawback: you have to remember to use relative links only, that is, <A
HREF="/dir/document.html"> is not allowed. You have to write <A
HREF="../../dir/document.html">. This almost always limits the usefulness
of server-generated footers and headers that contain links.
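If the links are generated by a script, the relative form can be computed rather than remembered. A small sketch in Python, where relative_href is a hypothetical helper and the /a/b/page.html path is only an example:

```python
import posixpath


def relative_href(from_page, to_page):
    """Return a relative link from one site-absolute URL path to another."""
    start = posixpath.dirname(from_page) or "/"
    return posixpath.relpath(to_page, start)


# A page two directories deep linking to /dir/document.html:
href = relative_href("/a/b/page.html", "/dir/document.html")
print(href)  # ../../dir/document.html
```

Because the link never mentions the root, the same page works whether it is reached through / or through /lang-1.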
> Bad points: once you choose a language, you are stuck with it unless you
> go back to the main page (or are clever and type in a language extension).
> Partial translations aren't dealt with well. A German with good French
> (does such a person exist? ;) and poor English isn't served well by this
> model with a partial German translation.
A user who recognizes the usefulness of content-negotiation is one who
understands several languages. People speaking just one language most
of the time don't even know about the existence of content-negotiation.
They go looking for stupid "click here for esperanto"'s
> Personally, I see number 3 as the way to go. Of course, other opinions/
> additional ideas are welcome.
I really think content-negotiation is the way to go, considering that it's
something that can be configured on a server by server basis. For example,
www.es.debian.org (the mirror in Spain, not the server in Spanish that
someone else proposed) can be configured to provide documents in Spanish
by default. www.it.debian.org provides documents in Italian,
www.us.debian.org in English, and so on.
The problem I saw, and still see, is that search engines are too stupid
to know about content-negotiation (well, I complained, and someone at
Altavista emailed me saying they were considering that, maybe they have
implemented it by now). For example, http://www.debian.org/ may appear in
search engines only in English, but when the user gets there it suddenly
starts speaking German (because the browser asks for "de fr en", for
example). For me, that's really nice, but others may not think so. That's
the other reason I came up with the DocumentRoot.lang thing.