
Re: [gopher] GopherMole - a gopher media crawler



On 01/03/2015 11:46 AM, James Mills wrote:
    Or does a Client query a Gopherd for CAPS
    and if it sees "Encoding: utf-8" assumes *all*
    content it receives from *that* Gopherd is
    encoded in UTF-8?

That's what I was suggesting, yes.

One could argue that a single server might contain a plethora of documents, each of which would be encoded in a specific charset, and that's certainly a possibility. But in practice, I have always seen servers saying (in human language, mainly on their root page) "this server is serving content in utf-8", and rarely or never "this specific document is encoded in xyz".

But still, the CAPS capability I was suggesting was about a "default" encoding, that is, "if not specified otherwise, assume everything on this server is encoded in this encoding". That way, if one day a mechanism appears that allows specifying the charset on a per-document basis, the two won't collide (although I doubt such a mechanism will appear, but of course one can never be sure of the future).

Currently, gopher clients are supposed to assume ISO Latin 1, as per RFC 1436. The ServerDefaultCharset CAPS setting I suggested in my message of 31 December 2014 was simply a way to override that RFC charset.
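
To make the idea concrete, here is a minimal sketch of how a client might pick the charset. It assumes the CAPS data is a plain list of key=value lines and that the key is spelled ServerDefaultCharset as in my proposal; the function names are made up for illustration and none of this is an agreed format.

```python
# Fallback used when the server advertises nothing (ISO Latin 1, per RFC 1436).
RFC1436_DEFAULT = "iso-8859-1"


def default_charset(caps_text: str) -> str:
    """Return the server-wide default charset advertised in CAPS, if any.

    Assumption: CAPS is a list of key=value lines and the proposed key
    is named ServerDefaultCharset.
    """
    for line in caps_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and the bare "CAPS" header
        key, _, value = line.partition("=")
        if key.strip().lower() == "serverdefaultcharset" and value.strip():
            return value.strip().lower()
    return RFC1436_DEFAULT


def decode_document(raw: bytes, caps_text: str) -> str:
    """Decode a text document using the server default, else ISO Latin 1."""
    charset = default_charset(caps_text)
    try:
        return raw.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown charset name, or bytes that are invalid in that charset:
        # fall back to the RFC default, which accepts any byte sequence.
        return raw.decode(RFC1436_DEFAULT)
```

Since this is only a default, a per-document mechanism (should one ever exist) could simply take precedence before default_charset() is consulted.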

Mateusz





On Sat, Jan 3, 2015 at 8:38 PM, Mateusz Viste <mateusz@viste.fr> wrote:

    On 01/03/2015 11:27 AM, James Mills wrote:

        Mis-rendered, correct (which is what I meant),
        but the client "won't break".


    That's correct.

        That's what I meant by "degrade".


    Sure, but that's hardly 'graceful'. And doesn't have anything to do
    with ISO-8859-1. Which doesn't mean I am opposed to UTF-8 usage in
    the gopherspace, on the contrary, I'm 100% for it. But it's
    important to keep in mind the exact impact it will have on legacy
    clients.

        *I think* a Gopher server that spits out UTF-8 encoded data to
        a Client that doesn't support UTF-8 encoding will still display
        the content (just not any code point higher than 255)?


    Only low-ascii will be rendered correctly, that is anything above
    code point 127 will be scrambled.

    Here's an example:

    gopher://gopher.viste.fr/0/docs/other/Little%2520Big%2520Adventure%2520-%2520Soluce%2520du%2520jeu%2520%2528french%2529.txt

    Same thing here (but with a Polish document):

    gopher://gopher.viste.fr/0/docs/opowiadania%2520%2528polish%2529/sendbajt.txt

    When I open these documents with Overbite, all French or Polish
    diacritics are broken (until I set my browser manually to UTF-8).
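
The effect is easy to reproduce. The short sketch below imitates a legacy client by decoding UTF-8 bytes as ISO-8859-1; the function name is made up for illustration and is not taken from any real client.

```python
def as_seen_by_legacy_client(text: str) -> str:
    """Show how UTF-8 text looks to a client that assumes ISO-8859-1."""
    # The server sends UTF-8 bytes; the client interprets each byte as
    # one Latin-1 character, so every multi-byte sequence is split up.
    return text.encode("utf-8").decode("iso-8859-1")


# Low ASCII (code points 0-127) is byte-identical in both encodings,
# so it survives untouched.
print(as_seen_by_legacy_client("Soluce du jeu"))

# A single accented letter is sent as two bytes and is therefore
# displayed as two unrelated characters.
print(as_seen_by_legacy_client("\u00e9"))
```

Nothing crashes, which is why the client "won't break", but every character above code point 127 is shown as two or more wrong ones.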

    Of course there are thousands of such examples across the gopherspace.

    Mateusz


    _______________________________________________
    Gopher-Project mailing list
    Gopher-Project@lists.alioth.debian.org
    http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/gopher-project




_______________________________________________
Gopher-Project mailing list
Gopher-Project@lists.alioth.debian.org
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/gopher-project

