
Re: [gopher] CAPS capability: ServerDefaultCharset



Hello,

For those interested in Unicode and UTF-8 handling: not long ago I wrote a converter that decodes UTF-8 content and encodes it into several 8-bit codepages (it can also work the other way around). The source code is fairly readable, and it's available here:

http://sourceforge.net/p/utf8tocp/code/HEAD/tree/utf8tocp.c

It comes with several lookup tables already. I developed it primarily for the FreeDOS localization project.
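
For readers unfamiliar with table-driven conversion, here is a minimal sketch of the general idea: decode each UTF-8 sequence to a code point, then look it up in a codepage table. This is not utf8tocp's actual code. The names utf8_decode, utf8_to_cp and cp437_excerpt are hypothetical, the table is a four-entry excerpt of CP437 chosen for illustration, and the '?' fallback for unmapped or malformed input is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* One mapping from a Unicode code point to a byte in the target codepage. */
struct cp_entry { unsigned long ucs; unsigned char byte; };

/* Hypothetical four-entry excerpt of CP437; a real table covers 128 slots. */
static const struct cp_entry cp437_excerpt[] = {
    { 0x00E4, 0x84 },  /* a with diaeresis */
    { 0x00E9, 0x82 },  /* e with acute     */
    { 0x00F6, 0x94 },  /* o with diaeresis */
    { 0x00FC, 0x81 },  /* u with diaeresis */
};

/* Decode one UTF-8 sequence of 1 to 3 bytes (enough for the first plane).
 * Stores the code point in *cp and returns the number of bytes consumed,
 * or 0 if the sequence is malformed, truncated or overlong. */
static size_t utf8_decode(const unsigned char *s, size_t len, unsigned long *cp)
{
    if (len == 0)
        return 0;
    if (s[0] < 0x80) {
        *cp = s[0];
        return 1;
    }
    if (s[0] >= 0xC2 && s[0] <= 0xDF && len >= 2 && (s[1] & 0xC0) == 0x80) {
        *cp = ((unsigned long)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0 && len >= 3 &&
        (s[1] & 0xC0) == 0x80 && (s[2] & 0xC0) == 0x80) {
        *cp = ((unsigned long)(s[0] & 0x0F) << 12) |
              ((unsigned long)(s[1] & 0x3F) << 6) | (s[2] & 0x3F);
        return (*cp >= 0x800) ? 3 : 0;  /* reject overlong encodings */
    }
    return 0;
}

/* Convert UTF-8 to the 8-bit codepage described by cp437_excerpt.
 * Unmapped code points and malformed bytes become '?'.
 * Returns the number of bytes written, not counting the trailing NUL. */
size_t utf8_to_cp(const unsigned char *src, size_t len,
                  unsigned char *dst, size_t dstsize)
{
    size_t i = 0, o = 0;

    assert(src != NULL && dst != NULL);
    if (dstsize == 0)
        return 0;

    while (i < len && o + 1 < dstsize) {
        unsigned long cp = 0;
        unsigned char out = '?';
        size_t n = utf8_decode(src + i, len - i, &cp);

        if (n == 0) {
            n = 1;                      /* malformed: skip one byte */
        } else if (cp < 0x80) {
            out = (unsigned char)cp;    /* ASCII maps to itself */
        } else {
            size_t k;
            for (k = 0; k < sizeof cp437_excerpt / sizeof cp437_excerpt[0]; k++) {
                if (cp437_excerpt[k].ucs == cp) {
                    out = cp437_excerpt[k].byte;
                    break;
                }
            }
        }
        dst[o++] = out;
        i += n;
    }
    dst[o] = '\0';
    return o;
}
```

A linear search is fine for a table this small; a full converter would more likely index a sorted table or use one array per codepage.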

The utf8tocp project's main page is this:

http://sourceforge.net/projects/utf8tocp/

Mateusz




On 01/03/2015 06:18 PM, Nuno Silva wrote:
On 2015-01-03 17:55, Kim Holviala wrote:
On 03 Jan 2015, at 17:49, Nuno Silva <nunojsilva@ist.utl.pt> wrote:

You mean Gophernicus can even handle both ISO-8859-1 and UTF-8 if
they're mixed inside the *same* document? That's neat! (And it also
degrades in a nice way!)

Yep, it works even if they are used within a single line of text. I first tried to use GNU iconv(), but that function was just incredibly stupid, so I wrote my own. While writing it I realized I could just autodetect all input on a char-by-char basis, skip most of the “official” conversion tables and just focus on US-ASCII/Latin-1/first plane of UTF-8. My strniconv() is purely an 80/20 implementation, and that’s good enough for me.
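
To illustrate the char-by-char autodetection described above, here is a minimal sketch. It is not Gophernicus's strniconv(); the names mixed_to_utf8 and utf8_trail_count are hypothetical, and it assumes the desired output is UTF-8. Each byte is either ASCII, the start of a well-formed 2- or 3-byte UTF-8 sequence (copied through), or anything else, which is assumed to be a lone ISO-8859-1 byte and re-encoded.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Number of continuation bytes that must follow lead byte c for a 2- or
 * 3-byte UTF-8 sequence, or -1 if c cannot start such a sequence. */
static int utf8_trail_count(unsigned char c)
{
    if (c >= 0xC2 && c <= 0xDF) return 1;
    if (c >= 0xE0 && c <= 0xEF) return 2;
    return -1;
}

/* Hypothetical helper: normalise mixed US-ASCII / ISO-8859-1 / UTF-8 input
 * to UTF-8. dst should hold up to 2 * len + 1 bytes; output is truncated
 * at a character boundary if it does not fit.
 * Returns the number of bytes written, not counting the trailing NUL. */
size_t mixed_to_utf8(const unsigned char *src, size_t len,
                     unsigned char *dst, size_t dstsize)
{
    size_t i = 0, o = 0;

    assert(src != NULL && dst != NULL && dstsize > 0);

    while (i < len) {
        unsigned char c = src[i];
        int trail = utf8_trail_count(c);
        int valid = (trail > 0) && (i + (size_t)trail < len);
        size_t k, need;

        /* Every expected continuation byte must look like 10xxxxxx. */
        for (k = 1; valid && k <= (size_t)trail; k++)
            valid = (src[i + k] & 0xC0) == 0x80;
        if (valid && c == 0xE0 && src[i + 1] < 0xA0)
            valid = 0;                  /* overlong 3-byte form */
        if (valid && c == 0xED && src[i + 1] >= 0xA0)
            valid = 0;                  /* UTF-16 surrogate range */

        need = (c < 0x80) ? 1 : (valid ? (size_t)trail + 1 : 2);
        if (o + need >= dstsize)
            break;                      /* keep room for the NUL */

        if (c < 0x80 || valid) {
            /* ASCII or well-formed UTF-8: copy through unchanged. */
            memcpy(dst + o, src + i, need);
            i += need;
        } else {
            /* Assume a Latin-1 byte: encode U+0080..U+00FF as two bytes. */
            dst[o]     = (unsigned char)(0xC0 | (c >> 6));
            dst[o + 1] = (unsigned char)(0x80 | (c & 0x3F));
            i += 1;
        }
        o += need;
    }
    dst[o] = '\0';
    return o;
}
```

The heuristic can misfire: two adjacent Latin-1 characters whose bytes happen to form a valid UTF-8 sequence will be read as UTF-8. That trade-off is in the spirit of the 80/20 approach.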


Out of curiosity, have you made a standalone (iconv-like) tool using the
code you wrote? Even if it is just 80/20, that is something I could use
in some situations.


_______________________________________________
Gopher-Project mailing list
Gopher-Project@lists.alioth.debian.org
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/gopher-project
