
Re: [gopher] CAPS capability: ServerDefaultCharset



I personally try not to store any of my English documents in UTF-8/Unicode, 
because of how certain symbols are encoded, such as ' and `, and a few others.  
These don't degrade to ASCII very well, as I found out when I wrote the RSS feed 
parser for my Gopherhole.  Unicode has always been hell for me to support 
properly, mainly when converting to plain ASCII.  I don't see the purpose of 
using it for English-only documents; ASCII seems to me to support everything I 
need to write an entire book.  I understand the need for such an encoding for 
the non-English-speaking population, and that's where it should be applied.
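
To illustrate the kind of degradation described above, here is a minimal Python sketch of converting Unicode text to plain ASCII. It is not the code from the RSS feed parser mentioned above; the `PUNCT` table and the `to_ascii` name are hypothetical, and the choice to replace unmappable characters with `?` is an assumption.

```python
import unicodedata

# Hypothetical replacement table for typographic punctuation that has no
# Unicode decomposition to an ASCII character.
PUNCT = {
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\u2013": "-",   # en dash
    "\u2014": "--",  # em dash
}

def to_ascii(text: str) -> str:
    """Degrade text to ASCII, losing information where no mapping exists."""
    out = []
    # NFKD splits accented letters into a base letter plus combining marks.
    for ch in unicodedata.normalize("NFKD", text):
        if ch in PUNCT:
            out.append(PUNCT[ch])
        elif ord(ch) < 128:
            out.append(ch)
        elif unicodedata.combining(ch):
            continue  # drop the accent, keep the base letter
        else:
            out.append("?")  # assumption: mark anything else as unknown
    return "".join(out)
```

Accented Latin letters survive as their base letters, but scripts with no ASCII counterpart collapse entirely to `?`, which is why this approach only suits mostly-English text.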

Anyways, that's my 2c about this encoding stuff being talked about currently.

On January 3, 2015 04:48:02 AM Mateusz Viste wrote:
> On 01/03/2015 12:39 PM, Nuno Silva wrote:
> > Improperly rendered UTF-8 will easily become unreadable[1], which is my
> > main problem when mixing encodings. By "unreadable" I mean that you
> > can't get the meaning of the text.
> 
> Yes, but again, I had in mind people that *already* use utf-8 in the
> gopherspace, not mass conversion of existing stuff. In this situation,
> such CAPS setting can only help, and do no harm (worst case scenario:
> the gopher client ignores CAPS, and renders the content like it does
> currently).
> 
> > Several languages require characters that are not part of ASCII,
> > including Finnish, Spanish, French and Portuguese.
> 
> And Polish, and many others. But these are "soft" problems: you get at
> least the Latin characters right, so the text is still readable. But try
> to read any Cyrillic-based language (Ukrainian, Russian, Bulgarian...) -
> there, *every* character is scrambled.
> 
> > Are there any gopher clients that try to autodetect whether the text is
> > utf8 or ISO-8859?
> 
> None that I know about.
> 
> > (IF that's even possible without false positives - I guess it's easier
> > with ISO-8859-1...)
> 
> On the contrary, it's much easier to identify UTF-8, since it uses
> clearly defined bit patterns. Detecting any 8-bit charset is a mess, as
> it requires statistical analysis of the content.
> 
> Mateusz
> 
> _______________________________________________
> Gopher-Project mailing list
> Gopher-Project@lists.alioth.debian.org
> http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/gopher-project
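
As a rough illustration of the "clearly defined bit patterns" point in the quoted text, the sketch below checks whether a byte string is structurally valid UTF-8. The function name `looks_like_utf8` is hypothetical and not taken from any gopher client. The check is deliberately simplified: it verifies only lead and continuation byte patterns, and does not reject overlong forms or surrogates the way a strict decoder would.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if data follows the UTF-8 lead/continuation bit patterns."""
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:               # 0xxxxxxx: single-byte (ASCII)
            extra = 0
        elif b >> 5 == 0b110:      # 110xxxxx: one continuation byte follows
            extra = 1
        elif b >> 4 == 0b1110:     # 1110xxxx: two continuation bytes follow
            extra = 2
        elif b >> 3 == 0b11110:    # 11110xxx: three continuation bytes follow
            extra = 3
        else:
            return False           # stray continuation byte or invalid lead
        if i + extra >= n and extra:
            return False           # sequence is truncated
        for j in range(i + 1, i + 1 + extra):
            if data[j] >> 6 != 0b10:   # continuation bytes must be 10xxxxxx
                return False
        i += extra + 1
    return True
```

Pure ASCII input also passes, since ASCII is a subset of UTF-8. Text in an 8-bit charset such as ISO-8859-1 usually fails as soon as it contains an accented character, which is what makes this test cheap compared with statistical detection. In real code, `data.decode("utf-8")` inside a `try` block performs a stricter version of the same check.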
