
Re: How to guess or check encoding of text file.



Hi,

From: Osamu Aoki <osamu@debian.org>
Subject: Re: How to guess or check encoding of text file.
Date: Mon, 6 Jan 2003 00:17:10 -0800

> > #!/bin/sh
> > if iconv -f UTF-8 -t UTF-8 <"$1" >/dev/null 2>&1
> > then
> >   echo UTF-8
> > else
> >   echo ISO-8859-1
> > fi
> 
> Bingo :)  Maybe this can be wishlist for iconv.

Do you mean that this script should be included in the glibc package?
I don't think so, because the script rests on too many assumptions.
In general, encoding guessing *must* rest on strong assumptions;
otherwise the guess is too unreliable to be useful.

Here, the assumption is that the input file is either UTF-8 or
ISO-8859-1.  Merely adding ISO-8859-2 as a candidate makes guessing
impossible, because any byte sequence that is valid ISO-8859-1 is
also valid ISO-8859-2.
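
To illustrate the point, here is a minimal sketch that lists every
candidate encoding iconv accepts for a file, instead of printing a
single guess.  The function names is_valid and candidates are made up
for this example, and it assumes an iconv that knows all three
encoding names.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if file $2 is a valid byte sequence
# in encoding $1 (checked by asking iconv to convert it to UTF-8).
is_valid() {
    iconv -f "$1" -t UTF-8 <"$2" >/dev/null 2>&1
}

# Hypothetical helper: print every candidate encoding that accepts
# file $1, one per line.  More than one line of output means that a
# validity check alone cannot decide between them.
candidates() {
    for enc in UTF-8 ISO-8859-1 ISO-8859-2; do
        if is_valid "$enc" "$1"; then
            echo "$enc"
        fi
    done
}

# When run as a script with a file argument, list its candidates.
if [ "$#" -ge 1 ]; then
    candidates "$1"
fi
```

For a file that is not valid UTF-8, I would expect this to report both
ISO-8859-1 and ISO-8859-2, so telling them apart would need some
further assumption, such as the language of the text.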

---
Tomohiro KUBOTA <kubota@debian.org>
http://www.debian.or.jp/~kubota/



