Bug#159715: RFP: enca -- Enca is an Extremely Naive Charset Analyser. It detects encoding of text files and is also able to convert them to other encodings.

To: "Debian Bug Tracking System" <submit@bugs.debian.org>
Subject: Bug#159715: RFP: enca -- Enca is an Extremely Naive Charset Analyser. It detects encoding of text files and is also able to convert them to other encodings.
From: "Dmitry Astapov" <adept@umc.com.ua>
Date: Thu, 05 Sep 2002 16:30:07 +0300
Message-id: <E17mwi1-0003gD-00@dimail.umc.com.ua>
Reply-to: "Dmitry Astapov" <adept@umc.com.ua>, 159715@bugs.debian.org

Package: wnpp
Version: N/A; reported 2002-09-05
Severity: wishlist

* Package name    : enca
  Version         : 0.10.1
  Upstream Author : David Necas (Yeti) <yeti@physics.muni.cz>
* URL             : http://physics.muni.cz/~yeti/software/enca.shtml
* License         : GPL
  Description     : Enca is an Extremely Naive Charset Analyser. It detects encoding of text files and is also able to convert them to other encodings.

Enca can currently determine the 8-bit charsets of Belarusian, Czech, Polish, Russian, Slovak and Ukrainian texts, as well as some multibyte encodings independently of language (provided it is a European language). The main features include:

    * recognises the following 8-bit charsets:
          o Belarusian: CP1251, IBM866, ISO-8859-5, KOI8-UNI, maccyr, IBM855
          o Czech: ISO-8859-2, KEYBCS2, IBM852, macce, KOI-8_CS_2, CP1250
          o Polish: ISO-8859-2, IBM852, macce, ISO-8859-13, ISO-8859-16, CP1250, baltic
          o Russian: KOI8-R, IBM866, CP1251, ISO-8859-5, maccyr
          o Slovak: CP1250, KEYBCS2, IBM852, macce, KOI-8_CS_2, ISO-8859-2
          o Ukrainian: CP1251, IBM855, ISO-8859-5, KOI8-U, maccyr, CP1125
    * recognises several multibyte encodings: UCS-2, UCS-4, UTF-8, UTF-7 and TeX accents
    * recognises all common EOL types, byte orders and also quoted-printable encoding
    * can report charset names according to the conventions of various standards or programs, as well as human-readable descriptions; accepts all common charset aliases
    * works with multiple files and can act as an intelligent filter
    * converts files using a built-in converter, the GNU recode library, UNIX98 iconv functions or an external converter that can be specified on the command line (e.g. cstocs, GNU recode)
    * has a special ambiguous mode for very short texts
    * can filter out binary parts of a file and/or box-drawing characters before guessing, so it can determine the encoding of pretty messy files
    * uses various tricks to resolve hard-to-decide cases, such as distinguishing between iso8859-2 and cp1250
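
To give a rough idea of what 8-bit charset guessing involves, here is a minimal Python sketch. It is not Enca's algorithm and does not use Enca at all: it just decodes the input with a few of the Russian charsets listed above and picks the one that produces the most lowercase Cyrillic letters. The names `CANDIDATES`, `score`, `guess_charset` and `convert` are hypothetical, and the candidate list is an arbitrary subset chosen for illustration.

```python
# Illustrative sketch only: a naive heuristic, NOT Enca's actual algorithm.
# Candidate list (Python codec names) is an assumption made for this example.
CANDIDATES = ["koi8_r", "cp1251", "cp866", "iso8859_5"]

# Ordinary running text consists mostly of lowercase letters.
LOWER_CYRILLIC = set("абвгдежзийклмнопрстуфхцчшщъыьэюяё")


def score(data: bytes, encoding: str) -> int:
    """Count lowercase Cyrillic letters obtained by decoding with `encoding`."""
    text = data.decode(encoding, errors="replace")
    return sum(ch in LOWER_CYRILLIC for ch in text)


def guess_charset(data: bytes) -> str:
    """Return the candidate whose decoding looks most like Russian text."""
    return max(CANDIDATES, key=lambda enc: score(data, enc))


def convert(data: bytes, target: str = "utf-8") -> bytes:
    """Guess the source charset, then re-encode the text as `target`."""
    source = guess_charset(data)
    return data.decode(source, errors="replace").encode(target)


if __name__ == "__main__":
    sample = "Привет, мир! Это простой текст на русском языке."
    raw = sample.encode("koi8_r")
    print(guess_charset(raw))
    print(convert(raw).decode("utf-8"))
```

A heuristic this simple only works on reasonably long, mostly lowercase text and cannot tell apart charsets whose lowercase ranges coincide, which is presumably why a real detector needs the extra tricks and the ambiguous mode mentioned in the list.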

PS
It seems the source tarball even includes a ./debian directory.

-- System Information:
Debian Release: testing/unstable
Architecture: i386
Kernel: Linux dimail 2.4.18 #1 Sun Aug 4 01:32:32 EEST 2002 i686
Locale: LANG=ru_RU.KOI8-R, LC_CTYPE=ru_RU.KOI8-R

-- no debconf information
