Bug#166167: ITP: libtext-unidecode-perl -- US-ASCII transliterations of Unicode text
Package: wnpp
Version: N/A; reported 2002-10-24
Severity: wishlist
* Package name : libtext-unidecode-perl
Version : 0.04
Upstream Author : Sean M. Burke <sburke@cpan.org>
* URL : http://search.cpan.org/author/SBURKE/Text-Unidecode-0.04/lib/Text/Unidecode.pm
* License : Joint GPL1 / Artistic
Description : Last-resort ASCII transliterations of Unicode text
Text::Unidecode is a simple, quick and dirty library for converting
displayable characters outside the US-ASCII range U+0000 to U+007F into
that range. The method employed is a lossy, simplistic, context-
insensitive, and usually phonetic transliteration into Roman characters,
which is passable for Cyrillic and Greek alphabets, sometimes okay for
non-Western scripts, bad for Mandarin Chinese and worse for other uses
of the Han characters and the Thai script.
.
In other words, if there is a library which directly addresses your
problem domain, then you should be using that instead.
.
On the other hand, this library's output is always better than
characters being transcribed as empty boxes, "?"s, or
backslash-references into UTF-8 space.
The algorithm and its capabilities and shortcomings are described in
<http://www.sysadminmag.com/documents/sam05060002/>.
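The description characterises the method as a lossy, context-insensitive, per-character mapping into U+0000..U+007F. The following is a minimal Python sketch of that idea, not Text::Unidecode's code or API. The names `TABLE` and `transliterate` are hypothetical, the table holds only a handful of illustrative entries (the real module ships far larger tables), and the NFKD fallback that drops anything still outside ASCII is this sketch's own choice. The `__main__` part contrasts the result with the "?" and backslash output the description mentions.

```python
import unicodedata

# Hypothetical, tiny per-character table; illustrative entries only.
TABLE = {
    "Ж": "Zh", "ж": "zh", "щ": "shch",   # Cyrillic
    "α": "a", "β": "b",                  # Greek
    "ß": "ss",                           # NFKD does not decompose this
    "北": "Bei ", "京": "Jing ",         # Han: each character mapped alone
}


def transliterate(text: str) -> str:
    """Map each character independently (no context) into US-ASCII."""
    out = []
    for ch in text:
        if ord(ch) < 0x80:
            # Already inside U+0000..U+007F: keep unchanged.
            out.append(ch)
        elif ch in TABLE:
            out.append(TABLE[ch])
        else:
            # Sketch-only fallback: compatibility-decompose (e.g. "é" ->
            # "e" + combining acute, "ﬁ" -> "fi") and keep the ASCII part;
            # characters with no ASCII residue are dropped (lossy).
            decomposed = unicodedata.normalize("NFKD", ch)
            out.append("".join(c for c in decomposed if ord(c) < 0x80))
    return "".join(out)


if __name__ == "__main__":
    sample = "Straße café Жж"
    print(sample.encode("ascii", "replace").decode())           # Stra?e caf? ??
    print(sample.encode("ascii", "backslashreplace").decode())  # Stra\xdfe caf\xe9 \u0416\u0436
    print(transliterate(sample))                                # Strasse cafe Zhzh
```

Because every character is looked up on its own, "ж" always becomes "zh" and each Han character gets one fixed reading regardless of its neighbours. That is the trade-off the description names: fast and simple, but poor for scripts where pronunciation depends on context.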
-- System Information
Debian Release: testing/unstable
Architecture: i386
Kernel: Linux swift 2.4.19via-epia-tiny #1 Fri Oct 11 21:57:33 BST 2002 i686
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8