
Bug#166167: ITP: libtext-unidecode-perl -- US-ASCII transliterations of Unicode text



Package: wnpp
Version: N/A; reported 2002-10-24
Severity: wishlist

* Package name    : libtext-unidecode-perl
  Version         : 0.04
  Upstream Author : Sean M. Burke <sburke@cpan.org>
* URL             : http://search.cpan.org/author/SBURKE/Text-Unidecode-0.04/lib/Text/Unidecode.pm
* License         : Joint GPL1 / Artistic
  Description     : Last-resort ASCII transliterations of Unicode text

 Text::Unidecode is a simple, quick and dirty library for converting
 displayable characters outside the US-ASCII range U+0000 to U+007F into
 that range. The method employed is a lossy, simplistic, context-
 insensitive, and usually phonetic transliteration into Roman characters,
 which is passable for Cyrillic and Greek alphabets, sometimes okay for
 non-Western scripts, bad for Mandarin Chinese and worse for other uses
 of the Han characters and the Thai script.
 .
 In other words, if there is a library which directly addresses your
 problem domain, then you should be using that instead.
 .
 On the other hand, this library's output is always better than
 characters being transcribed as empty boxes, "?"s, or backslash-
 references into UTF-8 space.

The algorithm and its capabilities and shortcomings are described in
<http://www.sysadminmag.com/documents/sam05060002/>.
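
To illustrate what "lossy, context-insensitive" means in the description
above, here is a minimal sketch of a per-character table lookup, written
in Python only for illustration. It is not the module's implementation or
API: SAMPLE_TABLE, transliterate() and the empty-string fallback are all
hypothetical names and choices made for this example, and the real module
ships far larger tables.

```python
# Hypothetical, tiny lookup table for illustration only; the real
# Text::Unidecode tables cover far more of Unicode than this.
SAMPLE_TABLE = {
    "\u041c": "M",   # CYRILLIC CAPITAL LETTER EM
    "\u043e": "o",   # CYRILLIC SMALL LETTER O
    "\u0441": "s",   # CYRILLIC SMALL LETTER ES
    "\u043a": "k",   # CYRILLIC SMALL LETTER KA
    "\u0432": "v",   # CYRILLIC SMALL LETTER VE
    "\u0430": "a",   # CYRILLIC SMALL LETTER A
    "\u00df": "ss",  # LATIN SMALL LETTER SHARP S (one character, two letters)
}


def transliterate(text, table=SAMPLE_TABLE, fallback=""):
    """Map each character to ASCII independently of its neighbours.

    Characters already in U+0000..U+007F pass through unchanged. Anything
    else is looked up in the table; characters missing from the table are
    replaced by `fallback` (an assumption of this sketch).
    """
    pieces = []
    for ch in text:
        if ord(ch) < 0x80:
            pieces.append(ch)
        else:
            pieces.append(table.get(ch, fallback))
    return "".join(pieces)


if __name__ == "__main__":
    print(transliterate("\u041c\u043e\u0441\u043a\u0432\u0430"))
    print(transliterate("Stra\u00dfe"))
```

Because every character is mapped on its own, the result cannot take
pronunciation or surrounding text into account, which is why the approach
is passable for alphabetic scripts and poor for Han characters.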


-- System Information
Debian Release: testing/unstable
Architecture: i386
Kernel: Linux swift 2.4.19via-epia-tiny #1 Fri Oct 11 21:57:33 BST 2002 i686
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8



