
Bug#636017: ITP: tran[s[lit]] -- transcribe between character scripts (Cyrillic <-> Latin, etc)



Package: wnpp
Severity: wishlist
Owner: Adam Borowski <kilobyte@angband.pl>

* Package name    : tran? trans? translit?
  Upstream Author : Adam Borowski <kilobyte@angband.pl>
* URL             : https://github.com/kilobyte/tran
* License         : GPL
  Programming Lang: Perl
  Description     : transcribe between character scripts (Cyrillic <-> Latin, etc)

This is a tool for romanization / cyrillization / greekization / etc. of text.
It converts character scripts rather than encodings.  For example, it can
turn "Debian" into "Дэбян" or "Δεβιαν".
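
To make the script-versus-encoding distinction concrete, here is a minimal
Python sketch.  The mapping table is a hypothetical toy that covers only the
letters of "Debian"; it is not tran's actual table or algorithm.

```python
# Toy Latin -> Greek table; hypothetical, covers only the letters used below.
LATIN_TO_GREEK = {
    "D": "Δ", "e": "ε", "b": "β", "i": "ι", "a": "α", "n": "ν",
}

def to_greek(text):
    """Replace each mapped Latin letter; leave every other character as-is."""
    return "".join(LATIN_TO_GREEK.get(ch, ch) for ch in text)

word = "Debian"
converted = to_greek(word)             # script conversion: different characters
reencoded = word.encode("utf-16-le")   # encoding conversion: same characters, different bytes
```

Converting the script changes which characters the text contains, while
re-encoding only changes how the same characters are stored as bytes.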

Currently supported scripts:
* latin
* ascii (i.e., dropping accents)
* fullwidth (double-width ASCII for most of us)
* cyrillic
* greek
* devanagari
* katakana
* hiragana
* hangul
and more are coming.  Unicode has, for example, 13 fancy sets of letters for
mathematical purposes (fraktur, double-struck, etc.); these are not supported
yet because of a problem in glibc[1].  Circled/boxed letters and the like are
also missing, not to mention all the remaining scripts in Unicode and
ConScript.
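
The two simplest targets in the list above, ascii and fullwidth, can be
sketched in a few lines of Python.  This is only an illustration under
assumed rules (the function names are hypothetical and tran's own handling
may differ): accents are dropped by decomposing characters and discarding
the combining marks, and fullwidth forms are reached by a fixed code point
offset.

```python
import unicodedata

def drop_accents(text):
    """Decompose characters (NFD) and discard the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def to_fullwidth(text):
    """Map printable ASCII U+0021..U+007E to U+FF01..U+FF5E, space to U+3000."""
    out = []
    for ch in text:
        cp = ord(ch)
        if 0x21 <= cp <= 0x7E:
            out.append(chr(cp + 0xFEE0))   # fixed offset to the fullwidth forms
        elif ch == " ":
            out.append("\u3000")           # IDEOGRAPHIC SPACE
        else:
            out.append(ch)                 # anything else passes through
    return "".join(out)
```

Letters without a canonical decomposition, such as "ł" or "ø", pass through
drop_accents unchanged, so a real tool needs an explicit table for them.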

It tries to do transcription rather than mere transliteration, but is still
pretty naive and doesn't go far into realms of phonetic accuracy.

I named this project "tran" about six years ago, which is probably far too
generic.  I guess "translit" might be a bit better.

There is a similar tool in Debian, libtext-unidecode-perl, but it works in
one direction only, targets basic ASCII rather than Latin, and fails to
preserve non-letter characters like frames.


[1]. towlower(0x1D400) and friends don't work.  This needs to be either
fixed in glibc or worked around with hand-crafted case conversions.
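
A hand-crafted case conversion could look like the following Python sketch.
It covers only the Mathematical Bold letters, whose capitals
(U+1D400..U+1D419) and small letters (U+1D41A..U+1D433) are contiguous, so a
fixed offset works.  Some of the other mathematical alphabets have gaps where
letters live in the Letterlike Symbols block and would need per-letter
tables.  The function name is hypothetical.

```python
BOLD_UPPER_FIRST = 0x1D400  # MATHEMATICAL BOLD CAPITAL A
BOLD_LOWER_FIRST = 0x1D41A  # MATHEMATICAL BOLD SMALL A

def math_bold_lower(ch):
    """Lowercase one character, handling Mathematical Bold capitals by hand."""
    cp = ord(ch)
    if BOLD_UPPER_FIRST <= cp < BOLD_UPPER_FIRST + 26:
        # Same position in the alphabet, shifted into the small-letter range.
        return chr(cp - BOLD_UPPER_FIRST + BOLD_LOWER_FIRST)
    # Everything else falls back to the standard case mapping.
    return ch.lower()
```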


