
New Unifont release



The Unifont is a biwidth (8x16/16x16) bitmap font, designed to provide
coverage for all of Unicode. It currently covers a large part of Unicode
3.0. Note that this version of the Unifont provides only a single glyph
for each character, so it cannot properly handle any language that
needs context-dependent character shaping. It is supplied as a hex
file, with a converter that turns it into bdf. See
http://czyborra.com/unifont/.
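
A minimal Python sketch of reading one line of the hex file. The
announcement only says "hex file", so the layout here is an assumption:
one glyph per line as CODEPOINT:BITMAP, with 32 hex digits for an 8x16
glyph and 64 for a 16x16 glyph. The function name and the sample line
are illustrative and have not been checked against this release's
unifont.hex.

```python
def parse_hex_line(line):
    """Split one 'CODEPOINT:BITMAP' line into (code point, pixel rows)."""
    code_hex, bitmap_hex = line.strip().split(":")
    # Assumed layout: 16 rows; 32 hex digits -> 8 wide, 64 -> 16 wide.
    width = len(bitmap_hex) * 4 // 16
    digits_per_row = width // 4
    rows = []
    for i in range(0, len(bitmap_hex), digits_per_row):
        bits = int(bitmap_hex[i:i + digits_per_row], 16)
        rows.append(format(bits, "0{}b".format(width)))
    return int(code_hex, 16), rows


# Illustrative sample line for U+0041.
SAMPLE = "0041:0000000018242442427E424242420000"

code_point, rows = parse_hex_line(SAMPLE)
for row in rows:
    # Draw set pixels as '#' and clear pixels as '.'.
    print(row.replace("0", ".").replace("1", "#"))
```

Each returned row is a string of '0' and '1' characters, one per pixel,
so the same function handles both glyph widths.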

Roman Czyborra <roman@czybora.com> created the GNU Unifont as a
low-quality font to provide a glyph for every Unicode character.
(Czyborra.com still has several cool tools for the unifont not included
here.) Since Roman Czyborra has been unable to maintain the Unifont for
a while, and many patches existed on gnu-unifont@groups.yahoo.com
(http://groups.yahoo.com/group/gnu-unifont), I, David Starner
<dstarner98@aasaa.ofe.org>, decided to make a new release extending the
Unifont with many characters. I have no plans to fork the Unifont, if
that can be avoided.

The new release can be found at
http://people.debian.org/unifont-dvdeug-1.0.tar.gz

The Unifont currently has over 35,000 glyphs covering many of the
world's languages. The Unifont currently contains all of WGL4, all of
MES-2, all of MES-3B (except for the 43 characters listed below), all of
ISO-8859-*, all of CP125[0-8], all of EES (except Khutsuri Georgian),
all of EUC-JP, all of BIG5 (except for private use mapped characters),
and all of EUC-KR.  (We're missing some 2,157 characters from EUC-TW.)

Since the last version, several new scripts (including Cherokee and
Unified Canadian Aboriginal Syllabics) and a lot of characters have been
added, and many old characters have been improved. Also, hex2bdf now
produces two bdf files, since the old bdf file was incorrectly labeled a
charcell font while having multiple widths for characters. This produced
problems in xterm, among others. (Some versions of Yudit that previously
showed all the characters may only show the single-width characters now.
Since xterm is correct and more important than Yudit, the change was
made.)
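
The single/double split can be sketched in Python as follows. This is
not how hex2bdf is implemented; it only illustrates the idea, assuming
each hex line is CODEPOINT:BITMAP and that the bitmap length (32 or 64
hex digits) tells the two widths apart. The function name and sample
lines are made up for the example.

```python
def split_by_width(lines):
    """Return (single, double) lists of hex lines, by bitmap length."""
    single, double = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        bitmap_hex = line.split(":", 1)[1]
        if len(bitmap_hex) == 32:      # assumed 8x16 glyph
            single.append(line)
        elif len(bitmap_hex) == 64:    # assumed 16x16 glyph
            double.append(line)
        else:
            raise ValueError("unexpected bitmap length: " + line)
    return single, double


# Made-up sample data: one narrow glyph and one wide glyph.
sample = [
    "0041:0000000018242442427E424242420000",
    "4E00:" + "0" * 28 + "FFFE" + "0" * 32,
]
single, double = split_by_width(sample)
print(len(single), "single-width,", len(double), "double-width")
```

Keeping each width in its own list is what lets each resulting font be
an honest charcell font with one cell size.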

To install:
 ./hex2bdf < unifont.hex > unifont.bdf
 ./hex2bdf y < unifont.hex > unifont-double.bdf
 (Optional: run bdf2pcf on them, and gzip them.)
 Move them to an X11 font directory.
 Run mkfontdir in that directory.
 Restart X.

Breakdown by Unicode character blocks:

The unifont includes a glyph for every character in these blocks: Basic
Latin, Latin-1, Latin Extended A, Latin Extended B, IPA Extensions,
Spacing Modifier Letters, Combining Diacritical Marks, Greek and Coptic,
Cyrillic, Armenian, Devanagari, Thai, Lao, Ethiopic, Ogham, Runic, Latin
Extended Additional, Greek Extended, General Punctuation, Superscripts
and Subscripts, Currency Symbols, Combining Diacritical Marks for
Symbols, Letterlike Symbols, Number Forms, Arrows, Mathematical Symbols,
Control Pictures, Optical Character Recognition, Enclosed Alphanumerics,
Box Drawing, Block Elements, Geometric Shapes, Dingbats, Braille
Patterns, Ideographic Description Characters, Katakana, Hangul
Compatibility Jamo, Hangul Syllables, Combining Half Marks, CJK
Compatibility Forms, Small Forms Variants, Arabic Presentation Forms-B.

The unifont is missing only a few characters from the following blocks:
Arabic, Miscellaneous Technical, Miscellaneous Symbols, Hiragana,
Bopomofo, CJK Compatibility Ideographs, and Alphabetic Presentation
Forms.

The unifont includes a large usable subset of the following blocks:
Tibetan, Georgian (all of the modern characters), UCAS, CJK Unified
Ideographs Extension A, CJK Unified Ideographs, Arabic Presentation
Forms-A, Halfwidth and Fullwidth Forms.

The unifont includes no characters from, or only a few characters from:
all other blocks.

Known Issues: We still have circles on combining characters, meaning
that any software 'intelligent' enough to try to combine them will just
produce a mess. Our idea of single and double width is ad hoc; we don't
conform to any standard on that. Some of the ideographs use all four
edges of the 16 by 16 square, meaning they may blur together when used
in many programs. (I need an educated opinion by someone who reads
Chinese or Japanese on how serious this problem is.)

License:
All of my works you find here are freeware. You may freely copy, use,
quote, modify or redistribute them as long as you properly attribute my
contribution and have given a quick thought about whether Roman might
perhaps be interested to read what you did with his stuff. Horizontal
rules don't apply.

Open Issues:

* Should we have glyphs for the spaces? IMO, it would be best just to
leave them blank.

* Should we try and extend it to ligatures and other complex language
support? Robert Brady's extensions to BDF mean that we could
productively extend unifont to do that, and the extensions would be
usable with GTK 2.0.

* We need to remove all the circles on combining characters, so they
can be used as such for xterm. Are there any other details that need to
be worried about?

* This version is split into single and double width fonts to properly
work with xterm. However, xterm uses system wcwidth to determine which
glyphs are single width and which are double, meaning that any character
that unifont thinks should be double width and xterm believes should be
single width will disappear. (Same if the situation is reversed, but I
can't think of any cases where that would happen and unifont wouldn't be
in the wrong.) Should we worry about making unifont correspond to
wcwidth? We could include two glyphs (one double, one single) in the
unifont, but it would frequently be wasted effort. We could provide full
double width Cyrillic/Greek/etc., if anyone really wants to draw the
glyphs.

* Many CJK ideographs are suboptimal in the Unifont, since they
fill the 16x16 square instead of leaving room below, above or to the
right, as a proper Unifont character should. We could import them from a
scalable font, or undersized versions from fixed. Or we could find
someone in the mood to draw 30,000 ideographs specially for unifont.
(And if you are, would you mind going a little further and doing all the
Plane 2 characters as well?)

* The arrows from 21E6 to 21F0 should be redesigned to look the same, as
far as possible within the Unifont's constraints.
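
For the wcwidth question above, a rough Python sketch of how width
disagreements could be listed. Python's standard library has no
wcwidth, so unicodedata.east_asian_width is used here as a stand-in
(Wide and Fullwidth count as two cells, everything else as one); a real
system wcwidth may differ and also reports zero-width and non-printable
characters, which this ignores. The hex line layout (CODEPOINT:BITMAP,
32 or 64 hex digits) and all names are assumptions, and the sample
glyphs are blank placeholders.

```python
import unicodedata


def glyph_cells(bitmap_hex):
    """Cells the font uses: 2 for a 16x16 bitmap, 1 for 8x16."""
    return 2 if len(bitmap_hex) == 64 else 1


def expected_cells(code_point):
    """Stand-in for wcwidth: Wide/Fullwidth take 2 cells, others 1."""
    eaw = unicodedata.east_asian_width(chr(code_point))
    return 2 if eaw in ("W", "F") else 1


def width_mismatches(lines):
    """Return (code point, font cells, expected cells) for each conflict."""
    mismatches = []
    for line in lines:
        code_hex, bitmap_hex = line.strip().split(":")
        code_point = int(code_hex, 16)
        have = glyph_cells(bitmap_hex)
        want = expected_cells(code_point)
        if have != want:
            mismatches.append((code_point, have, want))
    return mismatches


# Placeholder data: a narrow Latin letter, a hypothetical double-width
# Cyrillic letter, and a double-width ideograph.
sample = [
    "0041:" + "0" * 32,
    "0410:" + "0" * 64,
    "4E00:" + "0" * 64,
]
for code_point, have, want in width_mismatches(sample):
    print("U+%04X: font uses %d cell(s), expected %d" % (code_point, have, want))
```

Run over the whole hex file, a list like this would show how much work
matching wcwidth would actually be.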

Goals for next release, provided Czyborra yet slumbers:
Full MES-3B and EES. Remove combining circles. Decide on a solution (or
if a solution is even needed) on the CJK ideographs.

Lists of MES-3B characters that are double width in the Unifont:

# Plane 00
# Rows Positions (Cells)

  04 88-89
  20 00-0F 28-2F 6A-6F DD-E0 E2-E3
  21 3A
  FF FC

MES-3B characters that aren't in the Unifont:

# Plane 00
# Rows Positions (Cells)

  23 01 7D-9A
  26 19
  FF F9-FB

EES characters that aren't in the Unifont:

# Plane 00
# Rows Positions (Cells)

  10 A0-C5

# Number of characters in above table: 38
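
The row/positions notation in these tables can be expanded into code
points with a short Python sketch. It assumes each table line is a
two-digit hex row followed by single cells or first-last ranges, all in
plane 00, as in the tables above; the function name is made up for the
example.

```python
def expand_row(line):
    """Expand e.g. '10 A0-C5' into a list of plane 00 code points."""
    row_hex, *cells = line.split()
    row = int(row_hex, 16)
    code_points = []
    for cell in cells:
        first, _, last = cell.partition("-")
        lo = int(first, 16)
        hi = int(last, 16) if last else lo   # a lone cell is a 1-item range
        code_points.extend((row << 8) | pos for pos in range(lo, hi + 1))
    return code_points


ees_missing = expand_row("10 A0-C5")
print(len(ees_missing), "U+%04X..U+%04X" % (ees_missing[0], ees_missing[-1]))
# prints: 38 U+10A0..U+10C5
```

The count of 38 agrees with the figure given for the EES table.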

--
David Starner - dstarner98@aasaa.ofe.org, dvdeug@debian.org
"The pig - belongs - to all mankind!" - Invader Zim




