RE: Problems with - and ' in some man-pages
Thaddeus H. Black wrote:
> Eric Lavarde writes,
> > in some man pages ... the dashes and single quotes are
> > not really what they look like, but some other unicode
> > letter.  This has two major drawbacks:
> > - search for options become nearly impossible
> > ...
>
> You illustrate well the fundamental problem with
> indiscriminate use of a very large character set like
> Unicode.  If people want to use Unicode, this is fine;
> Unicode and utf-8 exist to be used, after all.  However,
> restricted character sets (mainly ascii and Latin-1)
> offer several real practical benefits that Unicode can
> never provide.  One such benefit is that dashes and
> single quotes are usually what they appear to be.
>
> Comprehensiveness is important, and Unicode is nothing
> if not comprehensive.  On the other hand, simplicity is
> a prime aesthetic, which Unicode lacks.

You might want to take a look at Bytext[1], an alternative design for a
comprehensive character set.  In the Bytext charset, implementing a search
for classes of similar characters is almost trivial.

Anyhow, Bytext is certainly not going to replace Unicode for the foreseeable
future, so any user-interface search function that is supposed to operate on
Unicode text must be prepared to perform fuzzy searches, or it will be mostly
useless.  Perhaps a wishlist bug against "less" is due.
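
To make "fuzzy search" a bit more concrete, here is a rough sketch in Python
of the kind of folding I have in mind.  This is only an illustration, not how
"less" or any existing pager works: the table LOOKALIKES and the functions
fold() and fuzzy_find() are names I made up, and the table covers just a
handful of dash and single-quote look-alikes.

```python
# Hypothetical folding table: a few Unicode dash and single-quote
# look-alikes mapped to their ASCII counterparts. Far from complete.
LOOKALIKES = {
    "\u2010": "-",  # HYPHEN
    "\u2011": "-",  # NON-BREAKING HYPHEN
    "\u2012": "-",  # FIGURE DASH
    "\u2013": "-",  # EN DASH
    "\u2014": "-",  # EM DASH
    "\u2212": "-",  # MINUS SIGN
    "\u2018": "'",  # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK
    "\u00b4": "'",  # ACUTE ACCENT
}

# Translation table usable with str.translate().
FOLD = str.maketrans(LOOKALIKES)


def fold(text):
    """Replace each known look-alike with its ASCII counterpart."""
    return text.translate(FOLD)


def fuzzy_find(haystack, needle):
    """Return the index of the first fuzzy match of needle, or -1.

    Every mapping is one character to one character, so an index into
    the folded text is also a valid index into the original text.
    """
    return fold(haystack).find(fold(needle))


if __name__ == "__main__":
    # A man-page line where the option was typeset with MINUS SIGNs.
    line = "use \u2212\u2212help for details"
    print(fuzzy_find(line, "--help"))
```

The point of folding both the text and the search pattern is that the user
can type a plain ASCII "-" or "'" and still hit whatever the formatter
actually emitted.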
References:
 1. http://www.bytext.org