
Re: Dictionary changes



On Wed, 2 Jul 2014 12:22:02 -0400
Steve Litt <slitt@troubleshooters.com> wrote:

> Another thing to remember is that the wordlist is no longer ASCII,

An excellent thing in the age of UTF-N.

> cat /usr/share/dict/words | grep -i "$1"

Simplify it: grep -i "$1" /usr/share/dict/words

> If you look up ^smor.*rd$, you get nothing. But if you look up
> ^sm.*rd$ you get smörgåsbord. What I'd like to do is get grep to
> think "å" is a hit for "a" and report it, but report it as "å".
> I'll let you know when I figure out how to do that, or do some
> other thing that produces the same result. Prepending LC_ALL=
> either C, C.UTF-8, en_US.utf8, or POSIX, to the grep command,
> didn't do it either.

You can't, because "a" and "å" do not have the same code
in any of these encodings.
(But your case is interesting; maybe a rewritten grep
that includes such conversions would be of interest.)
 
> If worst comes to worst and I can't find a way to get grep to do
> this, I'll just put together a substitution table,
> convert /usr/share/dict/words to words.ascii, line for line, search
> words.ascii, get the line number, and pull that line out of words.
> Crude, but effective.

AFAIK, this is the only way to do what you want.
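
A rough sketch of the substitution-table idea, in case it helps. It is
a variant, not exactly what you describe: instead of writing
words.ascii and looking up line numbers, it folds each line in memory,
matches the folded copy, and prints the original line. The name
dictgrep and the DICTGREP_PAT variable are made up for illustration,
the table is only a starting point, and the pattern is treated as an
awk extended regex rather than a grep one.

```shell
# dictgrep: hypothetical helper name, not an existing tool.
# Usage: dictgrep PATTERN [WORDLIST]
dictgrep() {
    # The pattern travels via the environment so awk does not
    # reinterpret backslashes the way it would with -v.
    DICTGREP_PAT=$1 awk '
    BEGIN {
        # Substitution table; these entries are only a starting point.
        map["å"] = "a"; map["ä"] = "a"; map["á"] = "a"; map["à"] = "a"
        map["ö"] = "o"; map["ó"] = "o"; map["ô"] = "o"
        map["é"] = "e"; map["è"] = "e"; map["ê"] = "e"
        map["ü"] = "u"; map["ñ"] = "n"; map["ç"] = "c"
        map["Å"] = "A"; map["Ö"] = "O"; map["É"] = "E"
        # Lowercasing the pattern is a crude stand-in for grep -i.
        pat = tolower(ENVIRON["DICTGREP_PAT"])
    }
    {
        key = $0
        for (c in map) gsub(c, map[c], key)   # fold accents to ASCII
        if (tolower(key) ~ pat) print $0      # match folded, print original
    }' "${2:-/usr/share/dict/words}"
}
```

With that defined, dictgrep '^smor.*rd$' should report smörgåsbord,
spelled with its accents, provided the word is in your list and every
accented letter it uses is in the table.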

-- 
To be is to do.	-- I. Kant
To do is to be.	-- A. Sartre
Do be a Do Bee!	-- Miss Connie, Romper Room
Do be do be do!	-- F. Sinatra
Yabba-Dabba-Doo! -- F. Flintstone
