
8-bit safe text utils?



I'm trying to process some 8-bit text on my Debian system and it's
giving me fits.  Clearly some of the programs I'm piping things
through aren't 8-bit aware.  Can someone point me to a good listing of
these and/or to a discussion of how to work around the limitations of
the system?

Here's an example of what I'm doing.  The input is an official Dutch
word list called "woor-den.max" and the output is to be a compressed
dictionary to be included with a free Scrabble clone I'm developing
for the PalmOS platform.

The words include a character (octal 0267) that marks hyphenation
points, and I want to strip it out.  If I type the following at a bash
prompt (in emacs shell mode or in an xterm; it makes no difference):

# tr -d "\267" < woor-den.max

tr does nothing.  But if I save the same command as a bash shell
script and execute it I get the desired result.

grep behaves the same way.
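
For concreteness, here is a minimal sketch of the byte-level operation
I'm after.  The helper name strip_hyphenation is made up for
illustration, and forcing LC_ALL=C is an assumption on my part: it
should make tr treat each byte as one character, but I have not
confirmed that it cures the interactive case described above.

```shell
# Sketch only: strip_hyphenation is a hypothetical helper, not a standard tool.
# LC_ALL=C asks tr to work byte by byte, so the octal escape \267
# should match the raw byte whatever the terminal's character set is.
strip_hyphenation() {
    LC_ALL=C tr -d '\267'
}

# Small self-check: printf emits the byte from its octal escape,
# so nothing 8-bit has to be typed at the prompt.
printf 'woor\267den\n' | strip_hyphenation
```

On the real file this would be run as
strip_hyphenation < woor-den.max, with the output redirected wherever
the next stage of the pipeline expects it.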

This can't be an unfamiliar problem for those of you across the Atlantic.
What's the best coping strategy?

Thanks!

--Eric House

/******************************************************************************
* Sun .signature deleted: this isn't a Sun project!
******************************************************************************/

