
Re: remove/replace non-ascii characters from file



Mike McCarty wrote:
>> garbage (represented as ^@^@^@^@^@^@ etc.)
>
> I suppose you mean "non-graphic ASCII". Those are NUL characters,
> which the ASCII *definition* states can be inserted or removed
> from *any* stream without changing its meaning. This means that
> your application is not ASCII compliant. Sorry, but in this case
> (unusual, I know) Windows is right and your app is wrong.

Well, I don't know that much about the ASCII *definition*, but if I open the file in Window$ Notepad (I never use it for anything; I just did it out of curiosity), these characters show up as extra spaces. When I save the file, they are written out as ordinary spaces (i.e. the same 0x20 space character Linux uses).

So, if you are right, M$ Notepad converts these NUL characters to spaces, which is a bad thing if they really are distinct characters that serve some purpose.

Anyway, I don't think it is a useful feature for a program to put NUL characters in the header of a data file like this one, which consists of just a short header and two columns of x and y data. I'd be curious about the programmer's reason for putting about 50 of them at the end of the comment.

> You might try tr. On another note, here's a C program which will do what
> you want. It's written as a filter, so no file names on the command line;
> this is strictly no-frills programming. Placed into the public domain by
> me, the original author, today, Thursday 3 August 2006. If you *need*
> file names on the command line (like for use with find and xargs)
> then I can add that, but I thought something quick'n'nasty might
> be more what you need.

I appreciate your effort! I was writing a script to postprocess the data anyway, so the most convenient approach was to remove the junk with one more command line.

Thanks,

Johannes


