
Re: European chars to ascii



On Fri, Aug 19, 2005 at 09:34:24AM -0400, Tong wrote:
> 
> Is there any tools that can convert European characters to plain
> 7bit-Ascii?
> 
> E.g., ä => a, ö => o, etc. 

I don't know if there's a better tool, but I would do something like:

$ tr 'äöüß' 'aous' <isolatin1-in >ascii-out

(simply extend both char lists as required, keeping the characters in
matching positions)

This only works for a 1-char => 1-char mapping.  If you'd rather have
a 1-char => multiple-char mapping (e.g., in German, we'd typically
substitute ä => ae, ö => oe, etc.), you could start with a little
script like this:

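#!/usr/bin/perl
use strict;
use warnings;

# Note: assumes this script and its input are both ISO-8859-1 encoded,
# so that each special character is a single byte.
my %mapping = (
    'ä' => 'ae',
    'ö' => 'oe',
    'ü' => 'ue',
    'ß' => 'ss',
    # ...
);

# Build the body of a character class from the keys, e.g. \xe4\xf6\xfc\xdf
my $set = join '', map sprintf("\\x%02x", ord $_), keys %mapping;

while (<>) {
    s/([$set])/$mapping{$1}/g;   # replace each listed char by its mapping
    print;
}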

Or, if you'd like to specify the special characters' hex codes (in case
you have problems entering them directly), you could write instead:

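#!/usr/bin/perl
use strict;
use warnings;

# Keys are ISO-8859-1 byte values in lowercase hex, exactly as
# sprintf "%x" produces them in the lookup below.
my %mapping = (
    'e4' => 'ae',   # a-umlaut
    'f6' => 'oe',   # o-umlaut
    'fc' => 'ue',   # u-umlaut
    'df' => 'ss',   # sharp s
    # ...
);

# Build the body of a character class from the hex codes, e.g. \xe4\xf6...
my $set = join '', map "\\x$_", keys %mapping;

while (<>) {
    # /e evaluates the replacement as Perl code, so the matched byte is
    # converted back to its hex code before the hash lookup
    s/([$set])/$mapping{sprintf "%x", ord $1}/ge;
    print;
}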
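
To try the substitution without preparing an input file first, here is
a minimal sketch that applies the same idea to an in-memory string.  The
sub name to_ascii and the sample string are made up for illustration,
and the input is assumed to be ISO-8859-1 bytes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same table as above, keyed by the ISO-8859-1 bytes themselves,
# written as hex escapes so this file stays plain ASCII.
my %mapping = (
    "\xe4" => 'ae',   # a-umlaut
    "\xf6" => 'oe',   # o-umlaut
    "\xfc" => 'ue',   # u-umlaut
    "\xdf" => 'ss',   # sharp s
);

# Body of the character class, e.g. \xe4\xf6\xfc\xdf (order is irrelevant).
my $set = join '', map sprintf('\\x%02x', ord $_), keys %mapping;

# Hypothetical helper: returns a copy of $text with all mapped bytes replaced.
sub to_ascii {
    my ($text) = @_;
    $text =~ s/([$set])/$mapping{$1}/g;
    return $text;
}

print to_ascii("Gr\xfc\xdfe aus K\xf6ln"), "\n";   # prints "Gruesse aus Koeln"
```

Bytes that have no entry in %mapping pass through unchanged, so the
output is only pure ASCII once every special character in the input has
been listed.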

Cheers,
Almut


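P.S. Normally, you'd use iconv for encoding conversions.  However, a
plain "iconv -f 8859_1 -t ASCII isolatin1-file" doesn't work here,
because ASCII can only represent a subset of the characters present in
8859_1, so iconv stops with an error at the first character it cannot
convert.  (GNU iconv also accepts "-t ASCII//TRANSLIT" to substitute
approximations instead, but what you get depends on the locale.)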
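
To find out which characters in a given input fall outside that ASCII
subset (and so need an entry in the mapping), something like the
following sketch may help.  The sub name non_ascii_bytes is made up for
illustration, and the input is assumed to be ISO-8859-1, i.e. one byte
per character.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the distinct bytes >= 0x80 found in $text
# as sorted two-digit hex codes (e.g. 'e4' for a-umlaut in ISO-8859-1).
sub non_ascii_bytes {
    my ($text) = @_;
    my %seen;
    $seen{ sprintf '%02x', ord $1 } = 1 while $text =~ /([\x80-\xff])/g;
    return sort keys %seen;
}

print join(' ', non_ascii_bytes("Gr\xfc\xdfe aus K\xf6ln")), "\n";   # prints "df f6 fc"
```

The hex codes it prints are in the same form as the keys of the second
script above.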


