
Re: [g-i] Arabic / Persian fonts



> General question: starting with an UTF-8 encoded string/file, what is the 
> easiest way to get the hex UTF-8 codes of the characters in it?

Could the attached script help? It was written by Denis Barbier, and I
use it for writing locales.

You need to put the string whose hex codes you want to extract
between double quotes:

bubulle@cc-mykerinos:~/tmp> cat test
"This is a test"
bubulle@cc-mykerinos:~/tmp> cat test | utf2uxx
"<U0054><U0068><U0069><U0073><U0020><U0069><U0073><U0020><U0061><U0020><U0074><U0065><U0073><U0074>"
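
For comparison, here is a rough Python 3 sketch of the same conversion.
It is not part of the attached script: the names to_uxx and convert_line
are made up for illustration, and it only covers the quoted-string case,
not the LC_IDENTIFICATION or copy/include handling.

```python
import re


def to_uxx(text, convert_ascii=True):
    """Return text with each character written as <UXXXX>.

    When convert_ascii is False, characters below U+0080 are kept as-is.
    """
    out = []
    for ch in text:
        cp = ord(ch)  # Unicode code point of the character
        if not convert_ascii and cp < 0x80:
            out.append(ch)
        else:
            out.append("<U%04X>" % cp)
    return "".join(out)


def convert_line(line, convert_ascii=True):
    """Convert the contents of every double-quoted string in line."""
    return re.sub(
        r'"([^"]*)"',
        lambda m: '"' + to_uxx(m.group(1), convert_ascii) + '"',
        line,
    )


if __name__ == "__main__":
    print(convert_line('"This is a test"'))
```

Run on the test string above, this should give the same <UXXXX> line as
utf2uxx.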


Dunno if this is what you're looking for, though...
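
One caveat: the <UXXXX> values are Unicode code points, not UTF-8 byte
values. If the UTF-8 bytes themselves are wanted, something like the
following Python 3 sketch may be closer. The function name utf8_hex is
hypothetical, and U+0628 (ARABIC LETTER BEH) is just an example
character.

```python
def utf8_hex(text):
    """Return (character, code point, UTF-8 bytes in hex) per character."""
    rows = []
    for ch in text:
        encoded = ch.encode("utf-8")  # the bytes as stored in a UTF-8 file
        rows.append((
            ch,
            "U+%04X" % ord(ch),
            " ".join("%02X" % b for b in encoded),
        ))
    return rows


if __name__ == "__main__":
    for ch, code_point, hex_bytes in utf8_hex("a\u0628"):
        print(ch, code_point, hex_bytes)
```

For ASCII characters the two values coincide (U+0061 is the single byte
61), while U+0628 is encoded as the two bytes D8 A8.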


-- 


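#!/usr/bin/perl

use strict;
use warnings;
# Decode STDIN and encode STDOUT as UTF-8. A -C switch on the #! line
# is rejected by newer perls ("Too late for -C"), so set the layers here.
use open qw(:std :utf8);

# Convert each character of $text to <UXXXX> notation. Unless
# $convert_ascii is true, ASCII characters are left unchanged.
sub c {
	my ($text, $convert_ascii) = @_;
	my $ret = '';
	foreach my $char (split //, $text) {
		my $l = ord($char);
		if (!$convert_ascii && $l < 0x80) {
			$ret .= $char;
		} else {
			$ret .= sprintf "<U%04X>", $l;
		}
	}
	return $ret;
}

my $convert_ascii = 1;
while (<>) {
	# ASCII is kept as-is inside the LC_IDENTIFICATION section ...
	if (/^LC_IDENTIFICATION/) {
		$convert_ascii = 0;
	} elsif (/^END LC_IDENTIFICATION/) {
		$convert_ascii = 1;
	}
	my $conv = $convert_ascii;
	# ... and on copy/include lines.
	$conv = 0 if (/^(copy|include)/);
	# Rewrite the contents of every double-quoted string on the line.
	s/"([^"]*)"/'"'.c($1, $conv).'"'/eg;
	print;
}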

