Re: OT: Perl & UTF-8
On Sat, Jan 12, 2002 at 05:42:31PM +0100, Holger Rauch wrote:
| Hi!
|
| Thanks for your reply!
|
| On Sun, 6 Jan 2002, dman wrote:
|
| > [...]
| > So the regexps you're using are in an 8859-n source file, right?
|
| Yep.
|
| > Can perl handle UTF-8 source files?
|
| Don't know. That's why I mailed this question ;-)
|
| > Are you trying to use things like the
| > posix character class [:alpha:]?
|
| No.
|
| > I don't think those will handle all
| > alphabetic characters in all unicode supported languages (probably
| > just ascii/english alphabet).
|
| What about \w?
No idea.
Here's another thought, though. Are you reading the file as if it were
single-byte? If so, that won't work right. For example, the euro
symbol is character U+20AC. In UTF-8 the file will contain the three
bytes '\xe2\x82\xac'. If you read this as you would any other
single-byte file you'll get 3 separate characters above the US-ASCII
range instead of the one euro symbol.
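
To make the euro example concrete, here is a minimal sketch. It is in
Python rather than Perl, purely for illustration, and it assumes the
"single-byte" misreading is Latin-1 (ISO 8859-1). Your actual charset
may be a different 8859-n variant, but the byte count comes out the
same. The variable names are mine.

```python
# The euro sign, U+20AC, as a single Unicode character.
euro = "\u20ac"

# Encoding it as UTF-8 produces the three bytes mentioned above.
utf8_bytes = euro.encode("utf-8")

# Assumption: the file is misread with a single-byte charset (Latin-1).
# Each byte then becomes its own character, all above the ASCII range.
as_single_byte = utf8_bytes.decode("latin-1")

# Decoding the same bytes as UTF-8 gives back the one intended character.
as_utf8 = utf8_bytes.decode("utf-8")

print(utf8_bytes)                         # b'\xe2\x82\xac'
print(len(as_single_byte), len(as_utf8))  # 3 1
```

So a pattern that expects one character would be looking at three
unrelated ones unless the data is decoded as UTF-8 first.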
-D
--
The light of the righteous shines brightly,
but the lamp of the wicked is snuffed out.
Proverbs 13:9