
Re: OT: Perl & UTF-8



On Sat, Jan 12, 2002 at 05:42:31PM +0100, Holger Rauch wrote:
| Hi!
| 
| Thanks for your reply!
| 
| On Sun, 6 Jan 2002, dman wrote:
| 
| > [...]
| > So the regexps you're using are in a 8859-n source file, right? 
| 
| Yep.
| 
| > Can perl handle UTF-8 source files? 
| 
| Don't know. That's why I mailed this question ;-)
| 
| > Are you trying to use things like the
| > posix character class [:alpha:]?  
| 
| No.
| 
| > I don't think those will handle all
| > alphabetic characters in all unicode supported languages (probably
| > just ascii/english alphabet).
| 
| What about \w?

No idea.

Here's another thought, though.  Are you reading the file as if it were
single-byte?  If so, that won't work right.  For example, the
euro symbol is character U+20AC (\u20ac).  In UTF-8 the file will contain
the three bytes '\xe2\x82\xac'.  If you read this as you would any other
single-byte file, you'll get 3 characters, all above the us-ascii range,
instead of the one character you wanted.
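
To make that concrete, here is a minimal sketch in Python (not Perl, just
because the escapes above happen to be valid Python literals).  The choice
of ISO-8859-1 as the "single-byte" reading is an assumption for
illustration; any 8859-n charset would split the bytes the same way.

```python
# The euro sign, U+20AC, as the three bytes a UTF-8 file holds on disk.
raw = b"\xe2\x82\xac"

# Reading those bytes as a single-byte charset (ISO-8859-1 is assumed here)
# turns every byte into its own character.
as_latin1 = raw.decode("iso-8859-1")

# Reading them as UTF-8 gives back the single intended character.
as_utf8 = raw.decode("utf-8")

print(len(as_latin1), [hex(ord(c)) for c in as_latin1])
print(len(as_utf8), hex(ord(as_utf8)))
```

The first line printed shows three code points, each above 0x7f; the second
shows one code point, 0x20ac.  A regexp run against the first string is
matching against the wrong characters before it even starts.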

-D

-- 

The light of the righteous shines brightly,
but the lamp of the wicked is snuffed out.
        Proverbs 13:9


