Re: OT: Perl & UTF-8

To: debian-user@lists.debian.org
Subject: Re: OT: Perl & UTF-8
From: dman <dsh8290@rit.edu>
Date: Sat, 12 Jan 2002 14:00:26 -0500
Message-id: <20020112190026.GA2940@localhost>
Mail-followup-to: debian-user@lists.debian.org
In-reply-to: <Pine.LNX.4.21.0201121739280.20332-100000@miami.datech2.er.heitec.net>
References: <20020106200432.GO20696@localhost> <Pine.LNX.4.21.0201121739280.20332-100000@miami.datech2.er.heitec.net>

On Sat, Jan 12, 2002 at 05:42:31PM +0100, Holger Rauch wrote:
| Hi!
| 
| Thanks for your reply!
| 
| On Sun, 6 Jan 2002, dman wrote:
| 
| > [...]
| > So the regexps you're using are in a 8859-n source file, right? 
| 
| Yep.
| 
| > Can perl handle UTF-8 source files? 
| 
| Don't know. That's why I mailed this question ;-)
| 
| > Are you trying to use things like the
| > posix character class [:alpha:]?  
| 
| No.
| 
| > I don't think those will handle all
| > alphabetic characters in all unicode supported languages (probably
| > just ascii/english alphabet).
| 
| What about \w?

No idea.

Here's another thought, though.  Are you reading the file as if it
were in a single-byte encoding?  If so, that won't work right.  For
example, the euro symbol is code point U+20AC (\u20ac).  In UTF-8 the
file will contain the three bytes '\xe2\x82\xac'.  If you read them as
you would any other single-byte file, you'll get 3 separate characters,
all above the us-ascii range, instead of one euro sign.
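
To make the byte-versus-character point concrete, here is a minimal
sketch in Python 3, chosen only because the escapes above are already in
that notation.  It is an illustration under assumptions, not a statement
about how Perl behaves: ISO-8859-1 stands in for "any single-byte
charset", and the variable names are made up for the example.

```python
# The euro sign, U+20AC, as a one-character string.
euro = "\u20ac"

# Encoding it as UTF-8 produces the three bytes quoted above.
utf8_bytes = euro.encode("utf-8")

# Misreading those bytes with a single-byte charset (ISO-8859-1 is an
# assumption, picked for illustration) gives one character per byte.
misread = utf8_bytes.decode("iso-8859-1")

# Decoding them as UTF-8 gives back the single euro character.
decoded = utf8_bytes.decode("utf-8")

print(utf8_bytes)                                    # b'\xe2\x82\xac'
print(len(misread), [hex(ord(c)) for c in misread])  # 3 ['0xe2', '0x82', '0xac']
print(len(decoded))                                  # 1
```

The same three bytes come out as either three characters or one,
depending only on the encoding the reader assumes.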

-D

-- 

The light of the righteous shines brightly,
but the lamp of the wicked is snuffed out.
        Proverbs 13:9
