
Re: #284724: Interpretation of NON-BREAK SPACE



> ... should all characters in the class [:space:] be
> treated as a token separator in shells/languages, or
> just the ASCII SPACE?

If it seems pertinent to you, the C language standard
sets this precedent [1]: "The source file is decomposed
into preprocessing tokens and sequences of white-space
characters (including comments)."  Although the wording
of the standard goes through some annoying extra
gymnastics to support obsolete non-ASCII character sets,
basically all six of the traditional [:space:]
characters [\t\n\v\f\r ] are treated alike as token
separators.  The one exception is the two-character
sequence "\\\n" (a backslash immediately followed by a
newline), which serves to join two lines of text before
the source is split into tokens.
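
As a minimal sketch of that rule (my own illustration, not
text from the standard), the C function below counts tokens
by treating exactly those six characters as separators.  The
name count_tokens is hypothetical, and the sketch ignores
comments and backslash-newline splicing.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the runs of non-separator bytes in s,
   treating only the six traditional white-space characters
   (space, \t, \n, \v, \f, \r) as token separators. */
size_t count_tokens(const char *s)
{
    static const char separators[] = " \t\n\v\f\r";
    size_t count = 0;

    while (*s != '\0') {
        s += strspn(s, separators);   /* skip any leading separators */
        if (*s == '\0')
            break;                    /* only separators remained */
        count++;
        s += strcspn(s, separators);  /* skip over the token itself */
    }
    return count;
}
```

Under this rule any other byte, including the bytes of an
encoded NO-BREAK SPACE, simply stays part of the token that
contains it.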

However, the NO-BREAK SPACE is not one of the six
characters, nor is it a member of [:space:], nor is it
acknowledged by the iswspace(3) function.  Personally,
if I were using your interpreter, I would not want it
quietly to accept a NO-BREAK SPACE as a token separator.
I would prefer it to warn me that I had some weird
character lurking in my script.
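
A minimal sketch of such a check, assuming the script is
encoded in UTF-8, where NO-BREAK SPACE (U+00A0) is the byte
pair 0xC2 0xA0.  The name find_nbsp_utf8 is hypothetical; in
Latin-1 the character would be the single byte 0xA0 instead.

```c
#include <stddef.h>

/* Hypothetical helper: return the byte offset of the first UTF-8
   encoded NO-BREAK SPACE (bytes 0xC2 0xA0) in s, or -1 if none.
   Assumes s is a NUL-terminated UTF-8 string. */
long find_nbsp_utf8(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;

    for (size_t i = 0; p[i] != '\0'; i++) {
        /* p[i + 1] is safe to read: p[i] is not the terminator,
           so the next byte is at worst the terminating NUL. */
        if (p[i] == 0xC2 && p[i + 1] == 0xA0)
            return (long)i;
    }
    return -1;
}
```

An interpreter could run a check like this over each input
line before tokenizing and print a warning with the offset,
rather than silently splitting or silently not splitting
there.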

-- 
Thaddeus H. Black
508 Nellie's Cave Road
Blacksburg, Virginia 24060, USA
+1 540 961 0920, t@b-tk.org

[1] ISO/IEC 9899:1999, section 5.1.1.2, translation phase 3.
