
Re: at least 260 packages broken on arm, powerpc and s390 due to wrong assumption on char signedness

Ganesan R <rganesan@myrealbox.com> wrote:
>>>>>> "Steve" == Steve Greenland <steveg@moregruel.net> writes:
>> On 31-Dec-01, 19:42 (CST), Ganesan R <rganesan@myrealbox.com> wrote:
>>> Another thing that puzzles me since this whole debate started. If you look
>>> at the declaration of ctype.h functions (isalpha family), they take an int as
>>> an argument.
>> I don't know that I agree about needing to pass them unsigned char,
>> though. The char->int conversion should be value preserving. If you pass
>> a negative value, then you are making a domain error, and deserve what
>> you get.
> Are you saying that isalpha() etc should work for negative values or that
> you should never call isalpha() with negative values for chars? In an
> ISO 8859-1 locale when I call isalpha() for accented characters I should get
> expected results without worrying about whether the accented character is a
> signed quantity. Unfortunately older C library implementations may break
> because an accented character will take a negative index on a character
> table without proper casting.
> I followed up a bit on this and found out that ISO C specifically states
> that ctype functions should work for all values of unsigned char as well as
> the default char type. In other words, if the default C char type is signed
> you can just call the functions without any cast and expect it to work.

If every system had up-to-date, standards-conforming
ctype.h support, we wouldn't have to worry much at all.
But even these days, quite a few systems with buggy ctype
macros are still in use.
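Even with a conforming ctype.h, the signedness problem in the subject line remains: a byte such as 0xE9 (é in ISO 8859-1) stored in a plain char typically becomes a negative int where char is signed (as on i386), but stays 233 where char is unsigned (as on arm, powerpc and s390). ISO C defines the ctype functions only for values representable as unsigned char, plus EOF, so the portable idiom is to convert through unsigned char before the call. A minimal sketch, assuming 8-bit chars; to_uchar and is_alpha_char are hypothetical names, not taken from fileutils:

```c
#include <ctype.h>
#include <limits.h>

/* Hypothetical helper: map a plain char into the range the ctype
   functions accept (0..UCHAR_MAX).  Calling isalpha (ch) directly is
   undefined when char is signed and ch holds a byte >= 128, because
   ch is then converted to a negative int other than EOF.  */
static int to_uchar (char ch)
{
  return (unsigned char) ch;
}

/* Hypothetical portable wrapper: behaves the same whether plain char
   is signed or unsigned on this platform.  */
static int is_alpha_char (char ch)
{
  return isalpha (to_uchar (ch)) != 0;
}
```

In the default "C" locale, is_alpha_char ((char) 0xE9) is simply 0 on every platform, instead of indexing a table with a negative subscript on some of them.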

FYI, as far as I know, the most portable way to use
the ctype macros is to define wrapper macros
(e.g., those below, from fileutils/src/sys2.h)
and then use only the wrappers (the upper-case names) in your code.
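On the calling side, that means code names only the wrappers, never isupper or tolower directly. A minimal sketch, assuming a standards-conforming host (so IN_CTYPE_DOMAIN can simply be 1) and copying just the wrappers it needs; lowercase_inplace is a hypothetical function. Because IN_CTYPE_DOMAIN is 1 in this configuration, the wrappers alone do not reject negative values, so the sketch reads each byte as unsigned char first:

```c
#include <ctype.h>

/* Assumption: a standards-conforming <ctype.h>, so no isascii guard.  */
#define IN_CTYPE_DOMAIN(c) 1
#define ISUPPER(c) (IN_CTYPE_DOMAIN (c) && isupper (c))
#define TOLOWER(Ch) (ISUPPER (Ch) ? tolower (Ch) : (Ch))

/* Hypothetical example: lowercase a NUL-terminated string in place,
   calling only the upper-case wrappers.  */
static void lowercase_inplace (char *s)
{
  for (; *s != '\0'; s++)
    {
      /* Read the byte as unsigned char so the wrapper never sees a
         negative value, even where plain char is signed.  */
      unsigned char c = (unsigned char) *s;
      *s = (char) TOLOWER (c);
    }
}
```

TOLOWER is used here in its guarded, pre-C89 form; on a conforming host that gives the same result as calling tolower directly.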

Of course, the following assumes you have the right
definitions for STDC_HEADERS and HAVE_ISASCII.
You get those by using these autoconf macros:

  AC_HEADER_STDC
  AC_CHECK_FUNCS(isascii)

Be careful when choosing between ISDIGIT and ISDIGIT_LOCALE.

#include "config.h"
#include <ctype.h>

/* [someone :-)] writes:

   "... Some ctype macros are valid only for character codes that
   isascii says are ASCII (SGI's IRIX-4.0.5 is one such system --when
   using /bin/cc or gcc but without giving an ansi option).  So, all
   ctype uses should be through macros like ISPRINT...  If
   STDC_HEADERS is defined, then autoconf has verified that the ctype
   macros don't need to be guarded with references to isascii. ...
   Defining isascii to 1 should let any compiler worth its salt
   eliminate the && through constant folding."

   Bruno Haible adds:

   "... Furthermore, isupper(c) etc. have an undefined result if c is
   outside the range -1 <= c <= 255. One is tempted to write isupper(c)
   with c being of type `char', but this is wrong if c is an 8-bit
   character >= 128 which gets sign-extended to a negative value.
   The macro ISUPPER protects against this as well."  */

#if STDC_HEADERS || (!defined (isascii) && !HAVE_ISASCII)
# define IN_CTYPE_DOMAIN(c) 1
#else
# define IN_CTYPE_DOMAIN(c) isascii(c)
#endif

#ifdef isblank
# define ISBLANK(c) (IN_CTYPE_DOMAIN (c) && isblank (c))
#else
# define ISBLANK(c) ((c) == ' ' || (c) == '\t')
#endif
#ifdef isgraph
# define ISGRAPH(c) (IN_CTYPE_DOMAIN (c) && isgraph (c))
#else
# define ISGRAPH(c) (IN_CTYPE_DOMAIN (c) && isprint (c) && !isspace (c))
#endif

/* This is defined in <sys/euc.h> on at least Solaris2.6 systems.  */
#undef ISPRINT

#define ISPRINT(c) (IN_CTYPE_DOMAIN (c) && isprint (c))
#define ISALNUM(c) (IN_CTYPE_DOMAIN (c) && isalnum (c))
#define ISALPHA(c) (IN_CTYPE_DOMAIN (c) && isalpha (c))
#define ISCNTRL(c) (IN_CTYPE_DOMAIN (c) && iscntrl (c))
#define ISLOWER(c) (IN_CTYPE_DOMAIN (c) && islower (c))
#define ISPUNCT(c) (IN_CTYPE_DOMAIN (c) && ispunct (c))
#define ISSPACE(c) (IN_CTYPE_DOMAIN (c) && isspace (c))
#define ISUPPER(c) (IN_CTYPE_DOMAIN (c) && isupper (c))
#define ISXDIGIT(c) (IN_CTYPE_DOMAIN (c) && isxdigit (c))
#define ISDIGIT_LOCALE(c) (IN_CTYPE_DOMAIN (c) && isdigit (c))

#if STDC_HEADERS
# define TOLOWER(Ch) tolower (Ch)
# define TOUPPER(Ch) toupper (Ch)
#else
# define TOLOWER(Ch) (ISUPPER (Ch) ? tolower (Ch) : (Ch))
# define TOUPPER(Ch) (ISLOWER (Ch) ? toupper (Ch) : (Ch))
#endif

/* ISDIGIT differs from ISDIGIT_LOCALE, as follows:
   - Its arg may be any int or unsigned int; it need not be an unsigned char.
   - It's guaranteed to evaluate its argument exactly once.
   - It's typically faster.
   Posix 1003.2-1992 section page 50 lines 1556-1558 says that
   only '0' through '9' are digits.  Prefer ISDIGIT to ISDIGIT_LOCALE unless
   it's important to use the locale's definition of `digit' even when the
   host does not conform to Posix.  */
#define ISDIGIT(c) ((unsigned) (c) - '0' <= 9)
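ISDIGIT gets by with a single comparison because of unsigned wraparound: when c is below '0', (unsigned) (c) - '0' wraps around to a huge value, so the one <= 9 test rejects values on both sides of the range. A small self-check, assuming only that '0' through '9' are contiguous (which ISO C guarantees); the helper names are hypothetical:

```c
#include <ctype.h>
#include <limits.h>
#include <stdio.h>   /* EOF */

/* Same definition as in the listing above: one unsigned comparison.  */
#define ISDIGIT(c) ((unsigned) (c) - '0' <= 9)

/* Hypothetical reference version: the obvious two-sided range test.  */
static int isdigit_two_sided (int c)
{
  return '0' <= c && c <= '9';
}

/* Return 1 if ISDIGIT agrees with the two-sided test for every int
   in [lo, hi] (negative values included), else 0.  Assumes hi < INT_MAX.  */
static int isdigit_agrees (int lo, int hi)
{
  int c;
  for (c = lo; c <= hi; c++)
    if (ISDIGIT (c) != isdigit_two_sided (c))
      return 0;
  return 1;
}

/* Return 1 if ISDIGIT matches isdigit for EOF and every unsigned char
   value in the "C" locale (the default until setlocale is called).  */
static int isdigit_matches_libc (void)
{
  int c;
  if (ISDIGIT (EOF) != (isdigit (EOF) != 0))
    return 0;
  for (c = 0; c <= UCHAR_MAX; c++)
    if (ISDIGIT (c) != (isdigit (c) != 0))
      return 0;
  return 1;
}
```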
