Re: at least 260 packages broken on arm, powerpc and s390 due to wrong assumption on char signedness
Ganesan R <email@example.com> wrote:
>>>>>> "Steve" == Steve Greenland <firstname.lastname@example.org> writes:
>> On 31-Dec-01, 19:42 (CST), Ganesan R <email@example.com> wrote:
>>> Another thing that puzzles me since this whole debate started. If you look
>>> at the declaration of ctype.h functions (isalpha family), they take an int as
>>> an argument.
>> I don't know that I agree about needing to pass them unsigned char,
>> though. The char->int conversion should be value preserving. If you pass
>> a negative value, then you are making a domain error, and deserve what
>> you get.
> Are you saying that isalpha() etc should work for negative values or that
> you should never call isalpha() with negative values for chars? In an
> ISO8859-1 locale when I call isalpha() for accented characters I should get
> expected results without worrying about whether the accented character is a
> signed quantity. Unfortunately older C library implementations may break
> because an accented character will take a negative index on a character
> table without proper casting.
> I followed up a bit on this and found out that ISO C specifically states
> that ctype functions should work for all values of unsigned char as well as
> the default char type. In other words, if the default C char type is signed
> you can just call the functions without any cast and expect it to work.
If every system had up-to-date, standards-conforming
ctype.h support, we wouldn't have to worry much at all.
But even these days, quite a few systems with buggy macros
are still in use.
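To make the failure mode concrete, here is a minimal sketch. The helper
names ctype_arg and count_alpha are made up for illustration; they are
not from fileutils. As far as I can tell, ISO C only defines the ctype
functions for arguments that are representable as unsigned char or
equal to EOF, which matches Bruno Haible's note quoted below. So where
plain char is signed, a byte like 0xE9 arrives as a negative int unless
it is converted first:

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: convert a plain char to a value that is always
   in the domain of the <ctype.h> functions (0..UCHAR_MAX), whether
   plain char is signed or unsigned on this host.  */
static int
ctype_arg (char c)
{
  return (unsigned char) c;
}

/* Hypothetical helper: count the alphabetic bytes in S according to
   the current locale, without ever passing a negative value to
   isalpha.  */
static size_t
count_alpha (const char *s)
{
  size_t n = 0;
  for (; *s; s++)
    if (isalpha (ctype_arg (*s)))
      n++;
  return n;
}
```

With this conversion in place the result no longer depends on whether
the platform's char is signed (i386) or unsigned (arm, powerpc, s390).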
FYI, as far as I know, the most portable way to use
the ctype macros is to define wrapper macros
(e.g., like those below, from fileutils/src/sys2.h)
and then use only the wrappers (upper-case names) from your code.
Of course, the following assumes you have the right
definitions for STDC_HEADERS and HAVE_ISASCII.
You get those by using these autoconf macros:

  AC_HEADER_STDC
  AC_CHECK_FUNCS(isascii)

Be careful when choosing between ISDIGIT and ISDIGIT_LOCALE.
/* [someone :-)] writes:
"... Some ctype macros are valid only for character codes that
isascii says are ASCII (SGI's IRIX-4.0.5 is one such system --when
using /bin/cc or gcc but without giving an ansi option). So, all
ctype uses should be through macros like ISPRINT... If
STDC_HEADERS is defined, then autoconf has verified that the ctype
macros don't need to be guarded with references to isascii. ...
Defining isascii to 1 should let any compiler worth its salt
eliminate the && through constant folding."
Bruno Haible adds:
"... Furthermore, isupper(c) etc. have an undefined result if c is
outside the range -1 <= c <= 255. One is tempted to write isupper(c)
with c being of type `char', but this is wrong if c is an 8-bit
character >= 128 which gets sign-extended to a negative value.
The macro ISUPPER protects against this as well." */
#if STDC_HEADERS || (!defined (isascii) && !HAVE_ISASCII)
# define IN_CTYPE_DOMAIN(c) 1
#else
# define IN_CTYPE_DOMAIN(c) isascii(c)
#endif
#ifdef isblank
# define ISBLANK(c) (IN_CTYPE_DOMAIN (c) && isblank (c))
#else
# define ISBLANK(c) ((c) == ' ' || (c) == '\t')
#endif
#ifdef isgraph
# define ISGRAPH(c) (IN_CTYPE_DOMAIN (c) && isgraph (c))
#else
# define ISGRAPH(c) (IN_CTYPE_DOMAIN (c) && isprint (c) && !isspace (c))
#endif
/* This is defined in <sys/euc.h> on at least Solaris2.6 systems. */
#undef ISPRINT
#define ISPRINT(c) (IN_CTYPE_DOMAIN (c) && isprint (c))
#define ISALNUM(c) (IN_CTYPE_DOMAIN (c) && isalnum (c))
#define ISALPHA(c) (IN_CTYPE_DOMAIN (c) && isalpha (c))
#define ISCNTRL(c) (IN_CTYPE_DOMAIN (c) && iscntrl (c))
#define ISLOWER(c) (IN_CTYPE_DOMAIN (c) && islower (c))
#define ISPUNCT(c) (IN_CTYPE_DOMAIN (c) && ispunct (c))
#define ISSPACE(c) (IN_CTYPE_DOMAIN (c) && isspace (c))
#define ISUPPER(c) (IN_CTYPE_DOMAIN (c) && isupper (c))
#define ISXDIGIT(c) (IN_CTYPE_DOMAIN (c) && isxdigit (c))
#define ISDIGIT_LOCALE(c) (IN_CTYPE_DOMAIN (c) && isdigit (c))
#if STDC_HEADERS
# define TOLOWER(Ch) tolower (Ch)
# define TOUPPER(Ch) toupper (Ch)
#else
# define TOLOWER(Ch) (ISUPPER (Ch) ? tolower (Ch) : (Ch))
# define TOUPPER(Ch) (ISLOWER (Ch) ? toupper (Ch) : (Ch))
#endif
/* ISDIGIT differs from ISDIGIT_LOCALE, as follows:
- Its arg may be any int or unsigned int; it need not be an unsigned char.
- It's guaranteed to evaluate its argument exactly once.
- It's typically faster.
Posix 1003.2-1992 section 2.5.2.1 page 50 lines 1556-1558 says that
only '0' through '9' are digits. Prefer ISDIGIT to ISDIGIT_LOCALE unless
it's important to use the locale's definition of `digit' even when the
host does not conform to Posix. */
#define ISDIGIT(c) ((unsigned) (c) - '0' <= 9)
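
As a rough illustration of why the unsigned subtraction in ISDIGIT
works: converting to unsigned makes every value below '0', including
negative ones, wrap around to a very large number, so one comparison
rejects both ends of the range. The function names parse_decimal and
count_leading_digits are invented for this sketch, and overflow is
deliberately not handled.

```c
#include <stddef.h>

/* Same definition as in the header excerpt above.  */
#define ISDIGIT(c) ((unsigned) (c) - '0' <= 9)

/* Hypothetical helper: parse the leading run of ASCII digits in S and
   store the number of bytes consumed in *LEN.  Overflow is not
   checked; this is only a sketch.  */
static unsigned long
parse_decimal (const char *s, size_t *len)
{
  const char *p = s;
  unsigned long value = 0;

  while (ISDIGIT (*p))
    value = value * 10 + (unsigned long) (*p++ - '0');

  *len = (size_t) (p - s);
  return value;
}

/* Hypothetical helper: count leading digits.  The macro argument has
   a side effect here, which is fine because ISDIGIT evaluates its
   argument exactly once.  */
static size_t
count_leading_digits (const char *s)
{
  size_t n = 0;
  while (ISDIGIT (*s++))
    n++;
  return n;
}
```

Writing `*s++` as the argument would not be safe with a ctype macro
that may evaluate its argument more than once, which is one more
reason to prefer ISDIGIT when the locale's notion of a digit does not
matter.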