
Re: [BUGS] Test suite fails on alpha architecture



Hi,

Tom Lane [2007-11-07 13:49 -0500]:
> All the other diffs that Martin showed are divide-by-zero failures,
> and I do not see any of them on Gentoo's machine.  I think that this
> must be a compiler bug.  The first example in his diffs is just
> "select 1/0", which executes this code:
> 
>     int32        arg1 = PG_GETARG_INT32(0);
>     int32        arg2 = PG_GETARG_INT32(1);
>     int32        result;
> 
>     if (arg2 == 0)
>         ereport(ERROR,
>                 (errcode(ERRCODE_DIVISION_BY_ZERO),
>                  errmsg("division by zero")));
> 
>     result = arg1 / arg2;
> 
> It looks to me like Debian's compiler must be allowing the division
> instruction to be speculatively executed before the if-test branch
> is taken.  Perhaps it is supposing that this is OK because control
> will return from ereport(), when in fact it will not (the routine
> throws a longjmp).  Since we've not seen such behavior on any other
> platform, however, I suspect this is just a bug and not intentional.

I tried this on a Debian Alpha porter box (thanks, Steve, for pointing
me at it) with Debian's gcc 4.2.2. Latest sid indeed still has this
bug (the floor() one is confirmed fixed), not only on Alpha, but also
on sparc.

Since the simple test case did not reproduce the error, I tried to
write a more sophisticated one that more closely resembles what
PostgreSQL does (sigsetjmp/siglongjmp instead of exit(), some macros,
etc.). Unfortunately that was in vain: the test case still works
perfectly, both without any compiler options and with the ones used
for PostgreSQL. I attach it here nevertheless, in case someone has
more luck than I did.

So I tried to approach it from the other side: building postgresql
with CFLAGS="-O0 -g" or "-O1 -g" works correctly, but with "-O2 -g" I
get the above bug.

So I guess I'll build with -O1 for the time being on sparc and alpha
to get correct binaries until this is sorted out. Any idea what else I
could try?
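
If Tom's theory is right and the compiler hoists the division because
it assumes the error routine returns, one thing that might be worth
trying is to make the control flow explicit at the call site. The
following is only a sketch and is untested on alpha and sparc:
report_error(), checked_div() and try_div() are made-up names for
illustration, not PostgreSQL functions, and the noreturn attribute is
a GCC extension. Whether either measure actually avoids the bad code
generation is an assumption.

```c
#define _POSIX_C_SOURCE 200809L	/* for sigsetjmp/siglongjmp in strict modes */
#include <setjmp.h>
#include <stdio.h>

/* Hypothetical stand-in for PG_exception_stack. */
static sigjmp_buf exception_stack;

/*
 * Hypothetical error routine. The noreturn attribute tells GCC that
 * control never comes back from this call.
 */
static void report_error(const char *msg) __attribute__((noreturn));

static void
report_error(const char *msg)
{
	puts(msg);
	siglongjmp(exception_stack, 1);
}

static int
checked_div(int arg1, int arg2)
{
	if (arg2 == 0)
	{
		report_error("division by zero");
		/* Not reached; spells out that the division must not run. */
		return 0;
	}

	return arg1 / arg2;
}

/* Returns 0 and stores the quotient in *out, or 1 if an error was caught. */
int
try_div(int arg1, int arg2, int *out)
{
	if (sigsetjmp(exception_stack, 0) == 0)
	{
		*out = checked_div(arg1, arg2);
		return 0;
	}

	return 1;
}
```

Calling try_div(6, 3, &r) should store 2 in r and return 0, while
try_div(1, 0, &r) should print the message and return 1 without ever
executing the division.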

Thanks,

Martin

-- 
Martin Pitt        http://www.piware.de
Ubuntu Developer   http://www.ubuntu.com
Debian Developer   http://www.debian.org
#include <stdio.h>
#include <stdlib.h>
#include <setjmp.h>

#define ERROR           20

#define ereport(elevel, rest)  \
        (errstart(elevel, __FILE__, __LINE__, __func__) ? \
	         (errfinish rest) : (void) 0)

#define PG_RE_THROW()  \
        siglongjmp(PG_exception_stack, 1)

sigjmp_buf PG_exception_stack;

int errstart(int elevel, const char *filename, int lineno,
                 const char *funcname)
{
	printf("error: level %i %s:%i function %s\n", elevel, filename, lineno, funcname);
	return 1;
}

void errfinish(int dummy, const char* msg)
{
	puts(msg);
	PG_RE_THROW();
}


int
do_div(char** argv)
{
        int     arg1 = atoi(argv[1]);
        int     arg2 = atoi(argv[2]);
        int     result;

        if (arg2 == 0)
                ereport(ERROR, (1, "division by zero"));

        result = arg1 / arg2;

	return result;
}

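int
main(int argc, char **argv)
{
	/* do_div() reads argv[1] and argv[2], so both must be present */
	if (argc < 3) {
		fprintf(stderr, "usage: %s dividend divisor\n", argv[0]);
		return 2;
	}

	if (sigsetjmp(PG_exception_stack, 0) == 0) {
		int result = do_div(argv);
		printf("%d\n", result);
	} else {
		/* reached via siglongjmp() from errfinish() */
		printf("caught error, aborting\n");
		return 1;
	}

	return 0;
}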
