[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

gnucash/gwrapguile/guile on alpha/ia64 bug

clone 115400 -1
reassign -1 gwrapguile
severity -1 wishlist
retitle -1 gwrapguile should run test suite (test/run-test.scm) in debian/rules
severity 115400 grave

First, this isn't remotely a gnucash bug. It's trivially observable by
running "test/run-test.scm" in the gwrapguile build directory.

] aj@ia64au:~/nmu/gwrapguile/gwrapguile-1.1.11/test$ ./run-test.scm
] ERROR: In procedure hash_fn_create_handle_x:
] ERROR: Argument out of range: 4294967295

The bug appears to occur when:

] scm_protect_object(gw__module_gw_runtime_scm_ulongmax);

is called from g-wrapped/gw-runtime.c:gw_init_module_gw_runtime(). From
there on, it's guile internals. They basically look like:

	    hashtab.c:scm_hash_fn_create_handle_x(_, _, _, scm_ihashq, _, _)
	      hash.c:scm_ihashq(obj, _, _);

hash.c:scm_ihashq looks like:

	unsigned int
	scm_ihashq (SCM obj, unsigned int n)
	  return (SCM_UNPACK (obj) >> 1) % n;

The problem now is pretty simple. SCM_UNPACK casts obj (which is basically
64 bits of something) to an scm_bits_t, which is a long, and the value
happens to be ~0 (ULONG_MAX).  ((long)-1) >> 1 == -1 (sign preserving),
and -1 % n == -1 too (I think n's 31 here, for concreteness, but it
shouldn't matter), and casting that to an unsigned int gives 2^32-1,
which is out of range.

QED. :)

The reason this happens on ia64 (and alpha) but not i386 is due to
differences in casting: (SCM_UNPACK(obj) >> 1) is a long, and n is an
unsigned int.  On i386, these are the same size, so the unsigned int takes
precendence.  On ia64, they're not, and the long can represent the entire
range of the unsigned int, so n is upcast, and the sign is preserved [0].


Gnucash doesn't seem to be buggy at all. The only reason it showed
up there was because it has some decent self tests at build time. Yay

Gwrapguile also has some self tests that show up the bug, but doesn't run
them at build time. Boo gwrapguile. This should be fixed, hopefully the
control commands above should remind Monsieur Goerzen to do so sometime.

It's not clear to me what the fix is as far as guile is concerned.
Changing scm_bits_t from long to unsigned long would fix it, but would
probably cause hosts of other issues. Going through and changing all
the hash functions to have "n" be unsigned long instead of unsigned int
should work. Otherwise going through and adding explicit casts would be
a possibility. Upstream probably need to decide on this. It'd also be
nice if they had some self tests of their own.

I'm not sure if a compiler warning for this would be possible, or
worthwhile. A lint warning presumably would never be seen. :-/


[0] K&R 2, A6.5 Arithmetic Conversions
	Otherwise, if one operand is [long int] and the other is
	[unsigned int], the effect depends on whether a [long int]
	can represent all values of an [unsigned int]; if so,
	the [unsigned int] operand is converted to [long int]; if not
	both are converted to [unsigned long int].

Anthony Towns <aj@humbug.org.au> <http://azure.humbug.org.au/~aj/>
We came. We Saw. We Conferenced. http://linux.conf.au/

  ``Debian: giving you the power to shoot yourself in each 
       toe individually.'' -- with kudos to Greg Lehey

Reply to: