
Re: [Caml-list] problems with ocamlopt



> I have some code which is doing some basic matrix calculations (for a
> perceptron classifier) and I've run into some problems with ocamlopt. It
> seems that the same code, which runs fine on my laptop (powerbook, mac
> os x 10.2, ocaml 3.06) with ocamlopt and runs fine on a linux machine
> (athlon mp 2000+, 2.4.18-24.7, ocaml 3.06 using gcc 3.2), does not run
> on my work machine (pentium 4 2.26, debian 2.4.20, ocaml 3.06 using gcc
> 3.3.1). On my work machine if I just compile via ocamlc, the code runs
> fine. But if I do it to native code, I get (a) incorrect numbers after a
> certain number of iterations of my code, (b) a stack overflow after a
> few more iterations.

It is really surprising that the ocamlopt-generated code behaves
differently on the two x86 machines (the Athlon and the P4).  I hope
this is not due to a gcc 3.3.1 bug.  (OCaml has exposed bugs in
certain versions of gcc before.)

The following facts could explain the other differences (between
ocamlc and ocamlopt, and between the Powerbook and the x86 machines):

1- On the x86, ocamlopt-generated code uses extended precision (80
bits) to compute certain intermediate float results, while ocamlc
performs all float computations with 64-bit floats.  The extra
precision doesn't hurt in general (you get "more correct" results),
but if your computations are ill-conditioned, this can cause the
results to vary dramatically between bytecode and native.  (If this
happens, it's really a problem with your code, which is numerically
unstable and thus computes meaningless results.)

2- ocamlc always executes tail-calls in constant stack space, while ocamlopt
will consume stack space for tail-calls in the following case: the
called function is not the current function (i.e. this isn't a tail
recursion) and the number of arguments to the function exceeds the
number of registers set apart for parameter passing.  The latter
number is 6 for x86 processors, and 8 for the PowerPC.
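
As an illustration of point 2-, here is a minimal sketch of the kind of
code that could be affected.  The names ping, pong, state, rotate,
ping_packed and pong_packed are hypothetical and not taken from the
original program.  The first pair makes 7-argument tail-calls to a
function other than the current one; the second pair packs six of the
values into a record, so each call passes only 2 arguments, at the cost
of one allocation per call.

```ocaml
(* Hypothetical example: 7-argument tail-calls between two different
   functions.  Under the rule described above, ocamlopt on x86 (6
   argument registers) could use stack space for each of these calls. *)
let rec ping n a b c d e f =
  if n = 0 then a + b + c + d + e + f
  else pong (n - 1) b c d e f a

and pong n a b c d e f =
  if n = 0 then a + b + c + d + e + f
  else ping (n - 1) b c d e f a

(* Possible workaround (an assumption, not a fix confirmed in this
   thread): pack the values into a record so that each tail-call
   passes only 2 arguments. *)
type state = { a : int; b : int; c : int; d : int; e : int; f : int }

(* Rotate the six fields by one position, mirroring the argument
   shuffle done by ping and pong. *)
let rotate s = { a = s.b; b = s.c; c = s.d; d = s.e; e = s.f; f = s.a }

let rec ping_packed n s =
  if n = 0 then s.a + s.b + s.c + s.d + s.e + s.f
  else pong_packed (n - 1) (rotate s)

and pong_packed n s =
  if n = 0 then s.a + s.b + s.c + s.d + s.e + s.f
  else ping_packed (n - 1) (rotate s)

let () =
  let s = { a = 1; b = 2; c = 3; d = 4; e = 5; f = 6 } in
  Printf.printf "ping: %d, ping_packed: %d\n"
    (ping 1000 1 2 3 4 5 6) (ping_packed 1000 s)
```

Both versions compute the same sum.  Only the number of arguments per
call differs, which is what matters for the stack behaviour described
above.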

So, 1- could explain why you'd get different results with ocamlc or
ocamlopt/PowerPC and with ocamlopt/x86.  And 2- could explain why
you'd get a stack overflow with ocamlopt/x86 (e.g. if you have a
7-argument tail-call that is not tail-rec) but not with ocamlc or
ocamlopt/PowerPC.
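
To make point 1- concrete, here is a small sketch of a computation
whose result depends on the precision of an intermediate value.  The
function name cancel and the constants are illustrative assumptions,
not taken from the original program.  In strict 64-bit arithmetic the
sum 1e16 +. 1.0 rounds back to 1e16, so the difference is 0.0; if the
intermediate sum is kept in an 80-bit register, the 1.0 survives and
the difference can be 1.0.  Which value you see depends on the
compiler, its version and the target.

```ocaml
(* Hypothetical example: a cancellation that is sensitive to the
   precision of the intermediate sum (a +. b). *)
let cancel (a : float) (b : float) : float = (a +. b) -. a

let () =
  let r = cancel 1e16 1.0 in
  (* Prints 0 with strict 64-bit floats; may print 1 if the
     intermediate sum is held in extended precision. *)
  Printf.printf "(1e16 + 1.0) - 1e16 = %g\n" r
```

Compiling such a test with both ocamlc and ocamlopt on each machine is
one way to check whether extended precision is involved.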

But this doesn't explain the difference between the two ocamlopt/x86
platforms.

Did you try taking the executable generated by ocamlopt on the Athlon
machine and running it on the P4 machine?

- Xavier Leroy


