Hello,

On Mon, 2015-02-02 at 07:51 +0100, Andreas Tille wrote:
> Hi Mentors,
> It is very important to build vsearch with the maximum optimisation for speed
> and thus I wonder whether dropping this option is a good idea or whether
> I should enable it on i386 and amd64 (the question extends also to
> freebsd-i386/freebsd-amd64 once an other issue in freebsd with this
> package is solved).

On amd64, sse/sse2 is enabled by default. Tuning the code for a specific
processor (e.g. core2) might not be such a good idea. According to the GCC
man page, one should use -mtune=generic instead:

  "generic: Produce code optimized for the most common IA32/AMD64/EM64T
  processors. If you know the CPU on which your code will run, then you
  should use the corresponding -mtune or -march option instead of
  -mtune=generic. But, if you do not know exactly what CPU users of your
  application will have, then you should use this option.

  As new processors are deployed in the marketplace, the behavior of this
  option will change. Therefore, if you upgrade to a newer version of GCC,
  code generation controlled by this option will change to reflect the
  processors that are most common at the time that version of GCC is
  released."

In addition, with itksnap I saw that -funroll-loops and -ftree-vectorize
improved performance a lot. These options do not depend on the
architecture, but they are also not enabled by default.

-funroll-loops may also slow down the code, so you should measure its effect
on vsearch before enabling it. Unrolling is especially effective if there
are many small loops of fixed size (as is the case with ITK's types, which
are templated over dimensions).

-ftree-vectorize may be useless on x86 without SSE, but on amd64 it could
give some speedups.

Hope that helps.

Gert
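
For illustration only, here is a minimal C sketch of the two loop shapes discussed above. It is not taken from vsearch or ITK: the names DIM, dot_fixed and scale_add are hypothetical, and DIM merely stands in for a dimension that ITK would fix through a template parameter. Whether either flag actually helps on a given loop still has to be measured.

```c
#include <stddef.h>

/* Hypothetical fixed dimension, standing in for a template parameter. */
#define DIM 3

/* Small loop with a compile-time trip count: the kind of loop that
 * -funroll-loops can turn into straight-line code. */
double dot_fixed(const double a[DIM], const double b[DIM])
{
    double sum = 0.0;
    for (int i = 0; i < DIM; ++i)
        sum += a[i] * b[i];
    return sum;
}

/* Element-wise loop with no dependence between iterations: a candidate
 * for -ftree-vectorize. The restrict qualifiers promise the compiler
 * that the three arrays do not overlap. */
void scale_add(float *restrict out, const float *restrict x,
               const float *restrict y, float k, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = k * x[i] + y[i];
}
```

Assuming the file is saved as kernels.c, one way to compare is to build it once with "gcc -O2 -c kernels.c" and once with "gcc -O2 -funroll-loops -ftree-vectorize -c kernels.c", then time a driver that calls these functions in both builds. Adding -fopt-info-vec makes GCC report which loops it vectorized.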