Re: Link-time optimization in debian packages
On Sun, Jun 05, 2011 at 01:18:43PM +0200, Emil Langrock wrote:
> I played around a bit with GCC's LTO. It is really impressive for
> this kind of application: I got both a size reduction and a speed increase
> with the tested applications. Of course, it was just a small test set and not really
> Link-time optimization changes the meaning of the flags slightly. It is
> currently necessary to pass the optimization-related flags from
> CFLAGS/CXXFLAGS in LDFLAGS as well. Otherwise, LTO will not really do an
> optimization step.
> My question now is whether there are already plans to use LTO in Debian
> packages: any big Debian-related studies, policies, release goals, ...?
I'm afraid it's not that simple. Every package has to be changed
individually. For a systemic solution, it might be better to talk to the
autotools folks and their competitors.
What needs to be changed:
* as you said, the optimization flags need to be added to LDFLAGS as well
* the gcc invocation (at least the one used for linking) has to be prefixed
  with "+" in the Makefile recipe, and you need to add "-flto=jobserver" to
  the flags above. Otherwise, that massive link step will run on only one
  thread.
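As a rough illustration of the two points above, here is a minimal Python sketch that rewrites a CFLAGS/LDFLAGS pair for LTO and prints a Makefile link rule with the "+" prefix. The helper names (lto_flags, link_rule) are hypothetical, and treating every -O*, -f* and -m* option as "optimization-related" is an assumption made for the example, not a rule taken from GCC or dpkg.

```python
"""Sketch: derive LTO build flags and a link rule from plain CFLAGS/LDFLAGS.

Assumptions (illustrative only, not any Debian tool's behaviour): flags are
whitespace-separated strings, and "optimization-related" is approximated as
any -O*, -f* or -m* option.
"""
import shlex


def is_codegen_flag(flag: str) -> bool:
    # -O levels, -f switches and -m target options all affect code
    # generation, which under LTO happens at link time.
    return flag.startswith(("-O", "-f", "-m"))


def lto_flags(cflags: str, ldflags: str) -> tuple[str, str]:
    cf = shlex.split(cflags)
    ld = shlex.split(ldflags)
    # Compile step: ask GCC to store GIMPLE in the .o files.
    if not any(f.startswith("-flto") for f in cf):
        cf.append("-flto")
    # Link step: repeat the code-generation flags, since (per the thread)
    # the LTO optimization step only sees what the link command is given.
    for f in cf:
        if is_codegen_flag(f) and not f.startswith("-flto") and f not in ld:
            ld.append(f)
    # Link step: run the LTO back end in parallel via make's jobserver.
    ld = [f for f in ld if not f.startswith("-flto")] + ["-flto=jobserver"]
    return shlex.join(cf), shlex.join(ld)


def link_rule(target: str, objects: list[str]) -> str:
    # The leading "+" makes GNU make treat the recipe like a recursive
    # $(MAKE) call, so gcc can join make's jobserver.
    return f"{target}: {' '.join(objects)}\n\t+$(CC) $(LDFLAGS) -o $@ $^\n"


if __name__ == "__main__":
    cflags, ldflags = lto_flags("-g -O3", "-Wl,--as-needed")
    print("CFLAGS  =", cflags)   # -g -O3 -flto
    print("LDFLAGS =", ldflags)  # -Wl,--as-needed -O3 -flto=jobserver
    print(link_rule("prog", ["a.o", "b.o"]), end="")
```

The resulting strings and rule are what would go into a Makefile (or debian/rules); the build still has to be started with make -jN for the jobserver to offer more than one slot.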
This comes at the cost of greatly increased memory usage, although no more
than a typical parallel compilation would take. You just lose the option of
running many single-threaded builds in parallel unless you have an insane
amount of memory. This affects only the Debian buildds rather than actual
developers, though.
Even worse, for some strange reason, the GCC folks decided to do the
compilation _twice_. The -c invocation goes through the front end and stores
the GIMPLE tree in the .o file, but then it proceeds with a useless full
compilation and adds actual machine code to the .o as well. During the link
step, that code is thrown away and the GIMPLE trees are compiled again.
As far as I know, the only rationale for that is that if you forget to
specify -flto during the link, you still end up with a slow but working
executable. IMHO, it'd be much better to throw a fatal error in that case:
if the user asked for LTO, he should be told that something is wrong. No
backward compatibility is lost, since old code won't use -flto.
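The double compilation can be seen directly in the object files. GCC stores its GIMPLE bytecode in ELF sections whose names start with ".gnu.lto_", and in these "fat" objects the .text sections still hold machine code (later GCC releases added an -fno-fat-lto-objects option that skips that second code generation). Below is a minimal Python sketch of a checker; the names elf_sections and classify are made up for this example, it only understands 64-bit little-endian ELF (e.g. amd64 objects), and the fat/slim test is a heuristic.

```python
"""Sketch: classify object files as fat LTO, slim LTO or plain.

Assumptions: 64-bit little-endian ELF input (e.g. amd64 .o files), and
the heuristic that GCC keeps its GIMPLE bytecode in ".gnu.lto_*" sections.
"""
import struct
import sys


def elf_sections(data: bytes) -> dict[str, int]:
    """Return {section name: size in bytes} for an ELF64 LE file."""
    if data[:4] != b"\x7fELF" or len(data) < 64 or data[4] != 2 or data[5] != 1:
        raise ValueError("only ELF64 little-endian files are handled")
    (e_shoff,) = struct.unpack_from("<Q", data, 0x28)
    # Extended section numbering (e_shnum == 0) is not handled here.
    e_shentsize, e_shnum, e_shstrndx = struct.unpack_from("<HHH", data, 0x3A)
    headers = []
    for i in range(e_shnum):
        # First six fields of Elf64_Shdr: name, type, flags, addr, offset, size.
        name_off, _, _, _, offset, size = struct.unpack_from(
            "<IIQQQQ", data, e_shoff + i * e_shentsize)
        headers.append((name_off, offset, size))
    _, str_off, str_size = headers[e_shstrndx]
    strtab = data[str_off:str_off + str_size]
    sections = {}
    for name_off, _, size in headers:
        name = strtab[name_off:strtab.index(b"\0", name_off)].decode()
        # Names can repeat (e.g. several ".group"s); keep the largest size.
        sections[name] = max(size, sections.get(name, 0))
    return sections


def classify(sections: dict[str, int]) -> str:
    has_lto = any(n.startswith(".gnu.lto_") for n in sections)
    # Count .text plus per-function .text.* sections (-ffunction-sections).
    has_code = any(size > 0 for n, size in sections.items()
                   if n == ".text" or n.startswith(".text."))
    if has_lto:
        return "fat" if has_code else "slim"
    return "plain"


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(f"{path}: {classify(elf_sections(f.read()))}")
```

Run on a file built with gcc -c -flto (e.g. python3 lto_kind.py foo.o, where the script name is arbitrary), it should report "fat" for the behaviour described above.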
With that double compilation misfeature, build time is roughly doubled:
make -j6 +gcc -O3
make -j6 +gcc -O3 -flto=jobserver
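A back-of-the-envelope model of why the build time roughly doubles, assuming total CPU time is what matters and that the link-time back end costs about as much as the per-file back ends it repeats. Let F_i be the front-end time and B_i the back-end (optimization plus code generation) time of source file i, L the plain link time, F the sum of the F_i and B the sum of the B_i:

```latex
T_{\mathrm{plain}} = \sum_{i=1}^{n} (F_i + B_i) + L = F + B + L

T_{\mathrm{LTO}} \approx \sum_{i=1}^{n} (F_i + B_i) + \sum_{i=1}^{n} B_i + L = F + 2B + L

\frac{T_{\mathrm{LTO}}}{T_{\mathrm{plain}}} \approx \frac{F + 2B + L}{F + B + L} \to 2
\quad \text{when } F + L \ll B
```

Under those assumptions the ratio approaches 2 when the back end dominates, which is consistent with the "roughly doubled" observation; C++ code with heavy front-end cost would see a smaller ratio.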
Speed gains for compiled executables are great, though: around 20%.
> I already found some smaller problems related to weird asm usage in some
> PIC library code, but I doubt that this is a big showstopper, and it will
> be fixed soon(tm).
There are some bugs, too. LTO was utterly useless in gcc-4.5, throwing an
ICE (internal compiler error) on anything slightly more complex than "hello
world". It works well in gcc-4.6: good enough for production use, as long as
you're prepared to deal with the occasional bug.
> It could also be interesting for large projects like Iceweasel,
> LibreOffice, ...
If the buildds can handle them. It is sad that the architectures that care
most about code speed and size would be unable to use LTO builds because of
single-core, Pentium3-class buildds with 256MB of RAM, while a
349823492357-way amd64 machine stands idle next to them. An amd64->armel
cross build takes as long as a native amd64->amd64 one, while armel->armel
takes eight hours on the code above without LTO, and with LTO it would keep
swapping until the heat death of the machine.
Thus, before enabling LTO on anything large, you'd have to make sure the
buildds have enough RAM to handle it...
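The RAM concern can be turned into a rough check. This is a minimal, Linux-only Python sketch with hypothetical names (mem_total_mib, lto_jobs); the per-job memory and the reserve kept for the OS are assumptions you would have to measure per package, not figures from this thread or from Debian's buildd setup. It caps the make -j value (which -flto=jobserver then uses for the link) by both core count and RAM.

```python
"""Sketch: how many parallel LTO jobs a build machine can afford.

Hypothetical inputs: per_job_mib (peak memory of one compile or LTO
partition job) and reserve_mib (memory kept back for the OS) are guesses
that must be measured; nothing here comes from Debian's buildd config.
"""
import os


def mem_total_mib(path: str = "/proc/meminfo") -> int:
    # Linux-specific: the MemTotal line is reported in kB (KiB).
    with open(path) as f:
        for line in f:
            if line.startswith("MemTotal:"):
                return int(line.split()[1]) // 1024
    raise RuntimeError(f"no MemTotal line in {path}")


def lto_jobs(total_mib: int, cores: int, per_job_mib: int,
             reserve_mib: int = 512) -> int:
    """Jobs that fit in RAM, capped by the core count; 0 means even a
    single job would push the machine into swap."""
    usable = max(total_mib - reserve_mib, 0)
    return min(cores, usable // per_job_mib)


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    jobs = lto_jobs(mem_total_mib(), os.cpu_count() or 1, per_job_mib=2048)
    print(f"make -j{jobs}" if jobs else "not enough RAM for an LTO build")
```

For the single-core 256MB buildd described above, lto_jobs(256, 1, 1024) returns 0, i.e. the swapping case.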
Re the ~20% figure: as proven by exhaustive research, i.e. testing on a
random 16 kLOC C project, a 330 kLOC C++ one and an 8 kLOC C library.
It depends on the code in question, of course. Something with good locality
of calls will see no gain, while something that jumps between different
source files in a hot area can see more. Not needing to cram everything into
a single file can also improve readability...
1KB // Microsoft corollary to Hanlon's razor:
// Never attribute to stupidity what can be
// adequately explained by malice.