
Re: Compiling Linux with "bdver2" gcc optimization option



Bonjour,

Franco Martelli, on 2019-09-14:
> On 13/08/19 at 19:35, Étienne Mollier wrote:
[...]
> >                     I would do a few tests with a virtual
> > machine supporting bdver2 instructions before going live anyway,
> > and backups stored far away from the machine once testing, and
> > possibly without contact with that kernel.
>
> I didn't boot that kernel, I don't rely on it. Thanks if you can
> investigate on what happens during compilation process.

Whoops, it sounds like my wording was not very clear.  I meant
that this is how I would proceed in your place; I don't have a
Piledriver CPU to do actual testing on my side.  I'm still stuck
with an old K10, not to mention my laptop, which comes with an
old regular Atom.  :)

I did try replacing the k8 option with amdfam10 though.  In the
roughly fifty thousand lines of log issued by the build, I get
something like a dozen differences between k8 and k10.  There
was a tremendous number of warnings too, but some of the ones
you encountered did not appear: neither the one about the
missing jump target, nor the ANNOTATE_NOSPEC_ALTERNATIVE one
related to retpoline.  I am running Debian Sid, which currently
ships GCC 9, so that is a difference to take into account.
Finally, building an upstream Linux 5.2 kernel instead of
Buster's 4.19 does not show most of the warnings I encountered;
upstream fixes these as they come, but the fixes probably do not
all reach LTS kernels.
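
To compare the warnings of two build logs without reading tens
of thousands of lines, something like the following Python
sketch may help.  It is only an illustration: the function
names are made up, and it assumes warnings look either like GCC
diagnostics ("file.c:12:3: warning: ...") or like objtool
messages ("file.o: warning: objtool: ...").

```python
import re
from collections import Counter

# Assumed shapes: "path:line:col: warning: msg" (GCC) and
# "path: warning: msg" (objtool).  Anything else is ignored.
WARNING_RE = re.compile(r"^(?P<where>\S+?)(?::\d+)*: warning: (?P<msg>.*)$")


def collect_warnings(log_text):
    """Count (file, message) pairs for every warning line in a log."""
    found = Counter()
    for line in log_text.splitlines():
        match = WARNING_RE.match(line.strip())
        if match:
            found[(match.group("where"), match.group("msg"))] += 1
    return found


def compare_logs(log_a, log_b):
    """Return (warnings only in log_a, warnings only in log_b)."""
    a = collect_warnings(log_a)
    b = collect_warnings(log_b)
    return sorted(set(a) - set(b)), sorted(set(b) - set(a))
```

Feeding it the contents of the k8 log and of the amdfam10 log
gives the warnings specific to each build; line and column
numbers are dropped on purpose, so that a warning that merely
moved is not reported as a difference.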

A third run with the tuning option (-mtune) added made almost no
difference at all, except for the build number and the CRC hash.
It seems to me that the architecture-specific option (-march)
already applies the proper tuning, at least for my architecture;
the GCC manual does say that -march=cpu-type implies
-mtune=cpu-type.

My last experiment was to build upstream Linux 5.2.9, released
recently, with -march=amdfam10, and this one is running quite
well so far:

	$ uname -rv
	5.2.9-k10 #1 SMP PREEMPT Fri Aug 16 16:13:08 CEST 2019

But again, no messages worth mentioning during the compilation.

Do your warnings appear when your build targets k8?
Or when building a generic x86_64 kernel?


> > Note that someone from the Gentoo community has developed a set
> > of patches to expand the possibilities of optimization for the
> > kernel, depending on Linux and GCC versions.  You may be
> > interested in the following one for Buster:
> >
> > 	https://github.com/graysky2/kernel_gcc_patch/blob/master/enable_additional_cpu_optimizations_for_gcc_v8.1%2B_kernel_v4.13%2B.patch
> >
> > These mainly apply changes in various code sections to put the
> > flags in place, and provide options through the .config file of
> > the source code.  I haven't tested it, but I don't believe this
> > will solve your warnings, reading through the patch.  Yet it
> > does a bit more than just replacing the compiler flag: there is
> > notably a component related to L1 cache shift which is modified
> > too.  That should bring an appreciable performance boost if it
> > corrects cache line mismatch.
>
> Thanks, but I don't want to patch the kernel, that change to the
> Makefile was enough simple in order to get the optimization that I
> looking for.

Fair enough.  I reread the whole patch, and I believe your
modification is sufficient.

> > Please be aware that CPU optimizations in kernel, targeting Zen
> > and Skylake in this case, seemed to be hardly detectable, or
> > even counter productive, with various computer usage patterns,
> > according to measures done by Phoronix earlier this year:
> >
> > 	https://www.phoronix.com/scan.php?page=article&item=linux-50-march&num=1
> >
> > Of course this may not be the case for your own typical load,
> > but I would recommend to do a few measures, to assess the actual
> > performance gain on your machine with, and without, CPU specific
> > compiler optimizations.
>
> I never experimented benchmark with and without bdver2 option, I assumed
> that if it exists an option for k8 in the kernel then changing it to
> bdver2 it would be good (I hope).

Compilers have good optimization routines to boost the speed of
the code in many situations, but in others there are trade-offs
to make between the size and the performance of the code.  I
personally prefer smaller executables (-Os): they fit in fewer
pages, so they use less CPU cache and leave more room for my
programs to keep their own data in cache (or I might simply have
spent too much time on suckless.org.  ;)

Enabling CPU-specific options is worthwhile in some particular
use cases, but newer instructions often require setting up
various bits in the CPU before use, which tends to inflate the
resulting executable.  They may pay off for scientific
applications, or for programs dealing with big data arrays in
general.  In kernel mode however, the only cases I can think of
where CPU-specific accelerators would be beneficial are disk
encryption and RAID arrays, for which I believe there is already
some runtime detection of available instructions, even with the
generic compiler options.
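
To see which instruction set extensions a machine advertises at
run time, independently of the compiler flags, a small Python
sketch can read the "flags" line of /proc/cpuinfo.  This only
shows what the CPU reports on x86 Linux; it says nothing about
which code paths the kernel actually selects.  The function
names are made up for the illustration.

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line found."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no 'flags' line, e.g. not an x86 Linux system


def read_cpu_flags(path="/proc/cpuinfo"):
    """Read and parse the flags of the running machine."""
    with open(path, encoding="ascii", errors="replace") as handle:
        return parse_cpu_flags(handle.read())
```

For instance, `"aes" in read_cpu_flags()` tells whether the CPU
reports the AES instructions, and "avx" or "avx2" can be checked
the same way.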

To be honest, I don't believe the performance gain to expect
from the compiler is tremendous here.  The figures from the
author of the patch do show a gain; but when you work out the
percentage of performance brought by the tuning, it is only
about 0.03% on the median values of the selected benchmark.  See
the "Data" section at the very end of the README, and do your
own calculations:

	https://github.com/graysky2/kernel_gcc_patch/blob/master/README.md
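
The calculation itself is short.  Here is a minimal Python
sketch, assuming two lists of run times in seconds (lower is
better); the function name is made up and no figures from the
README are reproduced here.

```python
from statistics import median


def relative_gain_percent(baseline_times, tuned_times):
    """Gain of the tuned build over the baseline, in percent of the
    baseline median.  Positive means the tuned build is faster."""
    base = median(baseline_times)
    tuned = median(tuned_times)
    return (base - tuned) / base * 100.0
```

With made-up timings of 100, 102 and 98 seconds for the generic
build and 99, 101 and 97 seconds for the tuned one, the medians
are 100 and 99, hence a gain of 1%.  Using the median rather
than the mean keeps one unlucky run from skewing the result.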

The best you can do here is to take your own measurements with
your own usage pattern.  If you are a developer, you can run
timed builds of Linux and compare the durations.  If you care
more about image rendering speed, there are a few demoscene
productions out there from which you can get figures such as the
frame rate (careful, glxgears may get capped to 60 Hz when some
accelerators are in use, prefer fancier demos.  ;)
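
For timed runs, a minimal Python sketch could look like this.
It simply measures the wall-clock time of a command several
times; the function name and the default number of runs are
arbitrary choices of mine.

```python
import subprocess
import time


def time_command(argv, runs=3):
    """Run argv `runs` times; return the wall-clock durations in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises if the command fails, so that a broken
        # run is not silently counted as a fast one.
        subprocess.run(argv, check=True,
                       stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations
```

For a kernel build, argv could be something like
["make", "-j4", "bzImage"], booted once on each kernel to
compare.  The source tree has to be cleaned between runs,
otherwise the second run has nothing left to compile; and the
first run is better discarded, since it also fills the disk
cache.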

There is also this other thread dealing with kernel latency
measurements; you may find a few useful tools listed in this
discussion:

	https://lists.debian.org/debian-user/2019/08/msg00851.html

Or just see how your usual programs perform, and whether there
are visible improvements.

Have fun,  :)
-- 
Étienne Mollier <etienne.mollier@mailoo.org>
              5ab1 4edf 63bb ccff 8b54  2fa9 59da 56fe fff3 882d

