
Re: [PATCH 2/7] ppc64el: kernel: config: little-endian powerpc64 options



Hi Bastian,

Thanks for your insightful comments.  Let me try to explain the rationale
I had in mind for putting the options you commented on in the config file.

The general rationale is not to be intrusive in the ppc64 port's config:

- If an option /differs/ between ppc64 and ppc64el, I put it in the
  specific -be or -le file.
  I didn't change it in the ppc64 config if I didn't know why it was set
  that way there (which happens to be the case for all of them :-).
  So, I just humbly decided not to touch what I don't understand. I'd be
  happy to submit changes if they're acknowledged by experienced people.

- If an option /does not exist/ in the ppc64 config (mainly options for
  the platforms we have for little-endian powerpc64 nowadays), I chose
  not to put it there if:
  - it could penalize performance or even functionality (say,
    cpu-specific tuning; kernel overhead)
  - it's specific to the currently available powerpc64le platforms
    (which are PPC_BOOK3S_64).
    Again, this is so as not to be intrusive in the ppc64 config by
    adding stuff (but I'd be OK with adding those if acknowledged by
    maintainers).
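
A minimal sketch of the layering idea behind that rationale: shared
options stay in a common fragment, and endian-specific ones go into
separate fragments applied on top. The fragment contents and the
helper names (parse_fragment, layer) are illustrative assumptions,
not the actual Debian packaging code or the real config files.

```python
def parse_fragment(text):
    """Parse a kernel config fragment into a {option: value} dict."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(" is not set"):
            # Explicitly disabled option, recorded as "n".
            name = line[len("# "):-len(" is not set")]
            options[name] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def layer(*fragments):
    """Apply fragments in order; later fragments override earlier ones."""
    merged = {}
    for fragment in fragments:
        merged.update(parse_fragment(fragment))
    return merged


# Hypothetical fragment contents, for illustration only.
shared_ppc64 = """
CONFIG_PPC_SPLPAR=y
"""
be_only = """
CONFIG_PPC_PMAC=y
"""
le_only = """
CONFIG_CPU_LITTLE_ENDIAN=y
CONFIG_PPC_64K_PAGES=y
# CONFIG_PPC_PMAC is not set
"""

ppc64 = layer(shared_ppc64, be_only)
ppc64el = layer(shared_ppc64, le_only)
print(sorted(ppc64el.items()))
```

With this split, the shared fragment is never edited: the ppc64 result
is unchanged, and ppc64el only adds or overrides what it needs.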

Now let me go through each comment below, with that rationale in mind.


On 05/25/2014 08:31 AM, Bastian Blank wrote:
On Sat, May 24, 2014 at 06:18:57PM -0300, Mauricio Faria de Oliveira wrote:
--- /dev/null
+++ b/debian/config/kernelarch-powerpc/config-arch-64-le
@@ -0,0 +1,74 @@
+##
+## file: arch/powerpc/Kconfig
+##
+CONFIG_CRASH_DUMP=y
+CONFIG_PPC_64K_PAGES=y

- CRASH_DUMP is not enabled on ppc64. I don't know why, so I didn't change it.
- PPC_64K_PAGES isn't either. I imagine it might not be the best setting
  for most powerpc64 systems out there (powermacs and others, I believe),
  but it is a /must/ for POWER servers and systems expected to run ppc64el
  (currently POWER8-based).  Smaller page sizes (i.e., 4k) do incur a
  significant performance hit (sorry, I don't have a number); I believe
  it's the main reason the page size is 64k on every distro running
  on POWER servers.

  So, since that one seems to need to differ between ppc64 and ppc64el,
  I set it in the -le file.
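
To give a rough feel for the 4k vs. 64k difference, here is a small
arithmetic sketch. It does not measure performance; it only counts how
many pages are needed to cover a given amount of memory. The 64 GiB
figure is a hypothetical example, not a number from any real system.

```python
KIB = 1024
GIB = 1024 ** 3


def pages_needed(memory_bytes, page_size_bytes):
    """Number of pages needed to cover memory_bytes (rounded up)."""
    return -(-memory_bytes // page_size_bytes)


memory = 64 * GIB  # hypothetical server memory size
pages_4k = pages_needed(memory, 4 * KIB)
pages_64k = pages_needed(memory, 64 * KIB)

print("4k pages: ", pages_4k)
print("64k pages:", pages_64k)
print("ratio:    ", pages_4k // pages_64k)
```

With 64k pages the kernel tracks 16 times fewer pages for the same
memory, and each TLB entry covers 16 times more of it.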

+CONFIG_PPC_TRANSACTIONAL_MEM=y

This one is here because it depends on PPC_BOOK3S_64, which can't be enabled
on the ppc64 port (at least on a 'general' flavor, which is the case
currently), since it 'selects' many processor options not available
on some powerpc processors (I think the Performance Monitoring Unit is
the main one).

Perhaps if a new ppc64 flavor is created for holding the PPC_BOOK3S_64
options, we can put it (and other options depending on it) there, and
then include it in the ppc64el defines.  I'd be happy to work on that.

Nothing of this is le specific.

Agreed, but set in that way for the reasons explained above.

+##
+## file: arch/powerpc/platforms/Kconfig.cputype
+##
+CONFIG_CPU_LITTLE_ENDIAN=y

This is.

+CONFIG_PPC_BOOK3S_64=y

As discussed above.

+CONFIG_POWER7_CPU=y

Similarly, this depends on PPC_BOOK3S_64, and also incurs a performance hit
(or even breaks on other processors) due to tuning for a particular CPU.

Just for the record (I'm sure you understand this), an excerpt from
Kconfig.cputype:

"""
This will create a kernel which is optimised for a particular CPU.
The resulting kernel may not run on other CPUs, so use this with care.
"""

+CONFIG_VSX=y

This one should be OK for other powerpc processors, but I understand
it adds kernel code/overhead, so I chose not to put it in ppc64.

From Kconfig.cputype:

"""
          This option is only useful if you have a processor that supports
          VSX (P7 and above), but does not have any affect on a non-VSX
          CPUs (it does, however add code to the kernel).
"""

+CONFIG_NR_CPUS=2048

This should be OK too, but I'm not sure whether it incurs noticeable overhead.

I see this bitmap in kernel/cpu.c, whose size depends on NR_CPUS
(I didn't find much else depending on the NR_CPUS definition):

const unsigned long cpu_bit_bitmap[BITS_PER_LONG+1][BITS_TO_LONGS(NR_CPUS)]

That bitmap might be used all over the system; I wouldn't know
exactly its impact, so I chose not to change the value in ppc64.
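
As a rough sketch, the static size of that array can be worked out
from its declaration. This assumes a 64-bit kernel (BITS_PER_LONG = 64)
and that BITS_TO_LONGS rounds up to whole longs; the NR_CPUS=32 value
is only a hypothetical comparison point, not the actual ppc64 setting.

```python
BITS_PER_LONG = 64                  # assumption: 64-bit kernel
BYTES_PER_LONG = BITS_PER_LONG // 8


def bits_to_longs(nbits):
    """Longs needed to hold nbits, rounded up (as BITS_TO_LONGS does)."""
    return (nbits + BITS_PER_LONG - 1) // BITS_PER_LONG


def cpu_bit_bitmap_bytes(nr_cpus):
    """Size in bytes of cpu_bit_bitmap[BITS_PER_LONG+1][BITS_TO_LONGS(NR_CPUS)]."""
    return (BITS_PER_LONG + 1) * bits_to_longs(nr_cpus) * BYTES_PER_LONG


for nr_cpus in (32, 2048):          # 32 is a hypothetical smaller value
    print(nr_cpus, cpu_bit_bitmap_bytes(nr_cpus), "bytes")
```

For NR_CPUS=2048 this gives 65 * 32 * 8 = 16640 bytes for this one
array; it says nothing about per-CPU data or other NR_CPUS-sized
structures elsewhere in the kernel.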


This not.

Reasons explained above.


+##
+## file: arch/powerpc/platforms/powermac/Kconfig
+##
+#. This must be explicitly disabled (it's enabled by default).
+# CONFIG_PPC_PMAC is not set

Please explain?  If it does not work, it needs a "depends on
!CPU_LITTLE_ENDIAN".

Yes, I agree.  I'm not sure why it is not set that way (maybe an oversight),
but that also happens to be the case with Cell, PS3 and other
platforms known not to support taking interrupts in little-endian mode
(ILE) as they currently are, and which are therefore unable to run
ppc64el as-is.

The main reason I explicitly disable PPC_PMAC is that it helpfully
disables a ton of stuff that (thankfully) 'depends on' it (say drivers,
windfarm, cpufreq and others), saving many lines from being moved
from ppc64 config to ppc64-be -- keeping things simple.

Unfortunately it's not marked as 'depends on' !CPU_LITTLE_ENDIAN yet;
that would save us from that particular config line.
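
A simplified sketch of why disabling one option drops its dependents.
Real Kconfig handles full dependency expressions and 'select'; this
only models plain "depends on X" chains. Apart from PPC_PMAC and
PPC_POWERNV, the option names are placeholders, not real Kconfig
symbols.

```python
def effective_options(requested, depends_on):
    """Return the requested options whose dependencies are all enabled."""
    enabled = set(requested)
    changed = True
    while changed:
        changed = False
        for option in sorted(enabled):
            deps = depends_on.get(option, ())
            if any(dep not in enabled for dep in deps):
                # A dependency is missing, so this option cannot stay on.
                enabled.discard(option)
                changed = True
    return enabled


# Placeholder dependency table (hypothetical option names).
depends_on = {
    "PMAC_DRIVER_A": ["PPC_PMAC"],
    "PMAC_CPUFREQ_B": ["PPC_PMAC"],
    "PMAC_SUBDRIVER_C": ["PMAC_DRIVER_A"],
}

with_pmac = {"PPC_PMAC", "PMAC_DRIVER_A", "PMAC_CPUFREQ_B",
             "PMAC_SUBDRIVER_C", "PPC_POWERNV"}
without_pmac = with_pmac - {"PPC_PMAC"}

print(sorted(effective_options(with_pmac, depends_on)))
print(sorted(effective_options(without_pmac, depends_on)))
```

Turning off the single parent option removes every option below it,
so none of those lines need to be listed individually.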


+##
+## file: arch/powerpc/platforms/powernv/Kconfig
+##
+CONFIG_PPC_POWERNV=y

Does not depend on CPU_LITTLE_ENDIAN.

Yes; this is again a case where it 'selects' stuff not available on other
powerpc processors/platforms (e.g., POWER7 nap, EPAPR boot).

Another candidate for a powerpc64 'server' flavour, suggested above.


+##
+## file: arch/powerpc/platforms/pseries/Kconfig
+##
+CONFIG_LPARCFG=y
+CONFIG_PPC_SPLPAR=y
+CONFIG_PPC_SMLPAR=y
+CONFIG_DTL=y

Explain?

Those are for eventually running powerpc64le on LPARs/PowerVM. There are
a few bits going upstream [1] for that.

So I'm just enabling:
- what some software would expect:
  - LPARCFG for /proc/lparcfg
  - DTL for tools/analysis
- plus the ability to use common features in that world:
  - shared processor lpar -- oops, that's already in the ppc64 config;
  - shared memory lpar -- this isn't there, but I don't know why,
    so I can put it there if that's OK.


+##
+## file: drivers/cpufreq/Kconfig
+##
+## choice: Default CPUFreq governor
+# CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE is not set
+# CONFIG_CPU_FREQ_DEFAULT_GOV_POWERSAVE is not set
+# CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE is not set
+CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y
+# CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE is not set
+## end choice

No.  You have to specify it for all similar.

This one I didn't understand. Could you please clarify the meaning of
'all similar'?

Or is it just a structural change you're asking for? (I've seen
'choice:' sections like that on other configs).

In case it means 'all similar powerpc64 systems', my explanation is the
same as for the 4k/64k page size: this is the usually desired
setting for the server processors/systems (the default governor
is even set to ondemand in the defconfig for pseries_le).

If your point is something else, please help me understand it better.


+##
+## file: drivers/net/ethernet/ibm/Kconfig
+##
+#. qemu-kvm with kernel only (no initrd).
+CONFIG_IBMVETH=y

qemu supports initrd.

Oh, sure. I was too lazy to write a better comment, but the reason
here is the one given in the patch header: it's just a convenience, so
as not to have to generate an initrd when we don't have a system yet
(say, debootstrap --second-stage), and to boot it without an initrd.

That means not having to extract the deb package and pick the
ibmvscsi, ibmveth, blk_dev_sd, ext4_fs modules, plus shell, mount,
insmod, and other binaries, and then create the init script.

Really, just a convenience. We can ship/tell people how to create
a simple initrd for debootstrapping from other architectures.
The thing with ppc64el is that most people don't have it available
to start on, so debootstrap --second-stage (for running
in qemu-kvm, at least) is quite a common scenario nowadays.


+##
+## file: kernel/Kconfig.hz
+##
+## choice: Timer frequency
+CONFIG_HZ_100=y
+# CONFIG_HZ_250 is not set
+# CONFIG_HZ_300 is not set
+# CONFIG_HZ_1000 is not set
+## end choice

No.

Is your point the same as for CPU_FREQ_DEFAULT_GOV above (for which I
asked for clarification), or is there something else?

I'd also mark this one as a 'server' flavour thing. That HZ setting
is more appropriate for the platforms currently available for running
ppc64el.  When we eventually start to see embedded systems with that, or
systems where the recommended setting would be different, I believe we
can move that to other config files.  The HZ=100 setting is adopted
by most of the distros running on ppc64/ppc64el servers nowadays.
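
For reference, a small sketch of what each HZ choice means in terms of
timer tick period. This is plain arithmetic on the four values in the
choice block above and ignores tickless (NO_HZ) behaviour.

```python
def tick_period_ms(hz):
    """Length of one timer tick in milliseconds for a given HZ value."""
    return 1000.0 / hz


for hz in (100, 250, 300, 1000):
    print("HZ=%d: %.2f ms per tick, %d ticks per second"
          % (hz, tick_period_ms(hz), hz))
```

HZ=100 means a 10 ms tick, i.e. ten times fewer periodic timer
interrupts per CPU than HZ=1000.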



Again, thanks for your comments. I hope the clarifications above help
explain why the options are set that way. I'd be happy to
change the structure/other configs to whatever is recommended by the
experienced people there, to keep the settings optimal for each port.

Best regards,

[1] https://git.kernel.org/cgit/linux/kernel/git/benh/powerpc.git/commit/?h=next&id=983d8a6dda1d477f3ffa23a04cc2fa4d66fd93d1


--
Mauricio Faria de Oliveira
IBM Linux Technology Center

