[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: [PATCH 2/7] ppc64el: kernel: config: little-endian powerpc64 options



Hi Bastian,

Upfront, thanks for replying.

On 05/27/2014 04:47 PM, Bastian Blank wrote:
On Sun, May 25, 2014 at 06:52:27PM -0300, Mauricio Faria de Oliveira wrote:
- CRASH_DUMP is not enabled on ppc64. I don't know why, so didn't change it.

Do I really need to tell you that you can't come and work on exactly one
thing without looking left and right?  If you think you need it and it
is useful, enable it for all users.

Ok, ack.
I'll spin a test on a powerpc build to verify it's fine.

- PPC_64K_PAGES neither. I imagine it might not be the best setting
for most powerpc64 systems out there (powermacs, I believe), but it
is a /must/ for POWER servers and systems expected to run ppc64el
(currently POWER8-based).  Smaller page sizes (i.e., 4k) do incur a
significant performance hit (sorry, I don't know a number); I believe
it's the main reason the page size is 64k on every distro running
on POWER servers. And I think it's not an optimal setting for other
powerpc-based systems (powermacs, others).

Lets look into Kconfig:
| config PPC_64K_PAGES
|         bool "64k page size" if 44x || PPC_STD_MMU_64 || PPC_BOOK3E_64

Yes, but it's still required to explicitly set PPC_64K_PAGES to y.

That excerpt only says the option is _available_ if that condition is
met. To have it _enabled_, then I guess that would be something like
'def_bool y if', or 'depends on' and 'default y'.

These excerpts from 'yes "" | make [...] oldconfig' (as issued by the
source package) and the resulting .config show it:

"""
Page size
> 1. 4k page size (PPC_4K_PAGES) (NEW)
  2. 64k page size (PPC_64K_PAGES) (NEW)
choice[1-2?]:
"""

"""
CONFIG_PPC_4K_PAGES=y
# CONFIG_PPC_64K_PAGES is not set
"""

So, I'd need to keep that one. Ok?


Perhaps if a new ppc64 flavor is created for holding the PPC_BOOK3S_64
options, we can put it (and other options depending on it) there, and
then include it in the ppc64el defines.  I'd be happy to work on that.

There are only two types of ppc64 processors supported in linux:
| config PPC_BOOK3S_64
|         bool "Server processors"
| config PPC_BOOK3E_64
|         bool "Embedded processors"

There is not much choice.  The first option applies to all server and
desktop processors.  So I would assume BOOK3S_64 is enabled implicit
during build, but I haven't checked.

Yes, you're right. I'll remove it.

PPC_BOOK3S_64 is enabled at 'make oldconfig' too.
(and only there; no option at any Kconfig file 'select's it.)

"""
Processor Type
> 1. Server processors (PPC_BOOK3S_64) (NEW)
  2. Embedded processors (PPC_BOOK3E_64) (NEW)
choice[1-2?]:
"""


+CONFIG_VSX=y
This one should be OK for other powerpc processors, but I understand
it adds kernel code/overhead, so I chose not to put it in ppc64.

This is a pre-condition for transactional memory.

Yes, I missed that one; thanks. I'll remove it.


+CONFIG_NR_CPUS=2048
This should be OK too, but not sure it incurs noticeable overhead.

It does.  x86 only allows > 512 cpus with variable length cpumasks
(CPUMASK_OFFSTACK).  Why does powerpc not use this?

Hm, that I don't know.  I can ask the kernel folks and get back to you.

What is the current planned cpu density?

From as few vCPUs as desired, as guest OS, to all CPUs available, as
host OS.

For the publicly announced systems [1], 'all' ranges w/in 80-192 CPUs
-- 1 or 2 processor module(s), 10 or 12 cores per module, 8 hardware threads (logical CPUs) per core (POWER8 specs). [2]

For the to-be announced systems, sorry; but sure enough there's reason
for NR_CPUS=2048 in pseries_{,le_}defconfig, and for patches like [3].

So, I'd ask you to keep NR_CPUS=2048, as in the defconfigs.

If that doesn't cause an ABI break (hm..), then I'd ask to keep it
at least as 256, to fully support the currently announced hardware.
(it will probably need to be changed later.)

Regarding an ABI break, that I'm not sure of.. but other than the
many variables/structures sized after NR_CPUS, there's this cpp
macro making into linux-headers-<version>-powerpc64le:

  #define NR_CPUS_BITS 11 /* ilog2(CONFIG_NR_CPUS)      # */


So, ok to keep NR_CPUS=2048?


Unfortunately it's not marked as 'depends on' !CPU_LITTLE_ENDIAN yet;
that would save us from that particular config line.

Can you please send patches for this upstream?

Oh, sure. That's on my queue.

We'd still need to carry that option for the time being, please.


+CONFIG_PPC_POWERNV=y
Does not depend on CPU_LITTLE_ENDIAN.
Yes; this is again the case of 'selects' stuff not available on other
powerpc processors/platforms (e.g., POWER7 nap, EPAPR boot).

Default "y" and depends on BOOK3S_64, so most likely activated already.

Yes, right. I'll remove it.

I happened to notice that earlier, but thought that was just OK to
explicitly declare platforms for some reason, as there are similar
cases for PPC_PMAC and PPC_PSERIES (kernelarch-powerpc/config
{,-arch-64}), which 'depend on' enabled stuff and are 'default y'.

I can remove those 2 in the next submission, too, if that's ok.
(trying to look left and right.)

+## choice: Timer frequency
+CONFIG_HZ_100=y
+# CONFIG_HZ_250 is not set
+# CONFIG_HZ_300 is not set
+# CONFIG_HZ_1000 is not set
+## end choice
No.
Your point is the same as for CPU_FREQ_DEFAULT_GOV above (for which I
asked clarification), or there's something else?

We have set it to a different value in the main config.  You have to
explain why this value is not appropriate for this flavour.

Ok, below.

Also most architectures now use several types of NO_HZ, so the main
frequency is not really important anymore.

With that, I disagree.

The application of NO_HZ is only idle CPUs (given it's description,
tickless /idle/, and the rename to NO_HZ_IDLE with the introduction of
NO_HZ_FULL):

"""
CONFIG_NO_HZ_IDLE: [formerly CONFIG_NO_HZ=y], this turns off the
periodic tick when a CPU enters idle mode.  [4]
"""

The target scenario for the systems running ppc64el is under-load,
not that idle.  There, the lower interrupt rate helps.

Also, again there's good reason for the defconfigs to have HZ=100,
and that to be kept as is by many distros out there.

I wouldn't know all the reasons, but that's been thought by talented
people, for this platform. So I'd ask you to keep HZ=100 on Debian too.

Is that ok?

I can look for more evidence/explanations if you'd like.




Best regards,


[1] http://www-03.ibm.com/systems/power/announcement.html
[2] http://www-03.ibm.com/systems/power/hardware/s812l-s822l/specs.html
[3] http://kernel.opensuse.org/cgit/kernel-source/commit/?id=ddaa9788a2490b2e4f2ba7ea82fc018c1c52329f
[4] http://lwn.net/Articles/549592/

--
Mauricio Faria de Oliveira
IBM Linux Technology Center


Reply to: