
Re: Booting the kernel on very large NUMA systems



-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On 12/29/2013 10:32 PM, Ben Hutchings wrote:
> The earlyprintk kernel parameter may be useful.

Ok, progress!

I installed SuSE Linux Enterprise Server 11SP3 according to the
instructions from SGI's knowledge base [1,2].

The installation is always supposed to be performed in "Single Node
Mode" and the KDump Memory (crashkernel parameter) is supposed to
be set to 512 MB.

The SLES kernel boots, although it takes a really long time (about
10-20 minutes) until init actually starts. SLES11SP3 uses elilo
instead of GRUB.

Since we don't have a support license and we prefer Debian over
SLES anyway, I ran some more tests now that the machine is up and
running with SLES.

The first thing I did was copy the Debian 3.12 kernel, including
the initrd, over from my Debian box and have elilo boot it instead
of the SLES11SP3 kernel. To my surprise, it actually boots! So it
seems GRUB is what breaks when booting on these large machines.

Unfortunately, I am currently seeing different problems:

[    9.346309] smpboot: Booting Node   0, Processors  #   1 #   2 #   3 #   4 #   5 #   6 #   7 OK
[    9.471821] smpboot: Booting Node   1, Processors  #   8 #   9 #  10 #  11 #  12 #  13 #  14 #  15 OK
[    9.706747] smpboot: Booting Node   2, Processors  #  16
[   14.695313] smpboot: CPU16: Not responding
[   14.703534]  #  17
[   19.688098] smpboot: CPU17: Not responding
[   19.694733]  #  18
[   24.679292] smpboot: CPU18: Not responding
[   24.685863]  #  19
[   29.670412] smpboot: CPU19: Not responding
[   29.676950]  #  20
[   34.661506] smpboot: CPU20: Not responding
[   34.668047]  #  21
[   39.652602] smpboot: CPU21: Not responding
[   39.659131]  #  22
[   44.643671] smpboot: CPU22: Not responding
[   44.650229]  #  23
[   49.634767] smpboot: CPU23: Not responding
[   49.641995]  OK
[   49.644052] smpboot: Booting Node   3, Processors  #  24

So the kernel isn't able to boot any CPU besides the ones on the
primary node. I will have to do further testing, but at least I made
some progress. I will go back to booting the SLES kernel, just to
make sure someone didn't rip out the NUMAlink cables in the
server room.
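
For comparing boot logs between kernels, a small script can tally
which CPUs were reported as "Not responding" on each node. This is
only a rough sketch: the function name failed_cpus_by_node is made up
for illustration, and it assumes the smpboot message format shown in
the log above, which may differ in other kernel versions.

```python
import re
from collections import defaultdict

# Patterns assume the smpboot message format shown in the log above.
NODE_RE = re.compile(r"smpboot: Booting Node\s+(\d+)")
FAIL_RE = re.compile(r"smpboot: CPU(\d+): Not responding")


def failed_cpus_by_node(log_text):
    """Map node number -> list of CPUs reported as 'Not responding'.

    Failures seen before any 'Booting Node' line are filed under -1.
    """
    failures = defaultdict(list)
    node = -1
    for line in log_text.splitlines():
        match = NODE_RE.search(line)
        if match:
            node = int(match.group(1))
        match = FAIL_RE.search(line)
        if match:
            failures[node].append(int(match.group(1)))
    return dict(failures)
```

Feeding it the saved output of dmesg as one string gives a dictionary
keyed by node number, which is easy to diff between two boots.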

Adrian

[1] http://techpubs.sgi.com/library/tpl/cgi-bin/getdoc.cgi?coll=linux&db=bks&srch=&fname=/SGI_Admin/UV_Install_AG/sgi_html/ch01.html
[2] http://techpubs.sgi.com/library/tpl/cgi-bin/getdoc.cgi?coll=linux&db=bks&srch=&fname=/SGI_Admin/UV_Install_AG/sgi_html/ch02.html

- -- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaubitz@debian.org
`. `'   Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
Comment: Using GnuPG with Icedove - http://www.enigmail.net/

iQIcBAEBCAAGBQJSzs3JAAoJEHQmOzf1tfkTLgMP/RQOaFOtAy0KHP+t3PVHoEzc
91ML8yAkb0lisky4bTqJwa69JN/P/f/0EPXIZQYEq0OHek5hB/MWSn8GQewTjJd0
YGa95J5SwzWqBT9UPF7wjfzDHBEp2iYmUdpM6IV7dzLnl4ylZmkiDOSQtALGzNcP
jpVpyyebax4eq9CtzjkJqG450hi5MPgAjgSilYEPUiVe7I7qEbDkVFc40L2fU+qQ
kaWoy2D2sIhyJorHVg7sJnrpJbTXfSJRtI+n2xE71Q3kXH2+BFcm0REzdfDP1EBk
BJoFTkEi3EdFxEL0+bIM3T+sReyPWzH8klfBUqYFbbyDkVAbutkaUDlg5Nf2ITDH
WOQ7Vf7G83/t13yjVhlD3ptHMaEGNbLvuNHr/F0ruMYshIKWXQJDm4FK+3Y86GfP
gDSZfr6WLM4A6DUurviIggk48oCt+rAxHT1y9XQwC59DieIyEYyGPLsBxjtpFoTj
kQTYx71CFM6ysHmFzB+RZ01L7ctQryVcaNA4k+F3QWejYksNwm4aD6ZocU2Yfecz
a07sNxDPIbqAUQaRMH515bhGPxqJ4sDOw8sf2MBcpp2SP5cEJfbolhwrPBwwgq02
5p7kNWHzlLjtCdajCKlIRje5StFZsFZxDvF2lxsCtlBEBgmhf5IjaV1BntyDibJO
Fy/iqH830mrXPYsLRnw/
=Gjq0
-----END PGP SIGNATURE-----

