IP auto-config with DHCP on sparc64 possibly broken
Hi,
I'm currently working on creating a useful configuration for network
boot with GRUB2 on sparc64. But I'm experiencing problems during IP
auto-configuration. This is done by using the `ip=[...]` kernel command
line option and the `ipconfig` tool ([1]) included in the initramfs
which evaluates this option.
[1]: https://git.kernel.org/pub/scm/libs/klibc/klibc.git/tree/usr/kinit/ipconfig/README.ipconfig
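For reference, the `ip=` option is a colon-separated list of seven
fields (client IP, server IP, gateway, netmask, hostname, device,
autoconf method). The following is only a minimal sketch of how such a
value is put together. The helper name `build_ip_arg` is made up for
illustration and is not part of klibc or initramfs-tools.

```shell
#!/bin/sh
# Sketch only: assemble an ip= kernel argument from its seven fields:
#   ip=<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf>
# Fields passed as empty strings stay empty in the result.
# build_ip_arg is a hypothetical helper, not an existing tool.
build_ip_arg() {
    printf 'ip=%s:%s:%s:%s:%s:%s:%s\n' "$1" "$2" "$3" "$4" "$5" "$6" "$7"
}

# DHCP on one interface, all address fields left empty:
build_ip_arg "" "" "" "" "" enp1s1f1 dhcp
# prints: ip=:::::enp1s1f1:dhcp
```

A fully static value would fill in every field instead, e.g.
`build_ip_arg 172.16.2.89 172.16.0.9 172.16.0.1 255.255.0.0 ds25 enP2p2s5 off`
(addresses purely illustrative).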
Whenever I try to use DHCP I see bus errors during operation:
```
[...]
Begin: Running /scripts/init-premount .[ 36.957993] RPC: Registered
named UNIX socket transport module.
.. do[ 37.033724] RPC: Registered udp transport module.
[ 37.094913] RPC: Registered tcp transport module.
Beg[ 37.155124] RPC: Registered tcp NFSv4.1 backchannel transport module.
in: Mounting root file system ... Begin: Running /scr[ 37.286986]
FS-Cache: Netfs 'nfs' registered for caching
ipts/nfs-top ... done.
Begin: Running /scripts/nfs-premount ... done.
IP-Config: enp1s1f1 hardware address 08:00:20:12:34:56 mtu 1500 DHCP
IP-Config: no response after 2 secs - giving up
IP-Config: enp1s1f1 hardware address 08:00:20:12:34:56 mtu 1500 DHCP
Bus error
[...]
IP-Config: enp1s1f1 hardware address 08:00:20:12:34:56 mtu 1500 DHCP
Bus error
IP-Config: enp1s1f1 hardware address 08:00:20:12:34:56[ 41.042475]
enp1s1f1: Link is up using
mtu[ 41.042483] internal
150[ 41.092568] transceiver at
0 DH[ 41.123683] 100Mb/s, Full Duplex.
CP
Bus error
[...]
IP-Config: enp1s1f1 hardware address 08:00:20:12:34:56 mtu 1500 DHCP
Bus error
/init: .: line 230: can't open '/run/net-enp1s1f1.conf'
[ 41.534502] Kernel panic - not syncing: Attempted to kill init!
exitcode=0x00000200
[...]
```
These are from an Ultra 10, but I also saw the same on a T5220 and a T5240.
The same happens when running `ipconfig` manually from the initramfs:
```
[...]
No init found. Try passing init= bootarg.
BusyBox v1.27.2 (Debian 1:1.27.2-2) built-in shell (ash)
Enter 'help' for a list of built-in commands.
(initramfs) ipconfig :::::enp1s1f1:dhcp
IP-Config: enp1s1f1 hardware address 08:00:20:12:34:56 mtu 1500 DHCP
Bus error
```
Hence I currently configure the network on sparc64 "manually" by
providing all required addresses in the `ip=[...]` option, which at
least works but of course duplicates configuration. Other architectures
(ppc64, alpha, hppa) don't have this problem AFAICS, e.g. on a DS25
(alpha):
```
[...]
Begin: Running /scripts/init-p[ 3.334959] tg3 0002:02:05.0: firmware:
direct-loading firmware tigon/tg3_tso.bin
remount ... done.
Begin: Mounting root file system ... Begin: Running /scripts/nfs-top ...
done.
Begin: Runnin[ 3.439451] IPv6: ADDRCONF(NETDEV_UP): enP2p2s5: link is
not ready
g /scripts/nfs-premount ... done.
IP-Config: enP2p2s5 hardware address 00:16:35:12:34:56 mtu 1500 DHCP
[ 3.803708] random: crng init done
IP-Config: no response after 2 secs - giving up
IP-Config: enP2p2s5 hardware address 00:16:35:12:34:56 mtu 1500 DHCP
[ 7.153316] tg3 0002:02:05.0 enP2p2s5: Link is up at 1000 Mbps, full
duplex
[ 7.154293] tg3 0002:02:05.0 enP2p2s5: Flow control is on for TX and
on for RX
[ 7.155269] IPv6: ADDRCONF(NETDEV_CHANGE): enP2p2s5: link becomes ready
IP-Config: no response after 3 s[ 7.950191] ipconfig(196): unaligned
trap at 0000000120003868: 000000011ffcf3af 28 2
[ 8.058589] ipconfig(196): unaligned trap at 0000000120003868:
000000011ffcf3af 28 2
IP-Config: enP2p2s5 hardware address 00:16:35:12:34:56 mtu 1500 DHCP
IP-Config: enP2p2s5 guessed broadcast address 172.16.255.255
IP-Config: enP2p2s5 complete (dhcp from 172.16.0.1):
address: 172.16.2.89 broadcast: 172.16.255.255 netmask: 255.255.0.0
gateway: 172.16.0.1 dns0 : 172.16.0.1 dns1 : 0.0.0.0
host : ds25
domain : domain.tld
rootserver: 172.16.0.9 rootpath: /path/to/ds25
filename : /AC100259
done.
[...]
```
...everything works as expected.
As there were some problems with klibc on sparc64 at the end of 2017, I
assume that something might still be wrong there which breaks IP
auto-configuration with DHCP on sparc64.
I only tried a limited selection of machines, but since the Ultra 10
and the T5240 were introduced years apart and use different NICs yet
show the same issue, it likely affects most or all sparc64 machines.
You can reproduce this by reconfiguring and rebuilding your initramfs
(add `BOOT=nfs` to `/etc/initramfs-tools/initramfs.conf` and rebuild)
and adding `ip=:::::<INTERFACE_NAME>:dhcp` to your kernel command line.
The latter should work regardless of whether you use SILO or GRUB2, and
also when the kernel is loaded from disk.
Before rebooting, make sure you still have your original initramfs and
a boot loader configuration which boots with the original initramfs.
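The initramfs part of these steps could be sketched roughly as follows.
This is not a tested recipe: the helper name `set_boot_nfs` and the
`.orig` backup name are made up for illustration, and the commented-out
commands assume a Debian system with the usual
`/boot/initrd.img-<version>` naming.

```shell
#!/bin/sh
# Sketch only: set BOOT=nfs in an initramfs-tools style config file.
# The file path is an argument so the function can be tried on a copy first.
# set_boot_nfs is a hypothetical helper, not part of initramfs-tools.
set_boot_nfs() {
    conf="$1"
    tmp="$conf.tmp.$$"
    if grep -q '^BOOT=' "$conf"; then
        # Replace an existing BOOT= line, keeping the file's permissions.
        sed 's/^BOOT=.*/BOOT=nfs/' "$conf" > "$tmp" && cat "$tmp" > "$conf"
        rm -f "$tmp"
    else
        # No BOOT= line yet: append one.
        echo 'BOOT=nfs' >> "$conf"
    fi
}

# Intended use on the test machine (as root). Commented out so that running
# this file changes nothing on its own:
#   cp -a "/boot/initrd.img-$(uname -r)" "/boot/initrd.img-$(uname -r).orig"
#   set_boot_nfs /etc/initramfs-tools/initramfs.conf
#   update-initramfs -u
```

Afterwards, add `ip=:::::<INTERFACE_NAME>:dhcp` to the kernel command
line in the boot loader configuration and keep an entry that still
points at the backed-up initramfs.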
A running DHCP server is required. I use the ISC DHCP server and a host
configuration similar to the following:
```
host ultra-10 {
    hardware ethernet 08:00:20:12:34:56;
    fixed-address ultra-10.domain.tld;
    next-server nfs.domain.tld;
    filename "/boot/grub/sparc64-ieee1275/core.img";
    option root-path "/path/to/ultra-10";
    option host-name "ultra-10";
}
```
The remaining configuration is pretty standard, and since it works
perfectly for the other architectures mentioned (ppc64, alpha, hppa), I
do not think the problem comes from the DHCP server configuration.
Any idea what could be wrong with `ipconfig` or how I can further debug
this?
Cheers,
Frank