
idmapd fails in dracut - nfsroot over nfsv4 with wrong UID



Hello, debianics,

This sounds like an age-old problem that has been solved many times, but I
still can't get it to work.

I am trying to set up a diskless Beowulf-style cluster with nfsroot.
The base system is Debian wheezy.

Short story:
When I boot into an NFSv4 root, I find weird file ownership:

drwxr-xr-x   2 4294967294 4294967294 4096 Feb 15 13:00 bin
drwxr-xr-x   3 4294967294 4294967294 4096 Feb 15 15:15 boot
drwxr-xr-x  17 root       root       3240 Feb 15 15:38 dev
drwxr-xr-x 115 4294967294 4294967294 4096 Feb 15 14:27 etc
drwxr-xr-x   2 4294967294 4294967294 4096 Dec 24 13:41 home

which indicates that the idmap daemon is not working as it should.
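
For reference, 4294967294 is 2^32 - 2, that is, -2 stored in an unsigned
32-bit ID field. As far as I know, that is the usual placeholder when the
client cannot map an owner name to a local account. A minimal sketch of the
arithmetic, plus a hypothetical helper (list_unmapped is my own name, not
part of any package) that filters `ls -ln` style lines for that ID:

```shell
# -2 read as an unsigned 32-bit ID: 2^32 - 2
unmapped_id=$(( (1 << 32) - 2 ))
echo "$unmapped_id"

# Hypothetical helper: read `ls -ln` style lines on stdin and print the
# file name (last field) of every entry whose numeric owner or group
# equals the unmapped ID.
list_unmapped() {
    awk -v id="$unmapped_id" '$3 == id || $4 == id { print $NF }'
}
```

It could be used as `ls -ln / | list_unmapped` on the booted client.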

The only workaround that gets rid of the problem is to mount the nfsroot as
NFSv3 instead of NFSv4 - see here:
http://www.linuxquestions.org/questions/linux-networking-3/does-pxe-booting-nfs-root-supports-nfsv4-925154/#post4583418
and here
http://serverfault.com/questions/379486/netboot-debian-wheezy-from-nfs-v4

The symptoms are well known and have been reported repeatedly.
However, I checked the following reported causes, and none of them matches
my situation:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=724514
http://layer-acht.org/fai-irc/fai.log.20120613
https://kernel.googlesource.com/pub/scm/boot/dracut/dracut/+/3eca0cc846e89675949abb11e9606f3222a2e266%5E%5E!/
https://bugzilla.redhat.com/show_bug.cgi?format=multiple&id=537969
https://bugzilla.redhat.com/show_bug.cgi?id=922031#c5

I updated dracut to 040-207-g7252cde from testing
- same picture -

I have to use dracut instead of the standard initramfs, since my cluster uses
bonded Ethernet links, which I could not get working with the standard
initramfs.

The plan is to layer a common read-only installation and per-node read-write
directories using aufs, and to export them individually for each node.

The server runs dnsmasq (providing DHCP, DNS and TFTP) and nfs-kernel-server.

My first setup started from a HD install. After switching from readonly to 
writable aufs, everything stalled and I blamed aufs as dmesg recommended...
http://sourceforge.net/p/aufs/mailman/message/33392409/

I got rid of this problem by switching the server back from the 3.16 kernel
in testing to the standard 3.2 kernel from wheezy. Don't ask me why...

In the process I messed up this HD-based installation, so I decided to try a
new clean one based on the debootstrap tool, following this pointer:
https://help.ubuntu.com/community/Installation/OnNFSDrive
- same picture -

Following
https://dracut.wiki.kernel.org/index.php/Main_Page
I tried both the deprecated command syntax
root=/dev/nfs nfsroot=
and the recommended one:
root=nfs4:[<server-ip>:]<root-dir>[:<nfs-options>]
rd.nfs.domain=<NFSv4 domain name>
Both give basically the same result.
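
To make the second form concrete, here is a small sketch that assembles the
two arguments from their parts. The server address 192.168.1.1, the export
path /srv/nfsroot and the domain example.org are placeholders made up for
illustration, and nfs4_cmdline is a hypothetical helper, not a dracut tool:

```shell
# nfs4_cmdline SERVER ROOTDIR DOMAIN [NFSOPTS]
# Prints kernel command line arguments in the root=nfs4:... form quoted
# above. All values passed in are the caller's; nothing is validated.
nfs4_cmdline() {
    server=$1
    rootdir=$2
    domain=$3
    opts=${4:-}
    arg="root=nfs4:${server}:${rootdir}"
    if [ -n "$opts" ]; then
        arg="${arg}:${opts}"
    fi
    echo "${arg} rd.nfs.domain=${domain}"
}

# Placeholder values, not my real setup
nfs4_cmdline 192.168.1.1 /srv/nfsroot example.org
```

The resulting string would go on the kernel command line in the PXE boot
configuration.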

I tried to start rpc.idmapd and rpc.statd manually from the console, which is
possible either from an NFSv3 root or by copying the nobody-owned system into
a ramdisk and chown-ing the files to root. Then I can see error messages like

rpc.statd: Failed to create /var/lib/nfs/state.new: Read-only file system

I configured dracut manually to include this path, but that did not help.
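
The statd message suggests that the directory is there but sits on a
read-only file system at that point of the boot. A minimal sketch for
checking this from a debug shell; dir_is_writable is a hypothetical helper,
and it assumes only a POSIX shell and rm:

```shell
# dir_is_writable DIR
# Returns 0 if a probe file can be created in DIR, 1 otherwise.
# The probe runs in a subshell so that a failed redirection cannot
# abort the calling shell.
dir_is_writable() {
    probe="$1/.write-probe.$$"
    if ( : > "$probe" ) 2>/dev/null; then
        rm -f "$probe"
        return 0
    fi
    return 1
}

# Example call (path taken from the error message above):
#   dir_is_writable /var/lib/nfs && echo writable || echo "not writable"
```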

When I get everything right and fire up idmapd manually, I can mount my NFSv4
export with proper UIDs. So name resolution should be OK.
I even managed a manual switch_root (over an ssh login...), but this did not
yield a decently running system (I'm afraid more is done during init, like
setting up /proc, /dev, /sys...). But it tells me that the server and the
network are OK. It's an initramfs problem.

I can capture long logs of the init process. As far as I understand them,
dracut tries to kill both statd and idmapd at 99-nfsroot-cleanup.
I think it should not do so anyway, but it even looks like it cannot get a
PID for idmapd, so I suppose idmapd never got up and running properly in the
first place.

Basically, I think the problem lies somewhere in the integration of dracut,
NFSv4/idmapd and the Debian packaging scheme.
See also
http://marc.info/?l=linux-nfs&m=121621383812750&w=2
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=737554

I hoped that systemd might help out, but could not get it properly installed 
into the debootstrap base, see here:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=668001

Am I just missing some silly detail?
Or is Debian & dracut & NFSv4 root simply not a valid setup yet?

I can provide megabytes of logs and screenshots, and with wireshark we could
easily multiply that figure...

Is anybody out there willing to collaborate on a solution?

yours
Wolfgang Rosner

