--- Begin Message ---
Package: libc6
Version: 2.13-38+deb7u3
Severity: normal
Tags: upstream patch
This is really a problem with amd (am-utils), not eglibc. It is hard to solve on amd's side (see the topic "NFS v2 RPC reply on LOOKUP" on the am-utils list), but it can easily be hacked around on eglibc's side.
The symptom is that an amd NFS mount (typically on user login) stalls for 5 or 10 seconds.
The root problem is that amd occasionally copies (the contents of) a SVCXPRT structure to store it away and be able to respond in the background. This is probably illegal, but "used to work" with the traditional SUN RPC implementation.
eglibc now stores both an iovec and a msghdr structure in a private part of the SVCXPRT, with the embedded msghdr's msg_iov field set to point at the corresponding embedded iovec. When the structure is copied, the copy's msghdr still has msg_iov pointing at the original SVCXPRT's embedded iovec, not at the one embedded in the copy. If the copy is then used to transmit a reply, the length of the copy's iovec is set to the desired value, but sendmsg() follows msg_iov and uses the original SVCXPRT's iovec, whose fields have not been set to the desired values.
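To illustrate the stale pointer, here is a minimal sketch. It assumes a hypothetical struct fake_xprt standing in for the private area of a SVCXPRT. It is not eglibc's actual layout, and all names in it are made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Hypothetical stand-in for the private part of a SVCXPRT:
   an iovec and a msghdr that points at that iovec. */
struct fake_xprt {
    struct iovec  iov;
    struct msghdr msg;
};

/* Set up the self-referential pointer, as the transport setup would. */
static void fake_xprt_init(struct fake_xprt *x)
{
    assert(x != NULL);
    memset(x, 0, sizeof *x);
    x->msg.msg_iov = &x->iov;
    x->msg.msg_iovlen = 1;
}

/* Set the reply length on x's own iovec and return the length that
   sendmsg() would actually see by following msg_iov. */
static size_t effective_len(struct fake_xprt *x, size_t slen)
{
    x->iov.iov_len = slen;
    return x->msg.msg_iov->iov_len;
}

/* Returns 1 if a memcpy()'d copy still points into the original. */
static int copy_is_stale(void)
{
    struct fake_xprt orig, copy;

    fake_xprt_init(&orig);
    memcpy(&copy, &orig, sizeof copy);
    return copy.msg.msg_iov == &orig.iov;
}
```

With an original whose iovec length is 8, calling effective_len() on a memcpy()'d copy with a desired length of 100 still yields 8, because the copy's msg_iov never moved.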
As a result, sendmsg() transmits a reply of incorrect length and does not return the expected value, which causes a second (error) reply to be sent, confusing the client. The client then discards the reply and resends the request after a (five-second) timeout. By that point, amd has probably finished the mount operation, so it does not background the request, replies correctly, and everything works as expected.
The problem can obviously be hacked around by forcing the embedded msghdr's msg_iov field to point to the embedded iovec before passing the msghdr to sendmsg(), which the attached (one-line) patch does.
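The effect of re-pointing msg_iov can be tried outside of the RPC code. The following is only a sketch under stated assumptions: struct demo_xprt and the demo_* functions are hypothetical names, and an AF_UNIX datagram socketpair stands in for the UDP socket. It is not the eglibc code path itself.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical transport: socket, reply buffer, and the embedded
   iovec/msghdr pair described in the report. */
struct demo_xprt {
    int           sock;
    char          buf[64];
    struct iovec  iov;
    struct msghdr msg;
};

static void demo_xprt_init(struct demo_xprt *x, int sock)
{
    memset(x, 0, sizeof *x);
    x->sock = sock;
    x->iov.iov_base = x->buf;
    x->msg.msg_iov = &x->iov;
    x->msg.msg_iovlen = 1;
}

/* Send slen bytes of x->buf. With repoint != 0, msg_iov is forced to
   point at x's own iovec first, which is what the one-line patch does. */
static ssize_t demo_send(struct demo_xprt *x, size_t slen, int repoint)
{
    assert(slen <= sizeof x->buf);
    x->iov.iov_base = x->buf;
    x->iov.iov_len = slen;
    if (repoint)
        x->msg.msg_iov = &x->iov;
    return sendmsg(x->sock, &x->msg, 0);
}

/* Copy a transport with memcpy() and send reply_len bytes through the
   copy. Returns what sendmsg() returned, or -1 if setup failed. */
static ssize_t demo_copy_send(size_t orig_len, size_t reply_len, int repoint)
{
    int sv[2];
    struct demo_xprt orig, copy;
    ssize_t sent;

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) != 0)
        return -1;
    demo_xprt_init(&orig, sv[0]);
    orig.iov.iov_len = orig_len;        /* length left over in the original */
    memcpy(&copy, &orig, sizeof copy);  /* what amd is described as doing */
    sent = demo_send(&copy, reply_len, repoint);
    close(sv[0]);
    close(sv[1]);
    return sent;
}
```

Without re-pointing, demo_copy_send(8, 24, 0) sends the original's 8 bytes instead of the requested 24, so the return value does not match the requested length. With re-pointing, demo_copy_send(8, 24, 1) sends all 24 bytes.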
-- System Information:
Debian Release: 7.6
  APT prefers stable
  APT policy: (500, 'stable')
Architecture: amd64 (x86_64)
Foreign Architectures: i386
Kernel: Linux 3.10.42.wap (SMP w/2 CPU cores)
Locale: LANG=de_DE.UTF-8, LC_CTYPE=de_DE.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Versions of packages libc6:amd64 depends on:
ii  libc-bin  2.13-38+deb7u3
ii  libgcc1   1:4.7.2-5
libc6:amd64 recommends no packages.
Versions of packages libc6:amd64 suggests:
ii  debconf [debconf-2.0]  1.5.49
pn  glibc-doc              <none>
ii  locales                2.13-38+deb7u3
-- debconf information excluded
Index: sunrpc/svc_udp.c
===================================================================
--- sunrpc/svc_udp.c	(revision 3768)
+++ sunrpc/svc_udp.c	(revision 3769)
@@ -329,6 +329,7 @@
 	  iovp = (struct iovec *) &xprt->xp_pad [0];
 	  iovp->iov_base = rpc_buffer (xprt);
 	  iovp->iov_len = slen;
+	  mesgp->msg_iov = iovp; /* hack around clients like amd that memcpy() a SVCXPRT structure */
 	  sent = __sendmsg (xprt->xp_sock, mesgp, 0);
 	}
       else
--- End Message ---
--- Begin Message ---
- To: Edgar Fuß <ef@math.uni-bonn.de>,	757474-done@bugs.debian.org
- Subject: Re: Bug#757474: libc6: amd copying a SVCXPRT structure leads to libc's RPC code sending packets of incorrect length
- From: Aurelien Jarno <aurelien@aurel32.net>
- Date: Sat, 4 Sep 2021 22:15:48 +0200
- Message-id: <YTPT9Pgd5SJpXzG5@aurel32.net>
- In-reply-to: <20140808152430.GA25286@math.uni-bonn.de>
- References: <20140808152430.GA25286@math.uni-bonn.de>
Version: 2.32-0experimental0
On 2014-08-08 17:24, Edgar Fuß wrote:
> Package: libc6
> Version: 2.13-38+deb7u3
> Severity: normal
> Tags: upstream patch
> 
> This is really a problem with amd (am-utils), not the eglibc, but it's hard to solve on amd's side (see topic "NFS v2 RPC reply on LOOKUP" on the am-utils list) but can easily be hacked around on eglibc's side.
> 
> The phenomenon is an amd NFS mount (typically on user login) to stall for 5 or 10 seconds.
> 
> The root problem is that amd occasionally copies (the contents of) a SVCXPRT structure to store it away and be able to respond in the background. This is probably illegal, but "used to work" with the traditional SUN RPC implementation.
> 
> Now eglibc stores both an iovec and a msghdr structure in a private part of the SVCXPRT, with the embedded msghdr's msg_iov field set to point at the corresponding embedded iovec. When the structure is copied, the embedded msghdr's msg_iov still points to the original SVCXPRT's embedded iovec, not the one embedded in the copy. If the copy is then used to transmit a reply, the embedded iovec's length is set to the desired value, but sendmsg() actually uses the original SVCXPRT's value due to the msg_iov field of the msghdr embedded in the copy pointing at the iovec embedded in the original (whose fields are not set to the desired values).
> Then, sendmsg() transmits a reply of incorrect length and doesn't return the expected value, which causes a second (error) reply to be sent, confusing the client. The client then discards the reply and resends the request after a (five second) timeout. At that point, amd has probably finished the mount operation, doesn't background the request, replies correctly and everything works as expected.
> 
> The problem can obviously be hacked around by forcing the embedded msghdr's msg_iov field to point to the embedded iovec before passing the msghdr to sendmsg(), which the attached (one-line) patch does.
> 
SunRPC support has been removed from glibc 2.32. Closing the bug
accordingly.
Regards,
Aurelien
-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net
--- End Message ---