Bug#503650: linux-image-2.6.24-etchnhalf.1-amd64: epoll_wait returns bogus readyness notifications
Package: linux-image-2.6.24-etchnhalf.1-amd64
Version: 2.6.24-6~etchnhalf.4
Severity: normal
epoll_wait sometimes returns spurious readiness notifications: when a
file descriptor is closed and a new one with the same number is created
and added to the epoll set, epoll_wait sometimes returns a readiness
notification for the previous fd:
connect(11, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("129.42.56.216")}, 16) = -1 EINPROGRESS (Operation now in progress)
epoll_ctl(4, EPOLL_CTL_MOD, 11, {EPOLLOUT, {u32=11, u64=11}}) = -1 ENOENT (No such file or directory)
epoll_ctl(4, EPOLL_CTL_ADD, 11, {EPOLLOUT, {u32=11, u64=11}}) = 0
epoll_wait(4, {{EPOLLIN, {u32=10, u64=10}}, {EPOLLHUP, {u32=11, u64=11}}}, 64, 59743) = 2
epoll_ctl(4, EPOLL_CTL_MOD, 10, {EPOLLOUT, {u32=10, u64=10}}) = 0
epoll_ctl(4, EPOLL_CTL_MOD, 11, {EPOLLOUT, {u32=11, u64=11}}) = 0
getpeername(11, 0xe33390, [63018818183627008]) = -1 ENOTCONN (Transport endpoint is not connected)
write(2, "a Transport endpoint is not conne"..., 90a Transport endpoint is not connected at /opt/perl/lib/perl5/AnyEvent/Socket.pm line 770.
) = 90
fcntl(11, F_SETFL, O_RDONLY) = 0
read(11, 0xe51af0, 1) = ? ERESTARTSYS (To be restarted)
--- SIGCHLD (Child exited) @ 0 (0) ---
write(5, "\1\0\0\0\0\0\0\0"..., 8) = 8
rt_sigreturn(0x5) = 0
read(11, <unfinished ...>
Note how the first MOD fails with ENOENT, indicating that the fd is not yet
in the set. After the ADD, epoll_wait reports EPOLLHUP for fd 11, yet
getpeername says the socket is not (yet) connected and the following read
blocks (because the socket did NOT yet receive a HUP, contrary to what
epoll_wait reported).
How do I know the epoll_wait u64 data really refers to fd 11 in the above
example? The event library in question is libev, which uses epoll_ctl in only one place:
ev.data.u64 = fd; /* use u64 to fully initialise the struct, for nicer strace etc. */
if (expect_true (!epoll_ctl (backend_fd, oev ? EPOLL_CTL_MOD : EPOLL_CTL_ADD, fd, &ev)))
So libev ALWAYS registers interest in an fd with the u64 data set to the fd itself.
I modified libev to include a generation counter:
ev.data.u64 |= (long long)debug_gencount++ << 32;//D
And an strace of the same issue now looks like this:
connect(9, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("129.42.56.216")}, 16) = -1 EINPROGRESS (Operation now in progress)
epoll_ctl(4, EPOLL_CTL_ADD, 9, {EPOLLOUT, {u32=9, u64=26787711025161}}) = 0
epoll_wait(4, {{EPOLLHUP, {u32=9, u64=26766236188681}}, {EPOLLIN, {u32=10, u64=26749056319498}}, {EPOLLHUP, {u32=12, u64=26774826123276}}}, 64, 59743) = 3
epoll_ctl(4, EPOLL_CTL_MOD, 9, {EPOLLOUT, {u32=9, u64=26766236188681}}) = 0
epoll_ctl(4, EPOLL_CTL_DEL, 10, {0, {u32=10, u64=26749056319498}}) = -1 EBADF (Bad file descriptor)
epoll_ctl(4, EPOLL_CTL_MOD, 12, {EPOLLIN, {u32=12, u64=26774826123276}}) = 0
read(12, ""..., 65536) = 0
close(12) = 0
wait4(4920, 0x7fff048ec4dc, 0, NULL) = -1 ECHILD (No child processes)
getpeername(9, 0x2140ee0, [137077042347770112]) = -1 ENOTCONN (Transport endpoint is not connected)
write(2, "a Transport endpoint is not conne"..., 90a Transport endpoint is not connected at /opt/perl/lib/perl5/AnyEvent/Socket.pm line 770.
) = 90
fcntl(9, F_SETFL, O_RDONLY) = 0
read(9, 0xd01df0, 1) = ? ERESTARTSYS (To be restarted)
--- SIGCHLD (Child exited) @ 0 (0) ---
write(5, "\1\0\0\0\0\0\0\0"..., 8) = 8
rt_sigreturn(0x5) = 0
read(9,
As you can see, the EPOLL_CTL_ADD uses 26787711025161 (gencount 0x185d in
the upper 32 bits, fd 9 in the lower), but epoll_wait returns an event with
u64 set to 26766236188681 (gencount 0x1858, from an earlier EPOLL_CTL_ADD,
and fd 9).
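To make the arithmetic easy to check, here is a minimal sketch of the
encoding used by the debugging patch: generation counter in the upper 32
bits, fd in the lower 32 bits. The helper names (pack_tag, tag_gen, tag_fd)
are made up for this illustration and are not part of libev.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a 32-bit generation counter and an fd into one 64-bit epoll
 * user-data value: generation in the upper half, fd in the lower half.
 * (Hypothetical helper, not libev code.) */
static uint64_t pack_tag(uint32_t gen, int fd)
{
    assert(fd >= 0);
    return ((uint64_t)gen << 32) | (uint32_t)fd;
}

/* Extract the generation counter from the upper 32 bits. */
static uint32_t tag_gen(uint64_t tag)
{
    return (uint32_t)(tag >> 32);
}

/* Extract the fd from the lower 32 bits. */
static int tag_fd(uint64_t tag)
{
    return (int)(uint32_t)tag;
}
```

Applied to the values in the strace, tag_gen(26787711025161) gives 0x185d
and tag_gen(26766236188681) gives 0x1858, while tag_fd gives 9 for both.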
This proves that epoll_wait sometimes returns events for fds not
currently registered in the epoll set.
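The same generation counter could also be used in user space to detect such
stale events. The following is only a sketch under assumptions: a small
fixed-size table (MAX_FDS), and hypothetical function names
(tag_for_register, event_is_current). It is not what libev does in the
traces above, and it does not fix the kernel.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_FDS 1024              /* assumption: small fixed table for the sketch */

static uint32_t cur_gen[MAX_FDS]; /* generation of the latest registration per fd */
static uint32_t next_gen = 1;     /* 0 is reserved for "never registered" */

/* Call whenever fd is (re-)added to the epoll set; the return value is
 * what would be stored in ev.data.u64 for that registration. */
static uint64_t tag_for_register(int fd)
{
    assert(fd >= 0 && fd < MAX_FDS);
    cur_gen[fd] = next_gen++;
    return ((uint64_t)cur_gen[fd] << 32) | (uint32_t)fd;
}

/* Returns 1 if the u64 of a received event belongs to the latest
 * registration of its fd, 0 if it stems from an earlier registration. */
static int event_is_current(uint64_t tag)
{
    uint32_t fd = (uint32_t)tag;
    return fd < MAX_FDS && cur_gen[fd] == (uint32_t)(tag >> 32);
}
```

An event loop using this would call event_is_current on each u64 returned
by epoll_wait and ignore events for which it returns 0.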
This bug occurs only rarely, only under load, and is somewhat hard to
reproduce, so I can't give a small example program. The analysis here is
probably enough to hunt down and fix this bug, however.
When a socket gets closed, epoll *must* also remove all pending
notifications for that fd when removing the fd from the set.
-- Package-specific info:
** Version:
Linux version 2.6.24-etchnhalf.1-amd64 (Debian 2.6.24-6~etchnhalf.4) (dannf@debian.org) (gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #1 SMP Mon Jul 21 10:36:02 UTC 2008
** Tainted: P (1)
-- System Information:
Debian Release: lenny/sid
APT prefers unstable
APT policy: (500, 'unstable'), (500, 'testing'), (500, 'stable'), (1, 'experimental')
Architecture: amd64 (x86_64)
Kernel: Linux 2.6.24-etchnhalf.1-amd64 (SMP w/4 CPU cores)
Locale: LANG=C, LC_CTYPE=de_DE.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash
Versions of packages linux-image-2.6.24-etchnhalf.1-amd64 depends on:
ii debconf [debconf-2.0] 1.5.11etch1 Debian configuration management sy
ii initramfs-tools [linux-initr 0.92b tools for generating an initramfs
ii module-init-tools 3.3-pre4-2 tools for managing Linux kernel mo
linux-image-2.6.24-etchnhalf.1-amd64 recommends no packages.
Versions of packages linux-image-2.6.24-etchnhalf.1-amd64 suggests:
ii grub 0.97-27 GRand Unified Bootloader
ii lilo 1:22.8-4 LInux LOader - The Classic OS load
pn linux-doc-2.6.24 <none> (no description available)
-- debconf information excluded