
Re: Bug#776192: Linux null-pointer deref in 3.16.7-ckt2-1 (was: Bug#776192: upgrade-reports wheezy to jessie boot problem)



clone 776192 -1
reassign -1 systemd 215-17
fixed -1 217-1
tags -1 = patch
severity -1 important
owner -1 !
thanks

On Mon, Apr 06, 2015 at 08:50:58PM +0100, Ben Hutchings wrote:
> It looks the same as this problem:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1276705
> http://thread.gmane.org/gmane.linux.ubuntu.devel.kernel.general/39123/

I just encountered this bug while trying to install jessie on a Dell
PowerEdge R610 with a SAS 6/iR (fairly recent, much more so than 1950s).
The kernel crashes while in d-i and installation fails. I also tried
with a nightly d-i with Linux 4.0 -- same issue.

Ironically, I found this bug report and clicked through the referenced
links, only to discover that I had previously investigated this when
installing a similar server with Ubuntu 14.04, and had even replied to
the Launchpad bug above... I can confirm it's the exact same bug. Note
that it was also covered by LWN(!): https://lwn.net/Articles/611226/

It's disappointing that this bug hasn't been fixed upstream yet,
especially the part where mptsas' error handling is broken and the
kernel crashes instead of failing gracefully. That is a different,
secondary bug that is merely triggered by the timeout.

In any case, there seems to have been /some/ improvement upstream on
this. systemd has increased the timeout from 30s to 60s (2e92633) and
subsequently to 180s (b5338a1), in commits that are both included in
v217. They have also made the timeout configurable from the kernel
command line (udev.event-timeout & rd.udev.event-timeout), but those
are more invasive patches.

My working servers with Ubuntu 12.04 & 14.04 show in their dmesg that
the probe time is somewhere between 18s and 31s, so 180s would
definitely fix the effect of this bug.
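
To make the arithmetic concrete, here is a minimal sketch of the
comparison udevd performs. The helper name event_timed_out and its
parameters are made up for illustration and are not part of systemd;
only USEC_PER_SEC mirrors the name used in udevd.c.

```c
#include <stdbool.h>
#include <stdint.h>

/* Microseconds per second; mirrors the USEC_PER_SEC name used in udevd.c. */
#define USEC_PER_SEC UINT64_C(1000000)

/*
 * Hypothetical helper, not part of systemd: returns true if an event
 * that started at start_usec has been running for longer than
 * timeout_sec seconds as of now_usec. Both timestamps are assumed to
 * be monotonic clock readings in microseconds.
 */
static bool event_timed_out(uint64_t now_usec, uint64_t start_usec,
                            uint64_t timeout_sec)
{
        return (now_usec - start_usec) > timeout_sec * USEC_PER_SEC;
}
```

With the 31s worst case observed above, this check trips with a 30s
limit but not with a 180s one, whereas an 18s probe passes either way.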

The commits above aren't directly backportable to v215, as the upstream
code has changed significantly, but the very simple patch attached is
the equivalent fix for v215 (it's untested, though).

This affects a large number of Dell systems (~100 in my case alone) and
there is no practical workaround, so it'd be great if this was fixed in
a jessie point release.

Best,
Faidon
diff --git a/src/udev/udevd.c b/src/udev/udevd.c
index a45d324..072499c 100644
--- a/src/udev/udevd.c
+++ b/src/udev/udevd.c
@@ -1415,7 +1415,7 @@ int main(int argc, char *argv[])
                                 if (worker->state != WORKER_RUNNING)
                                         continue;
 
-                                if ((now(CLOCK_MONOTONIC) - worker->event_start_usec) > 30 * USEC_PER_SEC) {
+                                if ((now(CLOCK_MONOTONIC) - worker->event_start_usec) > 180 * USEC_PER_SEC) {
                                         log_error("worker [%u] %s timeout; kill it", worker->pid,
                                             worker->event ? worker->event->devpath : "<idle>");
                                         kill(worker->pid, SIGKILL);
