
Bug#556030: Root cause: deadlock in a driver



We extensively researched the problem.

The TLB flush softlockup is only a CONSEQUENCE of a deadlock.

Background: a TLB flush is issued by one CPU to a number of other CPUs using inter-processor interrupts (IPIs) to propagate paging changes. The issuing CPU then loops until all of those processors acknowledge the change. If one of them is deadlocked on a spinlock, the acknowledgement never arrives and the softlockup detector triggers. The deadlock arises on a spinlock that may sometimes be held on behalf of user code (through the /proc or /sys interfaces of modules).
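The acknowledgement loop described above can be sketched as a small, single-threaded and deterministic model. Everything in it is hypothetical and simplified: `struct sim_cpu`, `cpu_step()` and `tlb_shootdown()` are not kernel APIs, a CPU deadlocked on a spinlock is assumed to have interrupts disabled so it never runs the flush handler, and a fixed number of polling rounds stands in for the softlockup timeout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NCPUS 4 /* size of the hypothetical machine */

/* Hypothetical per-CPU state, not a kernel structure. */
struct sim_cpu {
    bool spinning_on_lock; /* deadlocked on a spinlock with interrupts off (assumption) */
    bool flush_pending;    /* a TLB-flush IPI has been sent to this CPU */
};

/* One polling step of a target CPU: it runs the flush handler and
 * acknowledges only if it is not stuck on the lock. */
static void cpu_step(struct sim_cpu *cpu, int id, uint32_t *pending)
{
    if (cpu->spinning_on_lock)
        return; /* never reaches the IPI handler, so never acknowledges */
    if (cpu->flush_pending) {
        cpu->flush_pending = false;        /* local TLB "flushed" */
        *pending &= ~(UINT32_C(1) << id);  /* clear this CPU's bit: the ack */
    }
}

/* Issuer side: "send" the flush IPI to every other CPU, then poll for
 * acknowledgements. max_rounds stands in for the softlockup timeout.
 * Returns the mask of CPUs that never acknowledged (0 means success). */
static uint32_t tlb_shootdown(struct sim_cpu cpus[NCPUS], int issuer, int max_rounds)
{
    assert(issuer >= 0 && issuer < NCPUS);
    uint32_t pending = 0;
    for (int i = 0; i < NCPUS; i++) {
        if (i == issuer)
            continue;
        cpus[i].flush_pending = true;   /* deliver the IPI */
        pending |= UINT32_C(1) << i;    /* expect an ack from CPU i */
    }
    for (int round = 0; round < max_rounds && pending != 0; round++)
        for (int i = 0; i < NCPUS; i++)
            if (i != issuer)
                cpu_step(&cpus[i], i, &pending);
    if (pending != 0)
        printf("soft lockup: CPU%d still waiting for flush acks, pending mask 0x%x\n",
               issuer, (unsigned)pending);
    return pending;
}
```

With every CPU healthy the function returns 0; marking one simulated CPU as stuck leaves only its bit set in the returned mask. Note that the "soft lockup" message is printed by the waiting issuer, not by the deadlocked CPU, which mirrors why the TLB flush report is only a consequence of the real problem.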

The only way to identify the root cause (i.e. which driver is causing problems) is to dump ALL CPU stacks in the soft lockup code.

One way to do that is to modify the kernel and add

                arch_trigger_all_cpu_backtrace()

to the

                kernel/softlockup.c:softlockup_tick()

function.

This relies on NMI IPIs (non-maskable inter-processor interrupts), which ensure that all stacks are dumped even when CPUs are deadlocked (though don't expect the impossible to happen either).
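Why the NMI matters can be illustrated with the same kind of toy model. Again, the names (`struct cpu_state`, `deliver_backtrace_request()`, `trigger_all_backtraces()`) are hypothetical and this is not how `arch_trigger_all_cpu_backtrace()` is implemented: a maskable IPI is assumed to stay pending forever on a CPU that spins with interrupts disabled, while an NMI is always taken, so only the NMI path reaches the deadlocked CPU and prints its stack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define NCPUS 4 /* size of the hypothetical machine */

/* Hypothetical per-CPU state for the model. */
struct cpu_state {
    bool irqs_disabled; /* e.g. spinning on a driver lock with interrupts off */
    const char *where;  /* stand-in for what this CPU's stack would show */
};

enum ipi_kind { IPI_MASKABLE, IPI_NMI };

/* Ask one CPU to dump its stack. A maskable request is never serviced
 * while interrupts stay disabled; an NMI is always taken.
 * Returns true if the CPU printed its stack. */
static bool deliver_backtrace_request(const struct cpu_state *cpu, int id,
                                      enum ipi_kind kind)
{
    assert(cpu->where != NULL);
    if (kind == IPI_MASKABLE && cpu->irqs_disabled)
        return false; /* stays pending forever on a deadlocked CPU */
    printf("CPU%d: %s\n", id, cpu->where);
    return true;
}

/* Send the request to every CPU; returns how many actually answered. */
static int trigger_all_backtraces(const struct cpu_state cpus[NCPUS],
                                  enum ipi_kind kind)
{
    int dumped = 0;
    for (int i = 0; i < NCPUS; i++)
        if (deliver_backtrace_request(&cpus[i], i, kind))
            dumped++;
    return dumped;
}
```

Called with `IPI_NMI`, the model reports every CPU, including the one spinning on the driver's lock; with `IPI_MASKABLE` that CPU stays silent, which is exactly the stack needed to identify the culprit.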

You should then easily find the faulty driver and file a bug against it.

Hope this helps

François-Frédéric

