
Re: Bug#468075: this bug/#468075 - sudo: system freeze



reassign 468075 linux-image-2.6.24-1-686
found 468075 2.6.24-4
thanks

Hello kernel maintainers,

Please see this bug log regarding a "soft lockup".  A root process
looping around time() is sufficient to make the system unusable.

Mar 21 13:06:39 libra kernel: BUG: soft lockup - CPU#0 stuck for 2s! [bash:2363]
Mar 21 13:06:39 libra kernel: 
Mar 21 13:06:39 libra kernel: Pid: 2363, comm: bash Not tainted (2.6.24-1-686 #1)
Mar 21 13:06:39 libra kernel: EIP: 0060:[<c0113c77>] EFLAGS: 00200286 CPU: 0
Mar 21 13:06:39 libra kernel: EIP is at flush_tlb_page+0x3f/0x62
Mar 21 13:06:39 libra kernel: EAX: 080f55e8 EBX: 080f55e8 ECX: cf5663d4 EDX: cf4bf9b0
Mar 21 13:06:39 libra kernel: ESI: cf5cd900 EDI: cf5663d4 EBP: c10780a0 ESP: ce15de6c
Mar 21 13:06:39 libra kernel:  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Mar 21 13:06:39 libra kernel: CR0: 8005003b CR2: 080f55e8 CR3: 0f5ce000 CR4: 000006d0
Mar 21 13:06:39 libra kernel: DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
Mar 21 13:06:39 libra kernel: DR6: ffff4ff0 DR7: 00000400
Mar 21 13:06:39 libra kernel:  [<c01663f3>] do_wp_page+0x3cb/0x4d4
Mar 21 13:06:39 libra kernel:  [<c016730f>] __pte_alloc+0x8d/0x9a
Mar 21 13:06:39 libra kernel:  [<c0165c8b>] vm_normal_page+0xd/0x3e
Mar 21 13:06:39 libra kernel:  [<c016791b>] copy_page_range+0x2e2/0x383
Mar 21 13:06:39 libra kernel:  [<c0167981>] copy_page_range+0x348/0x383
Mar 21 13:06:39 libra kernel:  [<c01682a7>] handle_mm_fault+0x609/0x685
Mar 21 13:06:39 libra kernel:  [<c01099ca>] sched_clock+0x8/0x18
Mar 21 13:06:39 libra kernel:  [<c0103044>] __switch_to+0x9d/0x11d
Mar 21 13:06:39 libra kernel:  [<c011bf45>] do_page_fault+0x1f7/0x592
Mar 21 13:06:39 libra kernel:  [<c0123ec5>] do_fork+0x120/0x1cc
Mar 21 13:06:39 libra kernel:  [<c012f3e6>] sys_rt_sigprocmask+0x4b/0xc7
Mar 21 13:06:39 libra kernel:  [<c011bd4e>] do_page_fault+0x0/0x592
Mar 21 13:06:39 libra kernel:  [<c02bdc32>] error_code+0x72/0x78
Mar 21 13:06:39 libra kernel:  [<c02b0000>] unix_mkname+0x4d/0x6f



Mar 21 13:08:36 libra kernel: BUG: soft lockup - CPU#0 stuck for 2s! [strace:2485]
Mar 21 13:08:36 libra kernel: 
Mar 21 13:08:36 libra kernel: Pid: 2485, comm: strace Not tainted (2.6.24-1-686 #1)
Mar 21 13:08:36 libra kernel: EIP: 0060:[<c012022f>] EFLAGS: 00000282 CPU: 0
Mar 21 13:08:36 libra kernel: EIP is at finish_task_switch+0x25/0x81
Mar 21 13:08:36 libra kernel: EAX: c1207940 EBX: ce37f740 ECX: ce19a030 EDX: ce19a000
Mar 21 13:08:36 libra kernel: ESI: 00000000 EDI: ce19a030 EBP: 00000008 ESP: cd0b5f50
Mar 21 13:08:36 libra kernel:  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Mar 21 13:08:36 libra kernel: CR0: 8005003b CR2: b7804000 CR3: 0df70000 CR4: 000006d0
Mar 21 13:08:36 libra kernel: DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
Mar 21 13:08:36 libra kernel: DR6: ffff4ff0 DR7: 00000400
Mar 21 13:08:36 libra kernel:  [<c02bc7aa>] schedule+0x588/0x5ec
Mar 21 13:08:36 libra kernel:  [<c0248cae>] input_proc_handlers_open+0x8/0xc
Mar 21 13:08:36 libra kernel:  [<c012b977>] sys_ptrace+0x7c/0x83
Mar 21 13:08:36 libra kernel:  [<c0103f76>] work_resched+0x5/0x26


Mar 21 13:08:40 libra kernel: BUG: soft lockup - CPU#0 stuck for 2s! [strace:2485]
Mar 21 13:08:40 libra kernel: 
Mar 21 13:08:40 libra kernel: Pid: 2485, comm: strace Not tainted (2.6.24-1-686 #1)
Mar 21 13:08:40 libra kernel: EIP: 0060:[<c02bda19>] EFLAGS: 00000206 CPU: 0
Mar 21 13:08:40 libra kernel: EIP is at _spin_unlock_irqrestore+0xa/0x13
Mar 21 13:08:40 libra kernel: EAX: 00000206 EBX: 00000000 ECX: 00000206 EDX: 00000200
Mar 21 13:08:40 libra kernel: ESI: ce19a030 EDI: 00000000 EBP: cd0b4000 ESP: cd0b5f7c
Mar 21 13:08:40 libra kernel:  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Mar 21 13:08:40 libra kernel: CR0: 8005003b CR2: b77c1000 CR3: 0df70000 CR4: 000006d0
Mar 21 13:08:40 libra kernel: DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
Mar 21 13:08:40 libra kernel: DR6: ffff4ff0 DR7: 00000400
Mar 21 13:08:40 libra kernel:  [<c01220bf>] wait_task_inactive+0x45/0x62
Mar 21 13:08:40 libra kernel:  [<c012b8f5>] ptrace_check_attach+0xa1/0xa7
Mar 21 13:08:40 libra kernel:  [<c012b944>] sys_ptrace+0x49/0x83
Mar 21 13:08:40 libra kernel:  [<c0103ed6>] syscall_call+0x7/0xb

On Fri, Mar 21, 2008 at 07:14:15AM -0700, Daniel Burrows wrote:
> On Thu, Mar 20, 2008 at 12:24:17PM -0400, Justin Pryzby <justinpryzby@users.sourceforge.net> was heard to say:
> > On Wed, Mar 19, 2008 at 06:31:13PM -0700, Daniel Burrows wrote:
> > >   Most likely some critical part of your X session wanted to access the
> > > disk, and it was blocked out by apt.  I'm not sure how to confirm this;
> > > on Windows there are ways of monitoring file accesses across the system
> > > (so you could see which programs were contending with apt for the disk
> > > and how long they got blocked), but I don't know of any equivalent for
> > > Linux.
> > I don't think it's disk IO related.  Having run this command very
> > often, all my Packages files are cached at various layers, and
> > (despite having a not-recent machine with a slow disk and "only" 256MB
> > RAM) the drive heads seem to be inactive during this interval.
> 
>   That's probably right.
> 
> > I still think there's some problem somewhere between apt and the
> > kernel.  ptracing aptitude might reasonably cause a behavior change
> > due to scheduling or such, but that still leaves a fair amount of
> > observed behavior unexplained.
> 
>   It might be triggered by apt, but it's the kernel's job to manage
> user-space processes so they don't interfere with each other.
> 
>   The attached program test.c, when run as root, will freeze the desktop
> in the same way that "aptitude safe-upgrade" does.  All it does is mmap()
> two files (in fact, the apt package cache), then read from non-sequential
> locations in them.  That last bit is important: if I read sequentially,
> the effect doesn't occur.  If I run the program as a normal user, the
> effect doesn't occur either.
> 
>   This isn't dependent on the use of mmap().  I've attached a test2.c
> that does exactly the same thing as test.c, but using read() system
> calls.  It freezes the desktop too -- but for longer, since it's
> inefficiently reading one byte at a time.
> 
>   Interestingly, if I run these on a text console, other text consoles
> and the programs running in them are unaffected.  Only X suffers a
> complete freeze.
> 
>   Anyway, I'm inclined to reassign this to the kernel.  A user program
> shouldn't trash the system by doing jumpy I/O.
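
For readers without the attachments, here is a rough sketch of the access
pattern described above.  It is NOT the attached test.c or test2.c: the
function names, the pseudo-random offset generator and the return values
are assumptions made for illustration only.  The first function maps a
file and reads single bytes from scattered offsets; the second does the
same with lseek() and one-byte read() calls.

```c
/* Illustrative sketch only; not the test.c/test2.c from the bug report. */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical offset generator: a small linear congruential step. */
static uint32_t lcg_next(uint32_t state)
{
    return state * 1664525u + 1013904223u;
}

/* Map `path` read-only and read `nreads` bytes from non-sequential
 * offsets.  Returns the sum of the bytes read, or -1 on error. */
long jumpy_mmap_reads(const char *path, unsigned long nreads, uint32_t seed)
{
    struct stat st;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0 || st.st_size <= 0) {
        close(fd);
        return -1;
    }
    size_t len = (size_t)st.st_size;
    unsigned char *map = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                       /* the mapping stays valid after close */
    if (map == MAP_FAILED)
        return -1;

    long sum = 0;
    uint32_t state = seed;
    for (unsigned long i = 0; i < nreads; i++) {
        state = lcg_next(state);
        size_t off = (size_t)(state % len);
        assert(off < len);
        sum += map[off];             /* touching the page may fault it in */
    }
    munmap(map, len);
    return sum;
}

/* Same access pattern, but one lseek() + one-byte read() per access. */
long jumpy_read_calls(const char *path, unsigned long nreads, uint32_t seed)
{
    struct stat st;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0 || st.st_size <= 0) {
        close(fd);
        return -1;
    }
    size_t len = (size_t)st.st_size;

    long sum = 0;
    uint32_t state = seed;
    for (unsigned long i = 0; i < nreads; i++) {
        unsigned char byte;
        state = lcg_next(state);
        if (lseek(fd, (off_t)(state % len), SEEK_SET) < 0 ||
            read(fd, &byte, 1) != 1) {
            close(fd);
            return -1;
        }
        sum += byte;
    }
    close(fd);
    return sum;
}
```

Both functions visit the same offsets for a given seed, so they return the
same sum on the same file.  Whether this sketch reproduces the freeze is
untested here; per the report, the original programs only triggered it when
run as root against the apt package cache.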

