
Bug#532750: Exchanging hardware didn't help (was: was very likely caused by hardware failure)



reopen 532750
kthxbye

Hi,

On Mon, Jun 15, 2009 at 02:25:42PM +0200, Axel Beckert wrote:
> the reported bug seems to have been resolved by the manufacturer by
> replacing the mainboard. So although the kernel reported a bug, it was
> primarily a hardware issue.

I was too fast with this statement. Although even the manufacturer
expected a hardware issue and replaced the motherboard under
warranty, the issue appeared again with the new motherboard.

We're currently tracking down what causes the oopses and freezes. It
currently seems to be an NFS-related issue. If we stop our NFS mount
tests (it's a monitoring server), the machine seems to be stable. If
we re-enable them, it only takes hours until the next crash.

Also interesting is that after the oopses happened, the system was
still usable for another 1.5 minutes or so and only then froze. So
after the following oops today, I could even log in via ssh, but
before I managed to type "uptime", the machine was frozen.

Jun 18 12:45:43 omniculars kernel: [275997.686291] Bad page state in process 'smb' 
Jun 18 12:45:43 omniculars kernel: [275997.686293] page:ffffe200000a3a00 flags:0x0100000000000000 mapping:0000000000000000 mapcount:0 count:-1 
Jun 18 12:45:43 omniculars kernel: [275997.715666] Trying to fix it up, but a reboot is needed 
Jun 18 12:45:43 omniculars kernel: [275997.715668] Backtrace: 
Jun 18 12:45:43 omniculars kernel: [275997.741751] Pid: 28000, comm: smb Not tainted 2.6.26-2-amd64 #1 
Jun 18 12:45:43 omniculars kernel: [275997.753733]  
Jun 18 12:45:43 omniculars kernel: [275997.753734] Call Trace: 
Jun 18 12:45:43 omniculars kernel: [275997.761949]  [<ffffffff80274ca4>] __rmqueue_smallest+0x88/0xfb 
Jun 18 12:45:43 omniculars kernel: [275997.773787]  [<ffffffff80274fb0>] bad_page+0x6b/0x95 
Jun 18 12:45:43 omniculars kernel: [275997.783875]  [<ffffffff802763eb>] get_page_from_freelist+0x3e0/0x607 
Jun 18 12:45:43 omniculars kernel: [275997.800905]  [<ffffffff80276894>] __alloc_pages_internal+0xd6/0x3bf 
Jun 18 12:45:43 omniculars kernel: [275997.812282]  [<ffffffff80275f7d>] __get_free_pages+0xe/0x4d 
Jun 18 12:45:43 omniculars kernel: [275997.823574]  [<ffffffff80232b7c>] copy_process+0xc1/0x1160 
Jun 18 12:45:43 omniculars kernel: [275997.832281]  [<ffffffff80233d73>] do_fork+0xd4/0x236 
Jun 18 12:45:43 omniculars kernel: [275997.843408]  [<ffffffff802a0b92>] do_pipe+0x94/0xd9 
Jun 18 12:45:43 omniculars kernel: [275997.853324]  [<ffffffff8023decf>] recalc_sigpending+0xe/0x38 
Jun 18 12:45:43 omniculars kernel: [275997.864791]  [<ffffffff8020beca>] system_call_after_swapgs+0x8a/0x8f 
Jun 18 12:45:43 omniculars kernel: [275997.878834]  [<ffffffff8020c257>] ptregscall_common+0x67/0xb0 
Jun 18 12:45:43 omniculars kernel: [275997.890816]  
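
To compare several such reports, the process name and the call trace
symbols can be pulled out of the syslog lines with a small script. The
following is only a minimal sketch in Python, assuming the line layout
shown in the excerpt above; the names summarize, BAD_PAGE_RE and
FRAME_RE are made up for illustration and are not part of any existing
tool.

```python
import re

# Assumed layout, taken from the excerpt above:
#   "... kernel: [275997.686291] Bad page state in process 'smb'"
BAD_PAGE_RE = re.compile(
    r"kernel: \[\s*(?P<ts>\d+\.\d+)\] Bad page state in process '(?P<comm>[^']+)'"
)

# Assumed layout of a call trace frame:
#   "... kernel: [275997.761949]  [<ffffffff80274ca4>] __rmqueue_smallest+0x88/0xfb"
FRAME_RE = re.compile(
    r"kernel: \[\s*\d+\.\d+\]\s+\[<(?P<addr>[0-9a-f]+)>\] "
    r"(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def summarize(lines):
    """Group call trace symbols under the 'Bad page state' report they follow.

    Returns a list of dicts with the keys 'comm', 'ts' and 'frames'.
    Frames seen before the first report are ignored.
    """
    reports = []
    for line in lines:
        match = BAD_PAGE_RE.search(line)
        if match:
            reports.append(
                {
                    "comm": match.group("comm"),
                    "ts": float(match.group("ts")),
                    "frames": [],
                }
            )
            continue
        match = FRAME_RE.search(line)
        if match and reports:
            reports[-1]["frames"].append(match.group("sym"))
    return reports
```

Feeding it the lines of the kernel log (on many systems
/var/log/kern.log, but that path is an assumption) gives one entry per
"Bad page state" message, which makes it easier to see whether the
same allocation path shows up in every crash.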

We will add more information to this bug report as soon as we
find out more.

		Kind regards, Axel Beckert
-- 
Axel Beckert <beckert@phys.ethz.ch>       support: +41 44 633 26 68
IT Services Group, HPT D 17                 voice: +41 44 633 41 89
Departement of Physics, ETH Zurich
CH-8093 Zurich, Switzerland		   http://nic.phys.ethz.ch/


