
Re: vaughan.debian.org



On Nov 19, 2007 7:40 PM, Noah Meyerhans <noahm@debian.org> wrote:
> Those of you who saw my recent blog post [1] are, no doubt, waiting with
> bated breath for the return of our mipsel porting machine.
> Unfortunately, problems persist even after addressing the cooling
> problems that I initially believed were affecting the machine's
> stability.
>
> Vaughan will run for some time, but will eventually start misbehaving.
> It stays up longer if it's not under any load, but still does eventually
> go down.  Here are some of the kernel dumps that it shows.  These code
> dumps are from Linux 2.6.23.1, but similar problems occur in other
> kernels.
>
> Kernel bug detected[#2]:
> Cpu 0
> $ 0   : 00000000 b0007c01 00000001 00003fff
> $ 4   : 810caa60 7fe9bf0a 80310000 000caa60
> $ 8   : 00006553 7fe9bf0a 800f1098 00000000
> $12   : 00000000 00000000 85811da0 746f6f72
> $16   : 810caa60 8347f56c 0000000e 7fe9bf0a
> $20   : 811c11b8 803321e0 856d7e2c 856d7e28
> $24   : 99999999 2ac30710
> $28   : 856d6000 856d7da8 00000001 80089e2c
> Hi    : 00000000
> Lo    : 00000000
> epc   : 8008ad9c kmap_coherent+0xc/0xe0     Tainted: G      D
> ra    : 80089e2c __flush_anon_page+0x4c/0x68
> Status: b0007c03    KERNEL EXL IE
> Cause : 00808034
> PrId  : 000028a0
> Process w (pid: 28428, threadinfo=856d6000, task=8116e928)
> Stack : 803321e0 8347f56c 0000000e 7fe9bf0a 800db0d0 800dad84 00000001 856d7ea0
>         800f18d0 00000000 00000011 00000000 00000030 00000000 803321e0 7fe9bf0a
>         866c8000 0000000f 000007ff 803321e0 00000000 856d7e28 856d7e2c 800db2b8
>         811c11b8 8116e928 000000d0 00000000 00000000 00000001 856d7e2c 856d7e28
>         00000000 810caa60 80332214 00000000 803321e0 00000000 0000000f 866c8000
>         ...
> Call Trace:
> [<8008ad9c>] kmap_coherent+0xc/0xe0
> [<80089e2c>] __flush_anon_page+0x4c/0x68
> [<800db0d0>] get_user_pages+0x3c4/0x4ac
> [<800db2b8>] access_process_vm+0x100/0x21c
> [<8012d91c>] proc_pid_cmdline+0xa4/0x14c
> [<8012f858>] proc_info_read+0x100/0x140
> [<800f0b4c>] vfs_read+0xc0/0x160
> [<800f10ec>] sys_read+0x54/0xa0
> [<80088d0c>] stack_done+0x20/0x3c
>
>
> Code: 8c820000  00021242  30420001 <00028036> 8f820014  3c038035  24420001  af820014  8c629240
>
> This is the first sign of trouble.  The symptoms observable from
> userland are that just about any program that you try to run dies with a
> segfault.  The machine never recovers from this state, and eventually
> gets worse:
>
> CPU 0 Unable to handle kernel paging request at virtual address 000000d0, epc == 800ebb34, ra == 800eb68c
> Oops[#4]:
> Cpu 0
> $ 0   : 00000000 90007c00 8035dc08 000000d0
> $ 4   : 8111fa80 83fdb990 0000002a 83fdb000
> $ 8   : 8035dc00 00000000 00000001 00024000
> $12   : 00000001 00080000 fff7ffff 00200200
> $16   : 8035e694 00000021 8111fa80 00000000
> $20   : 00024000 80350000 00200200 00100100
> $24   : 00100100 00000000
> $28   : 80378000 80379cd8 0000003c 800eb68c
> Hi    : 00000036
> Lo    : 000000d8
> epc   : 800ebb34 free_block+0xec/0x1b0     Tainted: G      D
> ra    : 800eb68c cache_flusharray+0x74/0xfc
> Status: 90007c02    KERNEL EXL
> Cause : 0080800c
> BadVA : 000000d0
> PrId  : 000028a0
> Process kswapd0 (pid: 72, threadinfo=80378000, task=8116fa08)
> Stack : 00808400 800cf650 90007c01 800b4334 0000003c 90007c01 00000000 8035e600
>         8035e610 80379da8 00000001 00000000 0000000d 800eb68c 819aae70 0000002a
>         87ead070 0000003a 8035e600 90007c01 8695e8c0 80379f48 00000001 800eb938
>         80355ca0 810d5a40 80379e74 80379f48 8695e8c0 00000001 80379e74 80116a58
>         80379e74 80379f48 00000001 80379da8 80116f30 80116f10 800d4c78 8101e2a0
>         ...
> Call Trace:
> [<800ebb34>] free_block+0xec/0x1b0
> [<800eb68c>] cache_flusharray+0x74/0xfc
> [<800eb938>] kmem_cache_free+0x110/0x118
> [<80116a58>] free_buffer_head+0x2c/0x48
> [<80116f30>] try_to_free_buffers+0x6c/0xcc
> [<800d5330>] shrink_page_list+0x640/0x7fc
> [<800d573c>] shrink_zone+0x250/0xbfc
> [<800d6700>] kswapd+0x2ac/0x434
> [<800b8658>] kthread+0x58/0x94
> [<800835a4>] kernel_thread_helper+0x10/0x18
>
> Code: 8ce30004  8ce20000  8c88004c <ac620000> ac430004  acf70000 acf60004  8ce2000c  8e440014
>
> The call trace in this latter case isn't always the same, but free_block
> does always seem to be at the top of the stack.
>
> It's quite possible that this is a hardware problem.  Do others concur?
> Is there any chance that it is software?  If it is hardware, my
> inclination would be to suspect RAM.  Does anybody have a decent source
> for Cobalt Raq2 memory?
>
> noah
>
> 1. http://nlm-morgul.livejournal.com/12188.html
>
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.6 (GNU/Linux)
>
> iD8DBQFHQdiJYrVLjBFATsMRAop/AKCJR4bMRMZPhXXYIc0lbvz/tifN3ACbB6pe
> KCxfVPx865cm/bVKTSowmVQ=
> =xHV6
> -----END PGP SIGNATURE-----
>
>

Hi Noah,

Last year I bought 256 MB of RAM for my Qube2 from the following URL:
http://www.satech.com/128mb-cobalt-qube-2.html

I was satisfied with their service (one of the sticks was DOA, and they
sent me a replacement at their expense, across the Atlantic).

Anyway, in the meantime, if you need access to my Qube2, let me know...

Regards,
Seb.


