
OT: AMD chips cause kernel errors and hangs?



We have a Linux cluster of 1000 nodes. I wasn't involved in setting it up.
The nodes run Red Hat 6.2 with kernel 2.2.19 on dual AMD 1.2GHz CPUs, with
2GB memory, 2GB swap, and Gigabit Ethernet.

Several nodes hang and/or get kernel errors every day. The first causes
that come to mind are bad RAM and running out of virtual memory. I've
pasted some logs below. 

The slaves mostly run FORTRAN code compiled with Lahey F95 v6.0 and g77
(0.5.24-19981002).

What else could cause these errors? Are there special kernel config issues
for AMD chips? 

I've run Linux for 9 years, always on Intel CPUs, and have used Debian since
before the first official release ("buzz"), but I have never seen so many
problems.

ch_binary_handler+67/168] [do_execve+417/516] [sys_execve+75/124]
[system_call+52/56]  
Aug 21 06:35:07 hou000752cs kernel: Code: f6 46 24 01 74 52 8b 4c 24 68 39
4e 14 75 49 8b 4c 24 64 31  
Aug 21 06:35:07 hou000752cs inetd[458]: pid 11124: exit signal 11
Aug 21 06:35:07 hou000752cs kernel: Unable to handle kernel paging request
at virtual address 00ff0024 
Aug 21 06:35:07 hou000752cs kernel: current->tss.cr3 = 1463e000, %cr3 =
1463e000 
Aug 21 06:35:07 hou000752cs kernel: *pde = 00000000 
Aug 21 06:35:07 hou000752cs kernel: Oops: 0000 
Aug 21 06:35:07 hou000752cs kernel: CPU:    0 
Aug 21 06:35:07 hou000752cs kernel: EIP:
0010:[locks_remove_posix+44/152] 
Aug 21 06:35:07 hou000752cs kernel: EFLAGS: 00010206 
Aug 21 06:35:07 hou000752cs kernel: eax: 94629b04   ebx: be6b35a0   ecx:
94629a94   edx: 947f6920 
Aug 21 06:35:07 hou000752cs kernel: esi: 00ff0000   edi: 942157c0   ebp:
94629b04   esp: 93a9bc28 
Aug 21 06:35:07 hou000752cs kernel: ds: 0018   es: 0018   ss: 0018 
Aug 21 06:35:07 hou000752cs kernel: Process in.ftpd (pid: 11125, process
nr: 30, stackpage=93a9b000) 
Aug 21 06:35:07 hou000752cs kernel: Stack: 942157c0 bcc13f60 94629b04
94629a94 8012699a 94785f00 93a9a000 94785f00  
Aug 21 06:35:07 hou000752cs kernel:        fffffff7 00000202 93f45aa0
00013000 93f45a40 2aabf000 93f45adc 80135619  
Aug 21 06:35:07 hou000752cs kernel:        80135626 93f45a40 08085fc0
0806b800 00000000 bcc13f60 80126991 be6b35a0  
Aug 21 06:35:07 hou000752cs kernel: Call Trace: [filp_close+82/92]
[load_elf_interp+677/708] [load_elf_interp+690/708] [filp

------------------------------------------------------------------------

Aug 21 04:02:00 hou000721cs anacron[5515]: Updated timestamp for job
`cron.daily' to 2001-08-21
Aug 21 04:02:01 hou000721cs kernel: Unable to handle kernel paging request
at virtual address 11008010 
Aug 21 04:02:01 hou000721cs kernel: current->tss.cr3 = 145aa000, %cr3 =
145aa000 
Aug 21 04:02:01 hou000721cs kernel: *pde = 00000000 
Aug 21 04:02:01 hou000721cs kernel: Oops: 0000 
Aug 21 04:02:01 hou000721cs kernel: CPU:    0 
Aug 21 04:02:01 hou000721cs kernel: EIP:    0010:[d_lookup+100/224] 
Aug 21 04:02:01 hou000721cs kernel: EFLAGS: 00010217 
Aug 21 04:02:01 hou000721cs kernel: eax: beee9a88   ebx: 11007ff8   ecx:
00000022   edx: bee00000 
Aug 21 04:02:01 hou000721cs kernel: esi: 322f6ef6   edi: ac72f00a   ebp:
11008010   esp: 8542bf3c 
Aug 21 04:02:01 hou000721cs kernel: ds: 0018   es: 0018   ss: 0018 
Aug 21 04:02:01 hou000721cs kernel: Process slocate (pid: 5612, process
nr: 18, stackpage=8542b000) 
Aug 21 04:02:01 hou000721cs kernel: Stack: ac72f00a 00000000 beee9a88
ac72f000 322f6ef6 0000000a 8012df0c aa7363e0  
Aug 21 04:02:01 hou000721cs kernel:        8542bf84 8542bf84 8012e187
aa7363e0 8542bf84 00000000 ac72f000 ac72f000  
Aug 21 04:02:01 hou000721cs kernel:        8542a000 7ffffc38 ac72f000
0000000a 322f6ef6 8012e284 ac72f000 aa7363e0  
Aug 21 04:02:01 hou000721cs kernel: Call Trace: [cached_lookup+16/84]
[lookup_dentry+275/488] [__namei+40/88] [sys_newlstat+42/140] 
[system_call+52/56]  
Aug 21 04:02:01 hou000721cs kernel: Code: 8b 6d 00 8b 74 24 18 39 73 48 75
5c 8b 74 24 24 39 73 0c 75  

------------------------------------------------------------------------

Aug 19 12:10:00 hou000669cs kernel: Unable to handle kernel paging request
at virtual address d2040200 
Aug 19 12:10:00 hou000669cs kernel: current->tss.cr3 = 11c09000, %cr3 =
11c09000 
Aug 19 12:10:00 hou000669cs kernel: *pde = 00000000 
Aug 19 12:10:00 hou000669cs kernel: Oops: 0000 
Aug 19 12:10:00 hou000669cs kernel: CPU:    0 
Aug 19 12:10:00 hou000669cs kernel: EIP:    0010:[flush_old_exec+196/552] 
Aug 19 12:10:00 hou000669cs kernel: EFLAGS: 00010246 
Aug 19 12:10:00 hou000669cs kernel: eax: 00000000   ebx: 9b040000   ecx:
9b041e5c   edx: 11c09000 
Aug 19 12:10:00 hou000669cs kernel: esi: 00000000   edi: 801e59c3   ebp:
9a5c4000   esp: 9b041ca0 
Aug 19 12:10:00 hou000669cs kernel: ds: 0018   es: 0018   ss: 0018 
Aug 19 12:10:00 hou000669cs kernel: Process crond (pid: 15182, process nr:
24, stackpage=9b041000) 
Aug 19 12:10:00 hou000669cs kernel: Stack: 801e59c3 befddf80 00000000
9b040000 80135d52 9b041e5c 8021e718 fffffff8
Aug 19 12:10:00 hou000669cs kernel:        9b040000 00000000 00000000
00000000 00030003 00000001 00001990 00000034
Aug 19 12:10:00 hou000669cs kernel:        464c457f 00010101 00000000
00000080 9b041d6c befcf400 9b041da4 805427b0
Aug 19 12:10:00 hou000669cs kernel: Call Trace: [cprt+1315/42661]
[load_elf_binary+1546/3480] [update_atime+94/100]
[do_generic_file_read+1524/1536] [cprt+1312/42661]
[search_binary_handler+67/168] [do_execve+417/516]
Aug 19 12:10:00 hou000669cs kernel:        [sys_execve+75/124]
[system_call+52/56]  
Aug 19 12:10:00 hou000669cs kernel: Code: 66 39 83 00 02 00 00 75 29 8b 7c
24 14 66 8b 87 06 02 00 00  

--------------------------------------------------------------------------

Aug 21 04:02:00 hou000587cs kernel: Unable to handle kernel NULL pointer
dereference at virtual address 00000040
Aug 21 04:02:00 hou000587cs kernel: current->tss.cr3 = 20c50000, %cr3 =
20c50000
Aug 21 04:02:00 hou000587cs kernel: *pde = 00000000
Aug 21 04:02:00 hou000587cs kernel: Oops: 0000
Aug 21 04:02:00 hou000587cs kernel: CPU:    0
Aug 21 04:02:00 hou000587cs kernel: EIP:    0010:[dput+295/328]
Aug 21 04:02:00 hou000587cs kernel: EFLAGS: 00010286
Aug 21 04:02:00 hou000587cs kernel: eax: 00000000   ebx: 8aa1d680   ecx:
a14faf80   edx: a14fad7c
Aug 21 04:02:00 hou000587cs kernel: esi: ffffffff   edi: 00001004   ebp:
00000001   esp: 9cf7be64
Aug 21 04:02:00 hou000587cs kernel: ds: 0018   es: 0018   ss: 0018
Aug 21 04:02:00 hou000587cs kernel: Process slocate (pid: 32162, process
nr: 30, stackpage=9cf7b000)
Aug 21 04:02:00 hou000587cs kernel: Stack: 8aa1d680 80132c0c 8aa1d680
9cf7beb0 9cf7beb0 8021e644 00001004 00001004
Aug 21 04:02:00 hou000587cs kernel:        80133d68 fffff7f6 00000806
00000000 8024a198 8021e644 8024a198 a53672a0
Aug 21 04:02:00 hou000587cs kernel:        a53672a0 00000000 98bea3fc
9cf7beb0 9cf7beb0 80133df6 00001004 00000000
Aug 21 04:02:00 hou000587cs kernel: Call Trace: [prune_dcache+288/340]
[try_to_free_inodes+316/396] [grow_inodes+30/440] [get_new_inode+197/312]
[iget4+134/144] [iget+19/24] [ext2_lookup+84/124]
Aug 21 04:02:00 hou000587cs kernel:        [real_lookup+80/160]
[lookup_dentry+296/488] [__namei+40/88] [sys_newlstat+42/140]
[system_call+52/56]
Aug 21 04:02:00 hou000587cs kernel: Code: 8b 40 40 50 56 68 e0 55 1e 80 e8
7a 20 fe ff c7 05 00 00 00
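
To help separate bad hardware from a software cause, it may be worth
tallying the oopses per node. The Python below is only a rough sketch, not a
diagnostic tool. It assumes each wrapped syslog line has first been rejoined
onto a single line, and the names OOPS_RE and summarize_oopses are made up
for illustration. The sample lines are the first lines of the four oopses
pasted above. The working assumption (unverified) is that nodes which fail
repeatedly point toward RAM or other hardware, while failures spread evenly
across the cluster point toward the kernel or its configuration.

```python
import re
from collections import Counter

# Matches the first line of an oops in the syslog format shown above,
# assuming mail-wrapped lines have already been rejoined.
OOPS_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]+)\s+(?P<host>\S+)\s+kernel:\s+"
    r"Unable to handle kernel (?P<kind>paging request|NULL pointer dereference)"
    r"\s+at virtual address (?P<addr>[0-9a-f]{8})"
)


def summarize_oopses(lines):
    """Return (oops count per host, list of (host, kind, address) tuples)."""
    per_host = Counter()
    events = []
    for line in lines:
        m = OOPS_RE.match(line)
        if m:
            per_host[m.group("host")] += 1
            events.append((m.group("host"), m.group("kind"), m.group("addr")))
    return per_host, events


# First line of each oops pasted above, plus one non-matching line.
sample = [
    "Aug 21 06:35:07 hou000752cs kernel: Unable to handle kernel paging "
    "request at virtual address 00ff0024",
    "Aug 21 04:02:01 hou000721cs kernel: Unable to handle kernel paging "
    "request at virtual address 11008010",
    "Aug 19 12:10:00 hou000669cs kernel: Unable to handle kernel paging "
    "request at virtual address d2040200",
    "Aug 21 04:02:00 hou000587cs kernel: Unable to handle kernel NULL "
    "pointer dereference at virtual address 00000040",
    "Aug 21 04:02:00 hou000587cs kernel: Oops: 0000",
]

per_host, events = summarize_oopses(sample)
for host, count in per_host.most_common():
    print(host, count)
```

Run over the collected /var/log/messages files from all nodes, the per-host
counts would show whether the same few machines keep failing.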

...RickM...


