
AMD64 Stability on Asus A8V Deluxe



I recently set up a new machine as a mail server for our company
(approx. 1000 users with 60 GByte of traffic per month). The hardware is as follows:

Asus A8V Deluxe mainboard,
nVidia graphics card (NV5M64 [RIVA TNT2 Model 64/Model 64 Pro]),
2x Infineon 1 GByte DDR-RAM, PC 3200,
2x 300 GByte Seagate ATA HDs with kernel software RAID 1,
1x Promise FastTrak TX2000 with 1x Maxtor 4A250J0 250 GByte as backup HD

I mirrored the software installation from the existing machine (P4 2.4 GHz, Sarge with
kernel-image-2.4.27-1-386 and kernel-image-2.6.8-2-686), made new initrd images, and
adjusted grub and fstab for the new hardware. I also tried kernel-image-2.6.8-11-amd64-k8
and kernel-image-2.6.11-9-amd64-k8 kernel-image-2.6.11-9 from sid, as well as a completely
new installation of the unofficial AMD64 port of Sarge.

I am a little disappointed, because I get a stable system only with 32-bit Sarge and
kernel-image-2.6.8-2-686. Here is how I test:

1) kernel compilation test with the following script:

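#!/bin/tcsh
# ramtest: build the kernel once, then rebuild it 1000 times,
# writing stdout to log.NNN and everything else to log.err.NNN
#
make ARCH=i386 bzImage
foreach i (0 1 2 3 4 5 6 7 8 9)
  foreach j (0 1 2 3 4 5 6 7 8 9)
    foreach k (0 1 2 3 4 5 6 7 8 9)
      ( make clean; make ARCH=i386 bzImage > log."$i"$j$k ) >& log.err."$i"$j$k
    end
  end
end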

On a healthy machine I must end up with 1000 identical log files.
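
In case somebody wants to repeat the test: the comparison of the 1000 log files can be done roughly like this. This is only a sketch. The function name count_differing_logs is made up for illustration, and it assumes that log.000 is a good reference (if log.000 itself comes from a broken build, nearly every file will be reported).

```sh
#!/bin/sh
# count_differing_logs DIR  (hypothetical helper, not part of the test above)
# Prints how many log.NNN files in DIR differ from the reference log.000.
# The glob log.[0-9][0-9][0-9] does not match the log.err.NNN files.
count_differing_logs() {
    dir=$1
    ref="$dir/log.000"
    bad=0
    for f in "$dir"/log.[0-9][0-9][0-9]; do
        [ -f "$f" ] || continue          # glob matched nothing
        if ! cmp -s "$ref" "$f"; then    # cmp -s: no output, exit status only
            echo "differs: $f" >&2       # name the odd file on stderr
            bad=$((bad + 1))
        fi
    done
    echo "$bad"
}
```

Called as `count_differing_logs .` in the kernel source directory, it should print 0 when all builds produced the same output.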

2) A shell script that produces tar archives in an endless loop:
#!/bin/sh
# I/O test for the hard drive: write 31 tar archives, delete them, repeat
while true
do
  echo "disk test started, `date`" >> plattentest.log
  i=0
  while [ $i -le 30 ]
  do
    i=`expr $i + 1`
    tar cf test$i.tar testdir
  done
  echo "disk test finished, `date`" >> plattentest.log
  rm test*.tar
done

Test 1) randomly produces premature compiler aborts with all 64-bit kernels and with kernel 2.4.27. In the error logs I often find "Speicherzugriffsfehler" (the German locale's message for "Segmentation fault").
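
To find the affected runs among the error logs, something like the following could be used. Again only a sketch: list_crashed_runs is a hypothetical name, and the two search strings assume that the crash is reported either in the German wording or in the English wording "Segmentation fault".

```sh
#!/bin/sh
# list_crashed_runs DIR  (hypothetical helper)
# Prints the names of the log.err.NNN files in DIR that contain a
# segfault message, in either the German or the English wording.
list_crashed_runs() {
    grep -l -e 'Speicherzugriffsfehler' -e 'Segmentation fault' \
        "$1"/log.err.[0-9][0-9][0-9] 2>/dev/null
}
```

`list_crashed_runs . | wc -l` then gives the number of failed builds out of 1000.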

Test 2) is stable with 32-bit kernels; with 64-bit kernels it crashes the machine (kernel panic) within 10 to 60 minutes. Usually the crash is so fast that I find nothing in the logs, but in one case I was able to capture the
kernel oops (kernel version kernel-image-2.6.11-9-amd64-k8):

Jul 11 19:02:35 ns2 kernel: Unable to handle kernel NULL pointer dereference at 0000000000000008 RIP:
Jul 11 19:02:35 ns2 kernel: <ffffffff80157363>{free_block+131}
Jul 11 19:02:35 ns2 kernel: PGD 6237b067 PUD 65d09067 PMD 0
Jul 11 19:02:35 ns2 kernel: Oops: 0002 [1]
Jul 11 19:02:35 ns2 kernel: CPU 0
Jul 11 19:02:35 ns2 kernel: Modules linked in: capability commoncap evdev uhci_hcd ohci_hcd ehci_hcd i2c_viapro i2c_core shpchp pci_hotplug ide_scsi sata_via libata scsi_mod 3c59x mii sk98lin ide_cd cdrom genrtc ext3 jbd mbcache ide_disk ide_generic via82cxxx trm290 triflex slc90e66 sis5513 siimage serverworks sc1200 rz1000 piix pdc202xx_old opti621 ns87415 hpt366 hpt34x generic cy82c693 cs5530 cs5520 cmd64x atiixp amd74xx alim15x3 aec62xx pdc202xx_new ide_core raid1 md unix fbcon font bitblit vesafb cfbcopyarea cfbimgblt cfbfillrect
Jul 11 19:02:35 ns2 kernel: Pid: 3973, comm: rm Not tainted 2.6.11-9-amd64-k8
Jul 11 19:02:35 ns2 kernel: RIP: 0010:[<ffffffff80157363>] <ffffffff80157363>{free_block+131}
Jul 11 19:02:35 ns2 kernel: RSP: 0000:ffff810063113bd8  EFLAGS: 00010012
Jul 11 19:02:35 ns2 kernel: RAX: 0000000000000000 RBX: ffff81007ffa3b00 RCX: ffff810061796d48
Jul 11 19:02:35 ns2 kernel: RDX: 0000000000000000 RSI: ffff810041796080 RDI: 0000000000000218
Jul 11 19:02:35 ns2 kernel: RBP: 000000000000000b R08: 0000000000000002 R09: ffff810063113c76
Jul 11 19:02:35 ns2 kernel: R10: ffff810063113c90 R11: ffffffff880db220 R12: ffff81007ffa3b10
Jul 11 19:02:35 ns2 kernel: R13: 000000000000001b R14: ffff81007fb78c10 R15: ffff81007ffa3b30
Jul 11 19:02:35 ns2 kernel: FS: 0000000000000000(0000) GS:ffffffff803f5a80(005b) knlGS:00000000556a92a0
Jul 11 19:02:35 ns2 kernel: CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
Jul 11 19:02:35 ns2 kernel: CR2: 0000000000000008 CR3: 0000000062b39000 CR4: 00000000000006e0
Jul 11 19:02:35 ns2 kernel: Process rm (pid: 3973, threadinfo ffff810063112000, task ffff810065028130)
Jul 11 19:02:35 ns2 kernel: Stack: ffff81007e5ccd98 000000000000001b ffff81007ffae9a0 ffff81007fb78c10
Jul 11 19:02:35 ns2 kernel:        ffff81007fb78c00 000000000000001b ffff810063113e08 ffffffff80157551
Jul 11 19:02:35 ns2 kernel:        ffff81007fb78c00 ffff8100649bf290
Jul 11 19:02:35 ns2 kernel: Call Trace:<ffffffff80157551>{cache_flusharray+113} <ffffffff801571bc>{kmem_cache_free+44}
Jul 11 19:02:35 ns2 kernel:        <ffffffff801cd44b>{radix_tree_delete+347} <ffffffff801449f8>{wake_up_bit+24}
Jul 11 19:02:35 ns2 kernel:        <ffffffff880c8525>{:jbd:do_get_write_access+1381} <ffffffff801449f8>{wake_up_bit+24}
Jul 11 19:02:35 ns2 kernel:        <ffffffff880c8525>{:jbd:do_get_write_access+1381} <ffffffff801704bf>{free_buffer_head+47}
Jul 11 19:02:35 ns2 kernel:        <ffffffff80170542>{try_to_free_buffers+114} <ffffffff880c8fd4>{:jbd:journal_invalidatepage+628}
Jul 11 19:02:35 ns2 kernel:        <ffffffff8014f583>{__remove_from_page_cache+35} <ffffffff8014f5e0>{remove_from_page_cache+48}
Jul 11 19:02:35 ns2 kernel:        <ffffffff80158ed8>{truncate_complete_page+56} <ffffffff8015902f>{truncate_inode_pages+143}
Jul 11 19:02:35 ns2 kernel:        <ffffffff880e349d>{:ext3:__ext3_journal_stop+45} <ffffffff880e1a8c>{:ext3:ext3_unlink+476}
Jul 11 19:02:35 ns2 kernel:        <ffffffff80186750>{generic_delete_inode+112} <ffffffff8017cd54>{sys_unlink+260}
Jul 11 19:02:35 ns2 kernel:        <ffffffff8011f4a1>{ia32_sysret+0}
Jul 11 19:02:35 ns2 kernel:
Jul 11 19:02:35 ns2 kernel: Code: 48 89 50 08 48 89 02 48 2b 4e 18 48 c7 06 00 01 10 00 48 c7
Jul 11 19:02:35 ns2 kernel: RIP <ffffffff80157363>{free_block+131} RSP <ffff810063113bd8>
Jul 11 19:02:35 ns2 kernel: CR2: 0000000000000008


I replaced the RAM (now 4x 512 MByte) and the power supply (450 W instead of 300 W), but this made no difference.

I assume this cannot be a general stability problem of the 64-bit kernel. Maybe I have a defective motherboard, but in that case it seems strange that kernel 2.6.8-2-686 runs my tests absolutely stably without any errors (testing time: 24 hours).

I am glad that after a week of testing I finally found a stable configuration, but you have to admit it is rather frustrating that the system consistently refuses to run stably in 64-bit mode.

Is this an isolated case caused by an unlucky hardware combination, or can somebody report similar failures?

Any comments or suggestions would be greatly appreciated.

Matthias Wenthe


