Random mysql index corruption on Dell Poweredge 2450



Hello!

I'm posting this to three lists because I can't track down the problem to a
single cause.

We are setting up a Dell Poweredge 2450 to run the central MySQL database
of our busy web site.  When we raise the load on the database by moving
some services from the current server to the new one, we start
experiencing random corruption of database indexes.  Sadly, we have so far
been unable to track the issue down to a single query or a reproducible
sequence of events.  I'll try to include here all the details we have.

The MySQL errors are all 127 "Record-file is crashed" or 134 "Record was
already deleted (or record file crashed)".  Here are some examples:
--------------------------------------------------------------------
Got error 134 from table handler executing query "UPDATE Delayed2 SET [...] WHERE ID=1 AND Tipo=1" (err: 1030)
Got error 127 from table handler executing query "SELECT Simbolo, Prezzo, UNIX_TIMESTAMP(Ora), TotVol, NT  FROM Realtime WHERE ID=23 AND Tipo = 5" (err: 1030)
--------------------------------------------------------------------
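
To look for a pattern in errors like the two above, one option is to tally
them per table and error code.  This is only a minimal sketch: it assumes
every log line has exactly the format of the two samples, and the helper
name tally_errors and the rough table-name regex are illustrative, not part
of MySQL.

```python
import re
from collections import Counter

# Assumed line format, taken from the two sample lines quoted above.
ERROR_LINE = re.compile(
    r'Got error (?P<code>\d+) from table handler executing query '
    r'"(?P<query>.*)" \(err: \d+\)'
)
# Rough guess at the table: first identifier after UPDATE, FROM or INTO.
TABLE_NAME = re.compile(r'\b(?:UPDATE|FROM|INTO)\s+`?(\w+)`?', re.IGNORECASE)


def tally_errors(lines):
    """Count (table, handler error code) pairs in the given log lines."""
    counts = Counter()
    for line in lines:
        match = ERROR_LINE.search(line)
        if not match:
            continue  # not a table handler error line
        table = TABLE_NAME.search(match.group("query"))
        name = table.group(1) if table else "?"
        counts[(name, int(match.group("code")))] += 1
    return counts


sample = [
    'Got error 134 from table handler executing query '
    '"UPDATE Delayed2 SET [...] WHERE ID=1 AND Tipo=1" (err: 1030)',
    'Got error 127 from table handler executing query '
    '"SELECT Simbolo, Prezzo, UNIX_TIMESTAMP(Ora), TotVol, NT  '
    'FROM Realtime WHERE ID=23 AND Tipo = 5" (err: 1030)',
]

for (table, code), count in sorted(tally_errors(sample).items()):
    print(table, code, count)
```

If one table or one error code dominates the tally over a few days, that
would at least narrow down where to look.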

This happens both with the Debian MySQL version 3.23.36-6 (from testing)
and with the 3.23.43 precompiled binaries downloaded from the
www.mysql.com web site.  We can fix the tables with myisamchk, but after
some time (ranging from a couple of hours to a couple of days) the
database gets corrupted again.

The system runs Debian testing with kernel 2.4.12 (an upgrade to 2.4.13 is
planned).  The file system is ext2.

Since I don't know if it's a MySQL fault or a hardware fault, I also
include hardware and driver details:

dmesg log of AIC7XXX and megaraid initialization:
--------------------------------------------------------------------
SCSI subsystem driver Revision: 1.00
scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.1
        <Adaptec aic7899 Ultra160 SCSI adapter>
        aic7899: Ultra160 Wide Channel A, SCSI Id=7, 32/255 SCBs

scsi1 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.1
        <Adaptec aic7899 Ultra160 SCSI adapter>
        aic7899: Ultra160 Wide Channel B, SCSI Id=7, 32/255 SCBs

megaraid: v1.17a (Release Date: Fri Jul 13 18:44:01 EDT 2001)
megaraid: found 0x8086:0x1960:idx 0:bus 0:slot 2:func 1
scsi2 : Found a MegaRAID controller at 0xe0808000, IRQ: 20
megaraid: [1.01:1p00] detected 1 logical drives
megaraid: channel[1] is raid.
megaraid: channel[2] is raid.
scsi2 : AMI MegaRAID 1.01 254 commands 16 targs 2 chans 8 luns
scsi2: scanning channel 1 for devices.
  Vendor: DELL      Model: 1x4 U2W SCSI BP   Rev: 1.16
  Type:   Processor                          ANSI SCSI revision: 02
scsi2: scanning channel 2 for devices.
scsi2: scanning virtual channel for logical drives.
  Vendor: MegaRAID  Model: LD0 RAID5 17136R  Rev: 1.01
  Type:   Direct-Access                      ANSI SCSI revision: 02
Attached scsi disk sda at scsi2, channel 2, id 0, lun 0
SCSI device sda: 35094528 512-byte hdwr sectors (17968 MB)
Partition check:
 /dev/scsi/host2/bus2/target0/lun0: p1 p2 p3 < p5 p6 >
--------------------------------------------------------------------

Some /proc stats:
--------------------------------------------------------------------
service:~# cat /proc/megaraid/0/config
Controller Type: 438/466/467/471/493
Base = e0808000, Irq = 20, Logical Drives = 1, Channels = 2
Version =1.01:1p00, DRAM = 128Mb
Controller Queue Depth = 254, Driver Queue Depth = 126
service:~# cat /proc/megaraid/0/stat
Statistical Information for this controller
Interrupts Collected = 2065925
Logical Drive 0:
        Reads Issued = 136738, Writes Issued = 1929155
        Sectors Read = 2611796, Sectors Written = 25004064

service:~# cat /proc/megaraid/0/mailbox
Contents of Mail Box Structure
  Fw Command   = 0x02
  Cmd Sequence = 0x66
  No of Sectors= 0008
  LBA          = 0x16c5c8a
  DTA          = 0x0230f000
  Logical Drive= 0x00
  No of SG Elmt= 0x00
  Busy         = 0
  Status       = 0x00
service:~# cat /proc/megaraid/0/status
TBD
--------------------------------------------------------------------

Sadly, I don't know whether I should upgrade any firmware, nor do I know which
firmware releases we are running, since the machine is hosted in a farm ~100 km
from here and it's hard for me to watch boot messages or run boot floppies.  Is
there a way to find that out from Linux?

Do you have any hints for me to try to solve this problem?


Bye, Enrico

--
GPG key: 1024D/797EBFAB 2000-12-05 Enrico Zini (Unibo) <zinie@cs.unibo.it>