Re: Random mysql index corruption on Dell Poweredge 2450
Enrico Zini wrote:
>
> Hello!
>
> I'm posting this to three lists because I can't track down the problem to a
> single cause.
>
> We are setting up a Dell Poweredge 2450 to run the central MySQL database
> of our crowded web site. If we raise the load on the database by moving
> some services from the current server to the new one, we start
> experiencing random index corruption. Sadly, we have so far been unable
> to trace the issue to a single query or a reproducible
> sequence of events. I'll try to include here all the details we have.
>
> The MySQL errors are all 127 "Record-file is crashed" or 134 "Record was
> already deleted (or record file crashed)". Here are some examples:
> --------------------------------------------------------------------
> Got error 134 from table handler executing query "UPDATE Delayed2 SET [...] WHERE ID=1 AND Tipo=1" (err: 1030)
> Got error 127 from table handler executing query "SELECT Simbolo, Prezzo, UNIX_TIMESTAMP(Ora), TotVol, NT FROM Realtime WHERE ID=23 AND Tipo = 5" (err: 1030)
> --------------------------------------------------------------------
>
> This happens both with the Debian MySQL version 3.23.36-6 (from testing)
> and with the 3.23.43 precompiled binaries downloaded from the
> www.mysql.com web site. We can fix the tables with myisamchk, but after
> some time (ranging from a couple of hours to a couple of days) the
> database gets corrupted again.
>
> The system runs Debian testing with kernel 2.4.12 (an upgrade to 2.4.13 is
> planned). The file system is ext2.
>
> Since I don't know if it's a MySQL fault or a hardware fault, I also
> include hardware and driver details:
>
> dmesg log of AIC7XXX and megaraid initialization:
> --------------------------------------------------------------------
> SCSI subsystem driver Revision: 1.00
> scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.1
> <Adaptec aic7899 Ultra160 SCSI adapter>
> aic7899: Ultra160 Wide Channel A, SCSI Id=7, 32/255 SCBs
>
> scsi1 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.1
> <Adaptec aic7899 Ultra160 SCSI adapter>
> aic7899: Ultra160 Wide Channel B, SCSI Id=7, 32/255 SCBs
>
> megaraid: v1.17a (Release Date: Fri Jul 13 18:44:01 EDT 2001)
> megaraid: found 0x8086:0x1960:idx 0:bus 0:slot 2:func 1
> scsi2 : Found a MegaRAID controller at 0xe0808000, IRQ: 20
> megaraid: [1.01:1p00] detected 1 logical drives
> megaraid: channel[1] is raid.
> megaraid: channel[2] is raid.
> scsi2 : AMI MegaRAID 1.01 254 commands 16 targs 2 chans 8 luns
> scsi2: scanning channel 1 for devices.
> Vendor: DELL Model: 1x4 U2W SCSI BP Rev: 1.16
> Type: Processor ANSI SCSI revision: 02
> scsi2: scanning channel 2 for devices.
> scsi2: scanning virtual channel for logical drives.
> Vendor: MegaRAID Model: LD0 RAID5 17136R Rev: 1.01
> Type: Direct-Access ANSI SCSI revision: 02
> Attached scsi disk sda at scsi2, channel 2, id 0, lun 0
> SCSI device sda: 35094528 512-byte hdwr sectors (17968 MB)
> Partition check:
> /dev/scsi/host2/bus2/target0/lun0: p1 p2 p3 < p5 p6 >
> --------------------------------------------------------------------
>
> Some /proc stats:
> --------------------------------------------------------------------
> service:~# cat /proc/megaraid/0/config
> Controller Type: 438/466/467/471/493
> Base = e0808000, Irq = 20, Logical Drives = 1, Channels = 2
> Version =1.01:1p00, DRAM = 128Mb
> Controller Queue Depth = 254, Driver Queue Depth = 126
> service:~# cat /proc/megaraid/0/stat
> Statistical Information for this controller
> Interrupts Collected = 2065925
> Logical Drive 0:
> Reads Issued = 136738, Writes Issued = 1929155
> Sectors Read = 2611796, Sectors Written = 25004064
>
> service:~# cat /proc/megaraid/0/mailbox
> Contents of Mail Box Structure
> Fw Command = 0x02
> Cmd Sequence = 0x66
> No of Sectors= 0008
> LBA = 0x16c5c8a
> DTA = 0x0230f000
> Logical Drive= 0x00
> No of SG Elmt= 0x00
> Busy = 0
> Status = 0x00
> service:~# cat /proc/megaraid/0/status
> TBD
> --------------------------------------------------------------------
>
> Sadly, I don't know whether I should upgrade some firmware, nor do I know which
> firmware releases we have, since the machine is hosted in a farm ~100 km from
> here and it's hard for me to watch boot messages and run boot floppies. Is there
> a way to find that out from Linux?
>
> Do you have any hints for me to try to solve this problem?
>
> Bye, Enrico
>
> --
> GPG key: 1024D/797EBFAB 2000-12-05 Enrico Zini (Unibo) <zinie@cs.unibo.it>
>
Hi,
We experienced the same kind of problem on our Web database server (a Dell
PowerEdge 2450, dual-processor PIII/733) in January. It was running Linux 2.2.14
(RedHat 6.2) and MySQL 3.23.30. At that time, rebooting in single-processor
mode "solved" the problem.
Ten days ago we installed a new database server (a Dell 2550, dual-processor
PIII/1000) with Linux 2.4.3 (RedHat 7.1 with a kernel update) and MySQL 3.23.42,
and it has been running without problems since then.
Your report scares me, since I was convinced that our index corruption
problem was due to some weird behavior of the Linux 2.2.14 kernel in SMP mode.
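
Since the failures are intermittent, it may help to tally the logged errors
by table and error code, to see whether they cluster on particular tables.
What follows is only a minimal Python sketch: the regular expressions are
assumptions derived from the two sample log lines quoted above (other MySQL
versions or client libraries may word the message differently), and
tally_errors is a hypothetical helper name, not part of any MySQL tool.

```python
import re
import sys
from collections import Counter

# Assumed message format, taken from the two sample lines in the report.
ERROR_RE = re.compile(
    r'Got error (?P<code>\d+) from table handler '
    r'executing query "(?P<query>.*)" \(err: \d+\)'
)
# Rough guess at the table name: the first word after UPDATE, FROM or INTO.
TABLE_RE = re.compile(r'\b(?:UPDATE|FROM|INTO)\s+`?(\w+)`?', re.IGNORECASE)


def tally_errors(lines):
    """Count (table, handler error code) pairs found in log lines."""
    counts = Counter()
    for line in lines:
        match = ERROR_RE.search(line)
        if match is None:
            continue  # not a table handler error line
        table = TABLE_RE.search(match.group("query"))
        name = table.group(1) if table else "?"
        counts[(name, int(match.group("code")))] += 1
    return counts


if __name__ == "__main__":
    # Usage (hypothetical log file name): python tally.py < app-errors.log
    for (name, code), n in sorted(tally_errors(sys.stdin).items()):
        print(f"{name}\terror {code}\t{n}")
```

If the counts turn out to be spread evenly over all tables, that would point
away from a single bad query and toward something lower in the stack, though
a tally like this cannot prove either case.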
Regards
--
Joseph Bueno
NetClub/Trader.com