
Re: Random mysql index corruption on Dell Poweredge 2450



Enrico Zini wrote:
> 
> Hello!
> 
> I'm posting this to three lists because I can't track down the problem to a
> single cause.
> 
> We are setting up a Dell Poweredge 2450 to run the central MySQL database
> of our crowded web site.  If we raise the load on the database by moving
> some services from the current server to the new one, we start
> experiencing random database index corruption.  Sadly, we've been unable
> so far to track down the issue to a single query or some reproducible
> sequence of events.  I'll try to include here all the details we have.
> 
> The MySQL errors are all 127 "Record-file is crashed" or 134 "Record was
> already deleted (or record file crashed)".  Here are some examples:
> --------------------------------------------------------------------
> Got error 134 from table handler executing query "UPDATE Delayed2 SET [...] WHERE ID=1 AND Tipo=1" (err: 1030)
> Got error 127 from table handler executing query "SELECT Simbolo, Prezzo, UNIX_TIMESTAMP(Ora), TotVol, NT  FROM Realtime WHERE ID=23 AND Tipo = 5" (err: 1030)
> --------------------------------------------------------------------
> 
> This happens both with the Debian MySQL version 3.23.36-6 (from testing)
> and with the 3.23.43 precompiled binaries downloaded from the
> www.mysql.com web site.  We can fix the tables with myisamchk, but after
> some time (ranging from a couple of hours to a couple of days) the
> database gets corrupted again.
> 
> The system is a Debian testing with kernel 2.4.12 (an upgrade to 2.4.13 is
> planned).  File system is ext2.
> 
> Since I don't know if it's a MySQL fault or a hardware fault, I also
> include hardware and driver details:
> 
> dmesg log of AIC7XXX and megaraid initialization:
> --------------------------------------------------------------------
> SCSI subsystem driver Revision: 1.00
> scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.1
>         <Adaptec aic7899 Ultra160 SCSI adapter>
>         aic7899: Ultra160 Wide Channel A, SCSI Id=7, 32/255 SCBs
> 
> scsi1 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.1
>         <Adaptec aic7899 Ultra160 SCSI adapter>
>         aic7899: Ultra160 Wide Channel B, SCSI Id=7, 32/255 SCBs
> 
> megaraid: v1.17a (Release Date: Fri Jul 13 18:44:01 EDT 2001)
> megaraid: found 0x8086:0x1960:idx 0:bus 0:slot 2:func 1
> scsi2 : Found a MegaRAID controller at 0xe0808000, IRQ: 20
> megaraid: [1.01:1p00] detected 1 logical drives
> megaraid: channel[1] is raid.
> megaraid: channel[2] is raid.
> scsi2 : AMI MegaRAID 1.01 254 commands 16 targs 2 chans 8 luns
> scsi2: scanning channel 1 for devices.
>   Vendor: DELL      Model: 1x4 U2W SCSI BP   Rev: 1.16
>   Type:   Processor                          ANSI SCSI revision: 02
> scsi2: scanning channel 2 for devices.
> scsi2: scanning virtual channel for logical drives.
>   Vendor: MegaRAID  Model: LD0 RAID5 17136R  Rev: 1.01
>   Type:   Direct-Access                      ANSI SCSI revision: 02
> Attached scsi disk sda at scsi2, channel 2, id 0, lun 0
> SCSI device sda: 35094528 512-byte hdwr sectors (17968 MB)
> Partition check:
>  /dev/scsi/host2/bus2/target0/lun0: p1 p2 p3 < p5 p6 >
> --------------------------------------------------------------------
> 
> Some /proc stats:
> --------------------------------------------------------------------
> service:~# cat /proc/megaraid/0/config
> Controller Type: 438/466/467/471/493
> Base = e0808000, Irq = 20, Logical Drives = 1, Channels = 2
> Version =1.01:1p00, DRAM = 128Mb
> Controller Queue Depth = 254, Driver Queue Depth = 126
> service:~# cat /proc/megaraid/0/stat
> Statistical Information for this controller
> Interrupts Collected = 2065925
> Logical Drive 0:
>         Reads Issued = 136738, Writes Issued = 1929155
>         Sectors Read = 2611796, Sectors Written = 25004064
> 
> service:~# cat /proc/megaraid/0/mailbox
> Contents of Mail Box Structure
>   Fw Command   = 0x02
>   Cmd Sequence = 0x66
>   No of Sectors= 0008
>   LBA          = 0x16c5c8a
>   DTA          = 0x0230f000
>   Logical Drive= 0x00
>   No of SG Elmt= 0x00
>   Busy         = 0
>   Status       = 0x00
> service:~# cat /proc/megaraid/0/status
> TBD
> --------------------------------------------------------------------
> 
> Sadly, I don't know if I should upgrade some firmware, nor do I know which
> firmware releases we have, since the machine is hosted in a farm ~100 km from
> here and it's hard for me to track boot messages and run boot floppies.  Is
> there a way to find that out from Linux?
> 
> Do you have any hints for me to try to solve this problem?
> 
> Bye, Enrico
> 
> --
> GPG key: 1024D/797EBFAB 2000-12-05 Enrico Zini (Unibo) <zinie@cs.unibo.it>
> 

Hi,

In January we experienced the same kind of problem on our Web database server
(Dell PowerEdge 2450, dual-processor PIII/733). It was running Linux 2.2.14
(RedHat 6.2) and MySQL 3.23.30. At that time, rebooting in single-processor
mode "solved" the problem.

Ten days ago we installed a new database server (Dell 2550, dual-processor
PIII/1000) with Linux 2.4.3 (RedHat 7.1 with kernel update) and MySQL 3.23.42,
and it has been running without problems since then.

Your report worries me, since I was convinced that our index corruption
problem was due to some weird behavior of the Linux 2.2.14 kernel in SMP mode.
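
Since the corruption could not be tied to a single query, it may help to tally
the handler errors by error code and table to see whether they cluster. The
following is only a minimal sketch: it assumes the log lines look exactly like
the two examples quoted above, and the function name tally_handler_errors and
the table-name heuristic are illustrative, not part of MySQL.

```python
import re
from collections import Counter

# Matches lines shaped like the examples quoted in this thread, e.g.
#   Got error 134 from table handler executing query "UPDATE ..." (err: 1030)
ERROR_RE = re.compile(
    r'Got error (\d+) from table handler executing query "(.*)" \(err: (\d+)\)'
)

# Heuristic (an assumption): take the first word after UPDATE or FROM as the
# table name. This only covers simple single-table statements.
TABLE_RE = re.compile(r'\b(?:UPDATE|FROM)\s+(\w+)', re.IGNORECASE)


def tally_handler_errors(lines):
    """Count occurrences of each (handler error code, table name) pair."""
    counts = Counter()
    for line in lines:
        match = ERROR_RE.search(line)
        if match is None:
            continue  # not a table handler error line
        code = int(match.group(1))
        table_match = TABLE_RE.search(match.group(2))
        table = table_match.group(1) if table_match else "?"
        counts[(code, table)] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python tally.py < application.log
    for (code, table), n in sorted(tally_handler_errors(sys.stdin).items()):
        print(f"error {code} on {table}: {n}")
```

If the errors turn out to be spread evenly over all busy tables, that would
point away from a single bad query and toward the kernel, driver or hardware.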

Regards
--
Joseph Bueno
NetClub/Trader.com


