
RE: Random mysql index corruption on Dell Poweredge 2450



I would recommend moving to the 1.18 megaraid driver immediately.

Do you still see the issue?
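
To confirm which driver version is actually loaded before and after the upgrade, something like the following minimal Python sketch could be run against saved `dmesg` output. It assumes the driver banner keeps the same form as the `megaraid: v1.17a (Release Date: ...)` line in the quoted log below; the function name `megaraid_driver_version` is hypothetical, chosen only for illustration.

```python
import re

# Banner format assumed from the quoted dmesg line, e.g.
# "megaraid: v1.17a (Release Date: Fri Jul 13 18:44:01 EDT 2001)"
BANNER = re.compile(r"^megaraid: v(\S+) \(Release Date: (.+)\)$")


def megaraid_driver_version(dmesg_text):
    """Return (version, release_date) from the first banner line, or None."""
    for line in dmesg_text.splitlines():
        match = BANNER.match(line.strip())
        if match:
            return match.group(1), match.group(2)
    return None


sample = "megaraid: v1.17a (Release Date: Fri Jul 13 18:44:01 EDT 2001)"
print(megaraid_driver_version(sample))
```

If the function returns None, the banner was either not logged or uses a different format than assumed here.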

Thanks
Atul Mukker
Software Engineer II

LSI Logic Corporation
RAID Storage Adapters Division
6145-D Northbelt Parkway Norcross GA-30071
770-326-9187, 770-246-8765(Fax)
E-mail: atulm@lsil.com
HTTP: www.lsilogic.com


> -----Original Message-----
> From:	joseph.bueno@trader.com [SMTP:joseph.bueno@trader.com]
> Sent:	Monday, October 29, 2001 6:11 AM
> To:	Enrico Zini
> Cc:	linux-megaraid-devel@dell.com; mysql@lists.mysql.com;
> debian-user@lists.debian.org
> Subject:	Re: Random mysql index corruption on Dell Poweredge 2450
> 
> Enrico Zini wrote:
> > 
> > Hello!
> > 
> > I'm posting this to three lists because I can't track down the problem to a
> > single cause.
> > 
> > We are setting up a Dell Poweredge 2450 to run the central MySQL database
> > of our crowded web site.  If we raise the load on the database by moving
> > some services from the current server to the new one, we start
> > experiencing random database index corruption.  Sadly, we've been unable
> > so far to track down the issue to a single query or some reproducible
> > sequence of events.  I'll try to include here all the details we have.
> > 
> > The MySQL errors are all 127 "Record-file is crashed" or 134 "Record was
> > already deleted (or record file crashed)".  Here are some examples:
> > --------------------------------------------------------------------
> > Got error 134 from table handler executing query "UPDATE Delayed2 SET [...] WHERE ID=1 AND Tipo=1" (err: 1030)
> > Got error 127 from table handler executing query "SELECT Simbolo, Prezzo, UNIX_TIMESTAMP(Ora), TotVol, NT  FROM Realtime WHERE ID=23 AND Tipo = 5" (err: 1030)
> > --------------------------------------------------------------------
> > 
> > This happens both with the Debian MySQL version 3.23.36-6 (from testing)
> > and with the 3.23.43 precompiled binaries downloaded from the
> > www.mysql.com web site.  We can fix the tables with myisamchk, but after
> > some time (ranging from a couple of hours to a couple of days) the
> > database gets corrupted again.
> > 
> > The system is a Debian testing with kernel 2.4.12 (an upgrade to 2.4.13 is
> > planned).  File system is ext2.
> > 
> > Since I don't know if it's a MySQL fault or a hardware fault, I also
> > include hardware and driver details:
> > 
> > dmesg log of AIC7XXX and megaraid initialization:
> > --------------------------------------------------------------------
> > SCSI subsystem driver Revision: 1.00
> > scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.1
> >         <Adaptec aic7899 Ultra160 SCSI adapter>
> >         aic7899: Ultra160 Wide Channel A, SCSI Id=7, 32/255 SCBs
> > 
> > scsi1 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.1
> >         <Adaptec aic7899 Ultra160 SCSI adapter>
> >         aic7899: Ultra160 Wide Channel B, SCSI Id=7, 32/255 SCBs
> > 
> > megaraid: v1.17a (Release Date: Fri Jul 13 18:44:01 EDT 2001)
> > megaraid: found 0x8086:0x1960:idx 0:bus 0:slot 2:func 1
> > scsi2 : Found a MegaRAID controller at 0xe0808000, IRQ: 20
> > megaraid: [1.01:1p00] detected 1 logical drives
> > megaraid: channel[1] is raid.
> > megaraid: channel[2] is raid.
> > scsi2 : AMI MegaRAID 1.01 254 commands 16 targs 2 chans 8 luns
> > scsi2: scanning channel 1 for devices.
> >   Vendor: DELL      Model: 1x4 U2W SCSI BP   Rev: 1.16
> >   Type:   Processor                          ANSI SCSI revision: 02
> > scsi2: scanning channel 2 for devices.
> > scsi2: scanning virtual channel for logical drives.
> >   Vendor: MegaRAID  Model: LD0 RAID5 17136R  Rev: 1.01
> >   Type:   Direct-Access                      ANSI SCSI revision: 02
> > Attached scsi disk sda at scsi2, channel 2, id 0, lun 0
> > SCSI device sda: 35094528 512-byte hdwr sectors (17968 MB)
> > Partition check:
> >  /dev/scsi/host2/bus2/target0/lun0: p1 p2 p3 < p5 p6 >
> > --------------------------------------------------------------------
> > 
> > Some /proc stats:
> > --------------------------------------------------------------------
> > service:~# cat /proc/megaraid/0/config
> > Controller Type: 438/466/467/471/493
> > Base = e0808000, Irq = 20, Logical Drives = 1, Channels = 2
> > Version =1.01:1p00, DRAM = 128Mb
> > Controller Queue Depth = 254, Driver Queue Depth = 126
> > service:~# cat /proc/megaraid/0/stat
> > Statistical Information for this controller
> > Interrupts Collected = 2065925
> > Logical Drive 0:
> >         Reads Issued = 136738, Writes Issued = 1929155
> >         Sectors Read = 2611796, Sectors Written = 25004064
> > 
> > service:~# cat /proc/megaraid/0/mailbox
> > Contents of Mail Box Structure
> >   Fw Command   = 0x02
> >   Cmd Sequence = 0x66
> >   No of Sectors= 0008
> >   LBA          = 0x16c5c8a
> >   DTA          = 0x0230f000
> >   Logical Drive= 0x00
> >   No of SG Elmt= 0x00
> >   Busy         = 0
> >   Status       = 0x00
> > service:~# cat /proc/megaraid/0/status
> > TBD
> > --------------------------------------------------------------------
> > 
> > Sadly, I don't know if I should upgrade some firmware, nor do I know what
> > releases our firmware are, since the machine is hosted in a farm ~100 km
> > from here and it's hard for me to track boot messages and run boot
> > floppies.  Is there a way to know that from Linux?
> > 
> > Do you have any hints for me to try to solve this problem?
> > 
> > Bye, Enrico
> > 
> > --
> > GPG key: 1024D/797EBFAB 2000-12-05 Enrico Zini (Unibo) <zinie@cs.unibo.it>
> > 
> 
> Hi,
> 
> We experienced the same kind of problem on our Web database server (Dell
> PowerEdge 2450 bi-processor PIII/733) in January. It was running Linux 2.2.14
> (RedHat 6.2) and MySQL 3.23.30. At that time, rebooting in single-processor
> mode "solved" the problem.
> 
> We installed a new database server (Dell 2550 bi-pro PIII/1000) with
> Linux 2.4.3 (RedHat 7.1 with kernel update) and MySQL 3.23.42 ten days ago,
> and it has been running without problems since then.
> 
> Your report is scaring me since I was convinced that our index corruption
> problem was due to some weird behavior of Linux 2.2.14 kernel in SMP mode.
> 
> Regards
> --
> Joseph Bueno
> NetClub/Trader.com
> _______________________________________________
> Linux-megaraid-devel mailing list
> Linux-megaraid-devel@dell.com
> http://lists.us.dell.com/mailman/listinfo/linux-megaraid-devel
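
On the quoted question of whether the firmware release can be read from Linux: the `Version =1.01:1p00` line in /proc/megaraid/0/config (and the matching `[1.01:1p00]` in the dmesg log) looks like the place to start. Below is a minimal Python sketch that extracts that pair. Reading the two fields as firmware and BIOS versions, in that order, is an assumption, and `parse_megaraid_version` is a hypothetical helper name, not part of any driver tool.

```python
import re

# Matches e.g. "Version =1.01:1p00, DRAM = 128Mb" (format taken from the
# quoted /proc/megaraid/0/config output).
VERSION_LINE = re.compile(r"Version\s*=\s*([^:,\s]+):([^,\s]+)")


def parse_megaraid_version(config_text):
    """Return the two version fields from the config text, or None.

    ASSUMPTION: the first field is the firmware version and the second
    is the BIOS version; the quoted output does not label them.
    """
    match = VERSION_LINE.search(config_text)
    if match is None:
        return None
    return {"firmware": match.group(1), "bios": match.group(2)}


sample = (
    "Controller Type: 438/466/467/471/493\n"
    "Base = e0808000, Irq = 20, Logical Drives = 1, Channels = 2\n"
    "Version =1.01:1p00, DRAM = 128Mb\n"
)
print(parse_megaraid_version(sample))
```

On the machine itself, the text would come from reading /proc/megaraid/0/config instead of the sample string, which avoids a trip to the hosting farm to watch the boot messages.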


