
Vanishing RAID autodetect partition



I'm currently running Linux version 2.6.18-6-amd64 (Debian 2.6.18.dfsg.1-18etch3) (dannf@debian.org) (gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #1 SMP Thu Apr 24 03:57:46 UTC 2008

Up until now, things have been working fine with the two software RAID5 arrays I've got running via mdadm. I had just replaced a failed disk in the second array (/dev/md1) and decided to update the system with "apt-get update" and "apt-get upgrade", which all proceeded normally (including updating to the latest kernel image as seen above). After the update, I restarted the box to finish the process, and upon getting back into the system I noticed that the first array (/dev/md0) was running in degraded mode with a disk missing. Upon inspecting /proc/partitions I found that /dev/sdm didn't have any partitions listed at all:
---
major minor  #blocks  name

  8     0  312571224 sda
  8     1    6835626 sda1
  8     2          1 sda2
  8     5    6040408 sda5
  8     6  299692543 sda6
  8    16  488386584 sdb
  8    17  488287611 sdb1
  8    32  488386584 sdc
  8    33  488287611 sdc1
  8    48  488386584 sdd
  8    49  488287611 sdd1
  8    64  488386584 sde
  8    65  488287611 sde1
  8    80  488386584 sdf
  8    81  488287611 sdf1
  8    96  488386584 sdg
  8    97  488287611 sdg1
  8   112  488386584 sdh
  8   113  488287611 sdh1
  8   128  488386584 sdi
  8   129  488287611 sdi1
  8   144  488386584 sdj
  8   145  488287611 sdj1
  8   160  244198584 sdk
  8   161  244147806 sdk1
  8   176  244198584 sdl
  8   177  244147806 sdl1
  8   192  488386584 sdm
  8   208  244198584 sdn
  8   209  244147806 sdn1
  9     0 4394587392 md0
  9     1  488295424 md1
253     0 4882878464 dm-0
---
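With this many drives, the missing entry is easy to overlook by eye. A minimal Python sketch of the same check, assuming the /proc/partitions layout shown above (four columns, whole disks named sdX and partitions named sdXN); the function name is made up for illustration:

```python
import re

def disks_without_partitions(text):
    """Return whole disks (sdX) that have no sdXN partition entries."""
    names = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have four columns: major, minor, #blocks, name.
        # The header row is skipped because its first field is not numeric.
        if len(fields) == 4 and fields[0].isdigit():
            names.append(fields[3])
    disks = [n for n in names if re.fullmatch(r"sd[a-z]+", n)]
    return [d for d in disks
            if not any(re.fullmatch(re.escape(d) + r"\d+", n) for n in names)]
```

Calling it on the contents of /proc/partitions (for example `disks_without_partitions(open("/proc/partitions").read())`) should name only sdm for the listing above.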

Here's the layout of the RAID arrays at that point:
---
Personalities : [raid6] [raid5] [raid4]
md1 : active raid5 sdk1[0] sdn1[2] sdl1[1]
     488295424 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
md0 : active raid5 sdb1[0] sdg1[9] sdh1[8] sdi1[7] sdj1[6] sdf1[4] sde1[3] sdd1[2] sdc1[1]
     4394587392 blocks level 5, 64k chunk, algorithm 2 [10/9] [UUUUU_UUUU]
unused devices: <none>
---

So I figured I'd check the drive in fdisk, which actually found the partition to exist:
---
Disk /dev/sdm: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

  Device Boot      Start         End      Blocks   Id  System
/dev/sdm1               1       60789   488287611   fd  Linux raid autodetect
---
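fdisk reads the partition table straight from the first sector of the disk, while /proc/partitions only shows what the kernel registered at scan time, so the two can disagree. To look at the on-disk table independently of both, here is a minimal Python sketch that decodes the four primary entries of a classic MBR sector. The function name is made up, and it assumes a DOS/MBR label with 512-byte sectors:

```python
import struct

def parse_mbr(sector):
    """Return (type_id, start_lba, num_sectors) for each used primary entry."""
    if len(sector) < 512 or sector[510:512] != b"\x55\xaa":
        raise ValueError("no MBR boot signature")
    entries = []
    for i in range(4):
        # The partition table starts at byte 446; each entry is 16 bytes.
        raw = sector[446 + 16 * i : 446 + 16 * (i + 1)]
        type_id = raw[4]  # e.g. 0xfd for "Linux raid autodetect"
        # Starting LBA and sector count are little-endian 32-bit values.
        start_lba, num_sectors = struct.unpack_from("<II", raw, 8)
        if type_id != 0:  # type 0 marks an unused slot
            entries.append((type_id, start_lba, num_sectors))
    return entries
```

Feeding it the first 512 bytes of /dev/sdm (read as root) should give one entry of type 0xfd. If the partition starts at sector 63, which is an assumption since fdisk only reports cylinders here, the sector count would be 60789 * 16065 - 63 = 976575222, i.e. the 488287611 1K blocks listed above.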

I tried swapping the SATA cable for a new one, which had no effect. Moving the drive to a different port on the controller card didn't help either. A self-test via smartctl (from smartmontools) didn't turn anything up, so I decided to go back into fdisk and write the partition table to disk. I didn't make any changes to the table. I just went in, printed the list to make sure the partition was there, then wrote it out. After doing this, the partition appeared in /proc/partitions as seen here:
---
major minor  #blocks  name

  8     0  312571224 sda
  8     1    6835626 sda1
  8     2          1 sda2
  8     5    6040408 sda5
  8     6  299692543 sda6
  8    16  488386584 sdb
  8    17  488287611 sdb1
  8    32  488386584 sdc
  8    33  488287611 sdc1
  8    48  488386584 sdd
  8    49  488287611 sdd1
  8    64  488386584 sde
  8    65  488287611 sde1
  8    80  488386584 sdf
  8    81  488287611 sdf1
  8    96  488386584 sdg
  8    97  488287611 sdg1
  8   112  488386584 sdh
  8   113  488287611 sdh1
  8   128  488386584 sdi
  8   129  488287611 sdi1
  8   144  488386584 sdj
  8   145  488287611 sdj1
  8   160  244198584 sdk
  8   161  244147806 sdk1
  8   176  244198584 sdl
  8   177  244147806 sdl1
  8   192  488386584 sdm
  8   193  488287611 sdm1
  8   208  244198584 sdn
  8   209  244147806 sdn1
  9     1  488295424 md1
---

So with the partition back in working order I attempted to start the array with sdm1 included, but md kicked it out as non-fresh (presumably because its superblock was out of date relative to the other members). Rather than risk corrupted data, I just re-added the drive to the array and let it rebuild. Everything appeared to be working fine after the rebuild, but when I restarted the box one more time to see what would happen, the partition had once again vanished.

Going over the dmesg output, it's clear the system can see the partition:
---
SCSI device sdm: 976773168 512-byte hdwr sectors (500108 MB)
sdm: Write Protect is off
sdm: Mode Sense: 00 3a 00 00
SCSI device sdm: drive cache: write back
SCSI device sdm: 976773168 512-byte hdwr sectors (500108 MB)
sdm: Write Protect is off
sdm: Mode Sense: 00 3a 00 00
SCSI device sdm: drive cache: write back
sdm: sdm1
sd 12:0:0:0: Attached scsi disk sdm
---
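The line that matters here is "sdm: sdm1", which is where the kernel lists the partitions it found while scanning the disk. A small Python sketch for pulling those lines out of a longer dmesg dump, assuming the untimestamped format shown above (the function name is hypothetical):

```python
import re

def scanned_partitions(dmesg_text):
    """Map each disk to the partitions the kernel listed when scanning it."""
    result = {}
    for line in dmesg_text.splitlines():
        # Partition-scan lines look like "sdm: sdm1" or "sda: sda1 sda2 < sda5 >".
        m = re.match(r"\s*(sd[a-z]+):\s+(.*)", line)
        if not m:
            continue
        disk, rest = m.group(1), m.group(2)
        parts = re.findall(re.escape(disk) + r"\d+", rest)
        if parts:  # lines such as "sdm: Write Protect is off" yield nothing
            result[disk] = parts
    return result
```

Comparing its result against the names in /proc/partitions after boot would show whether a partition was detected at scan time and dropped later.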

Any ideas?

