
Re: power interruption -> renamed USB disk



On Mon, 2011-03-07 at 23:47 -0600, Ron Johnson wrote:
> On 03/07/2011 11:25 PM, Ross Boylan wrote:
> > On Mon, 2011-03-07 at 18:02 -0600, Ron Johnson wrote:
> >> On 03/07/2011 03:08 PM, Ross Boylan wrote:
> >>> I have a SATA disk in an external USB docking station.  The computer it
> >>> is attached to is on UPS, but the external disk has only surge
> >>> protection.
> >>>
> >>> When the power goes out for a moment, the disk, which was /dev/sdc,
> >>> seems to come back as /dev/sdd.  The disk has a partition that is part
> >>> of an LVM volume group, and the file system on the disk is inaccessible.
> >>>
> >>> I've had to restart the system to get the disk back.
> >>>
> >>> Is there a better way (aside from getting  the disk on UPS)?
> >>>
> >>> Debian Lenny (mostly), 2.6.26-2-686 stock Debian kernel on Pentium 4
> >>> chip w/hyperthreading.  The disk uses the GPT partition format.
> >>>
> >>> Most recent incident:
> >>> <log>
> >>> #power fails
> >>> Mar  7 11:21:18 corn kernel: [182529.931888] ethfast: Link is Down
> >>> Mar  7 11:21:18 corn kernel: [182530.140155] usb 5-3: USB disconnect, address 4
> >>> Mar  7 11:21:18 corn kernel: [182530.140155] usb 5-3.1: USB disconnect, address 7
> >>> Mar  7 11:21:18 corn kernel: [182530.140155] usb 5-3.2: USB disconnect, address 8
> >>> Mar  7 11:21:18 corn kernel: [182530.140155] usblp0: removed
> >>> Mar  7 11:21:18 corn kernel: [182530.375701] usb 5-4: USB disconnect, address 5
> >>> #power resumes
> >>> Mar  7 11:21:21 corn kernel: [182533.121591] usb 5-3: new high speed USB device using ehci_hcd and address 9
> >>> Mar  7 11:21:21 corn kernel: [182533.179788] hub 5-0:1.0: unable to enumerate USB device on port 3
> >>> Mar  7 11:21:30 corn kernel: [182542.755370] __ratelimit: 4 messages suppressed
> >>> Mar  7 11:21:30 corn kernel: [182542.755379] Buffer I/O error on device dm-15, logical block 8210
> >>> Mar  7 11:21:30 corn kernel: [182542.755384] lost page write due to I/O error on dm-15
> >> [snip]
> >>> Mar  7 11:43:33 corn kernel: [183938.589854] nfsd: last server has exited
> >>> Mar  7 11:43:33 corn kernel: [183938.614722] nfsd: unexporting all filesystems
> >>> </log>
> >>>
> >>
> >> Why are you using device names instead of labels or UUIDs?
> > /dev/sdc (or sdd on power resume) is what the kernel is handing me; is
> > there a way to change that?
> >
> > My understanding is that LVM uses UUIDs, which makes its failure to
> > recover a bit more puzzling to me.
> >
> > I suspect I'm not fully understanding the question.
> 
> You appear to understand the question...   :)
> 
> Re-reading your post, I see, "for a moment".  How much of a moment? 
From a fraction of a second to a minute.  Usually a fraction of a
second, though in the log above, about 3 seconds.
>   IOW, does the machine go down, 
IOW=?
The machine is on UPS and stays up.
> or just the external enclosure?  
The external drive bay is plugged into a surge protector; it loses
power.

I suppose one complication is that the docking base may draw some power
from the USB cable when the external power fails.

> If 
> just the enclosure, then the kernel probably thinks that the 
> external drives are still there.
I think that would be OK (except for some lost disk activity) if the
kernel didn't think the external disk had moved to a new location
(i.e., /dev/sdd).

I speculate that hotplug hasn't quite gotten (or never gets) the news
that sdc has disappeared, and when it detects a "new" drive it assigns
it the next available location, sdd.  But I'm not even sure what the
division of responsibility is between the kernel proper and the hotplug
system.  I thought hotplug was supposed to ensure that the same physical
device would end up with the same name every time.
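
One way to check which stable name a disk has, whatever sdX letter it
currently carries, is to look at the symlinks udev normally keeps under
/dev/disk/by-id.  The following is a minimal sketch, assuming GNU
readlink and that directory layout; stable_names is a hypothetical
helper, not an existing tool, and it only inspects names (it does not
make LVM recover).

```shell
# stable_names DEV [DIR]
# Print every symlink in DIR (default /dev/disk/by-id) that resolves to DEV.
# Hypothetical helper for illustration; assumes GNU `readlink -f`.
stable_names() {
    dev=$(readlink -f "$1")
    dir="${2:-/dev/disk/by-id}"
    for link in "$dir"/*; do
        # Skip anything that is not a symlink (including an unmatched glob).
        [ -L "$link" ] || continue
        if [ "$(readlink -f "$link")" = "$dev" ]; then
            printf '%s\n' "$link"
        fi
    done
}

# Hypothetical usage after the disk has reappeared under a new letter:
#   stable_names /dev/sdd
```

If the same by-id name shows up before and after the power flicker, the
renaming is confined to the sdX letter.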
> 
> That leads to these questions:
> 
> 1. Is the LV made up of both external *and* internal drives,
>     or just external?
The LV is in a VG that includes external and internal drives.  The
particular LVs that have trouble are entirely on the external drive.
> 
> 2. When the power flickers, do you do this:
>     # lvchange -an ${VG}/${LV}
>     # vgchange -an ${VG}
>     [unplug/wait/replug the enclosure]
>     # vgchange -ay ${VG}
>     # lvchange -ay ${VG}/${LV}
> 
No.  The vgchange -an seems problematic since essentially my entire
system is on the VG.  I think that means lvm would not let me deactivate
it; certainly if I succeeded I would be unable to do much of anything.
The VG is lvm2 format.
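
A narrower variant of the sequence above might be to deactivate only
the LVs that sit on the external disk and leave the rest of the VG
active.  The sketch below covers only the step of picking those LVs
out.  It assumes the three-column output of
"lvs --noheadings -o vg_name,lv_name,devices"; lvs_on_pv is a
hypothetical helper name and /dev/sdc1 is a placeholder for the
external PV.  Whether lvchange -an succeeds once the disk has already
vanished is untested here.

```shell
# lvs_on_pv PV
# Read `lvs --noheadings -o vg_name,lv_name,devices` output on stdin and
# print VG/LV once for each LV that has at least one segment on PV.
# Hypothetical helper for illustration only.
lvs_on_pv() {
    awk -v pv="$1" '
        {
            # The devices column looks like /dev/sdc1(0) or, for stripes,
            # /dev/sda1(0),/dev/sdb1(0); strip the extent offset to compare.
            n = split($3, devs, ",")
            for (i = 1; i <= n; i++) {
                d = devs[i]
                sub(/\([0-9]+\)$/, "", d)
                if (d == pv) { print $1 "/" $2; break }
            }
        }' | sort -u
}

# Hypothetical usage, as root, after unmounting the affected file systems:
#   lvs --noheadings -o vg_name,lv_name,devices | lvs_on_pv /dev/sdc1 |
#       while read -r lv; do lvchange -an "$lv"; done
#   [unplug/wait/replug the enclosure]
#   then run lvchange -ay on the same LVs
```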

Ross


