Re: need help with approx-gc



On 20150509_1832-0600, Bob Proulx wrote:
> Paul E Condon wrote:
> > The following is just a few examples from kern.log:
> > May  8 11:32:49 cmn kernel: [4880283.861051] end_request: I/O error, dev sda, sector 16136192
> 
> Ouch!  You have a disk that is crying out for help.  Oh the pain and
> suffering of it!
> 
> > All of them have the same sector number. This is the sda drive,
> > which is formatted as ext4. Is there some way that the automatic
> > reallocation could be repaired by a forced manual fsck? And is the
> > rescue function on the netinst CD adequate for this?
> 
> I have often been in your same situation.  I would ensure that the
> backup is current and valid and then replace the disk.  That is me.  I
> have seen disks get worse very quickly after they have exhibited
> failures.  Modern disk controllers keep internal spares.  By the time
> the disk is showing errors externally the internal spares have
> probably all been consumed with other failures.
> 
> Problems like this will quickly make you a believer in RAID.  I pretty
> much raid everything these days just to avoid being in this
> situation.  In a RAID the bad disk would have already been kicked out
> of the raid array.  It would then be left running in degraded mode on
> the remaining drives.  The system would keep running without
> problems.  Replacing the failing drive and backfilling the raid array
> can all occur while the system is up and online.
> 
> > Not running SMART.
> > What Debian package provides smartctl ?
> 
>   apt-get install smartmontools
>   smartctl -l error /dev/sda
> 
> I expect that to show errors.
> 
>   smartctl -t short /dev/sda
>   sleep 120
>   smartctl -l selftest /dev/sda
> 
> I expect that to show errors.
> 
> > I don't think the following tests will make the reallocation problem
> > go away.
> 
> Nope.  Seems like a disk failure to me.
> 
> > I was planning to do something else this weekend, Oh well.
> 
> RAID.  I can't say enough good things about it in these situations.
> And backup.
> 
> BTW...  I have a low priority machine that is crying right now that
> SMART selftests are failing.  It hasn't gotten to the actual I/O
> failure error stage yet but it is only a matter of time.  It is a low
> priority machine so I haven't actually done anything yet.  It is still
> up and running.  But I have a disk and as soon as I get a few spare
> minutes this weekend I am going to go swap out the failing disk for
> another.  But tomorrow looks pretty busy for me.  I probably won't get
> to it until Monday.  And I have no stress about it because it is a
> raid and the other disk is healthy.  Plus backups are current.
> 
> Bob
Bob,

I have no doubt that raid is the right way to go, but my personal
situation is that I am working with old hardware, and I can't buy a
state of the art new computer unless prices suddenly crash. I'm quite
sure that I have daily backups going back to before I switched to
Jessie, which I did well before its release. I won't be able to get replacement
parts for the current box except by mail order, and I don't know if it
can hold more than one drive (It is an old Dell packaged in one of
their tiny desktop cases.)  As I write, I am thinking I should turn
off the failing machine, and learn to live without it for a few weeks.
It has been running approx and cups. (It is the old box with the Centronics
connector that figured in another thread here.) I have another old
Dell with a slightly bigger case. How many independent hard drives are
needed for RAID?
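
To double-check my claim that every error hits the same sector, something
like the following could tally the kern.log entries. It is only a rough
sketch: the function name count_io_error_sectors is made up for
illustration, and it assumes every error line has the same
"end_request: I/O error, dev ..., sector ..." form as the one quoted above.

```shell
# Hypothetical helper: tally I/O error lines by device and sector.
# Reads log text on stdin and prints "<count> <device> <sector>"
# for each distinct device/sector pair. Assumes the kernel message
# format "end_request: I/O error, dev sda, sector 16136192".
count_io_error_sectors() {
    sed -n 's/.*end_request: I\/O error, dev \([a-z0-9]*\), sector \([0-9]*\).*/\1 \2/p' \
        | sort \
        | uniq -c \
        | awk '{print $1, $2, $3}'
}

# Usage (the log path is an assumption; it may differ on other systems):
#   count_io_error_sectors < /var/log/kern.log
```

A single output line would mean that only one sector is involved; several
lines would mean the damage is more widespread than I thought.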

Thanks
-- 
Paul E Condon           
pecondon@mesanetworks.net