Re: bad downloads during apt-get upgrade



Steve Kleene wrote:
> On Mon, 24 Oct 2011 11:44:11 +0000 (UTC), I wrote:
> > That isn't a good error message.  I think your disk is failing.
> > Review your /var/log/syslog and look for error messages there.  I
> > expect you will see other errors logged there.

Really?  I thought *I* wrote that.  Wait, I did.  :-) I think your mail
attribution processing isn't configured right.

> I was afraid of that.  For what it's worth, there is nothing suspicious in
> syslog.  Mostly it's just a list of all the e-mails sent and received.  I do
> keep this machine very well backed up.

With backup you are in good shape.  You might try forcing a read of
every sector (e.g. dd if=/dev/sda of=/dev/null bs=4k or some such),
because during normal use only a few sectors are actually exercised.
Perhaps others will suggest better diagnostics.
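
As a rough sketch of that full read, the dd command can be wrapped in a
small function.  The name read_whole_device is made up for illustration,
and conv=noerror is an addition to the command suggested above so that dd
keeps going past a bad sector instead of stopping at the first one.

```shell
# Read every sector of a device and throw the data away, so the drive
# has to touch areas that normal use never exercises.
# read_whole_device is a hypothetical helper name; run it as root.
read_whole_device() {
    dev="$1"
    # conv=noerror tells dd to continue after a read error; each error
    # is reported on stderr as it happens.
    dd if="$dev" of=/dev/null bs=4k conv=noerror
}

# Example (not run here):
#   read_whole_device /dev/sda
```

Any sector the drive cannot read should show up as an error from dd, and
most likely in the kernel log as well.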

> > You didn't say what type of media you are using.  Spinning disk?  SSD?
> > Other?
> 
> It's an old spinning disk (Maxtor DiamondMax Plus 8 6K040L0 40GB ATA/133
> HDD).  The date on it is 10/31/03.  It's sufficient for this machine's
> purpose.

Yes.  Plenty sufficient for many purposes.  No complaints here.  I
have several of those still running.

> >  # smartctl -H /dev/sda
> SMART overall-health self-assessment test result: PASSED

(shrug)  In my experience the overall-health result isn't a great
predictor of failure.  But it often confirms failure.

>   40 51 01 af 49 c3 e2  Error: UNC 1 sectors at LBA = 0x02c349af = 46352815

Looks like an uncorrected read error.
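
To see where that sector sits on the disk, the hex LBA from the quoted
line can be converted with shell arithmetic.  This is only a sketch, and
the 512-byte sector size is an assumption about this drive, not something
the smartctl output above states.

```shell
# LBA from the quoted error line, as smartctl printed it in hex.
lba=$((0x02c349af))
echo "LBA: $lba"

# Byte offset of that sector, assuming 512-byte logical sectors.
sector_size=512
offset=$((lba * sector_size))
echo "Byte offset: $offset"

# To re-read just that one sector (as root, not run here):
#   dd if=/dev/sda of=/dev/null bs=512 skip=46352815 count=1
```

The decimal value agrees with the 46352815 that smartctl printed, and
under the 512-byte assumption it falls roughly 23.7 GB into the 40GB
disk.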

> >  # smartctl -l selftest /dev/sda
> 
> Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
> # 1  Short offline       Completed: read failure       60%     15032         46769249

After a short selftest it reported a read failure.

> I'm not sure how to interpret all of that output, but it looks bad.  Thanks
> for your help.

If this disk were a ship at sea then it should be sending a distress
signal.

In the future you might consider using the same smartmontools package
and configuring /etc/smartd.conf to run selftests automatically on a
regular basis, perhaps with something like this example from one of my
systems.

  # Monitor all attributes, enable automatic online data collection,
  # automatic Attribute autosave, and start a short self-test Monday
  # through Friday between 3-4am, and a long self test Saturdays
  # between 3-4am.
  # On failure run all installed scripts.
  # Ignore attribute 194 temperature change.
  # Ignore attribute 190 airflow temperature change.
  /dev/sda -a -o on -S on -s (S/../../[1-5]/03|L/../../6/03) -I 194 -I 190 -m root -M exec /usr/share/smartmontools/smartd-runner

With the above running and being monitored, a selftest failure such as
the one you are seeing will cause the runner scripts to email a warning
message to root, so you will be notified of the problem automatically.
It doesn't catch every failure, but most of the time it works that way
and gives good warning of the problem.
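
The -s schedule in that smartd.conf line is a regular expression that
smartd matches against a string of the form T/MM/DD/d/HH.  The sketch
below imitates that check with grep so the schedule can be tried out by
hand.  The helper name schedule_matches is made up, and anchoring the
expression to the whole string is an assumption about how smartd applies
it.

```shell
# The -s expression from the smartd.conf line above.
schedule='S/../../[1-5]/03|L/../../6/03'

# T is the test type (S = short, L = long), d is the day of the week
# (1 = Monday ... 7 = Sunday) and HH is the hour after midnight.
# schedule_matches is a hypothetical helper, not part of smartmontools.
schedule_matches() {
    printf '%s\n' "$1" | grep -Eq "^(${schedule})$"
}

# Build the string for a short test at the current time and check it.
now="S/$(date '+%m/%d/%u/%H')"
if schedule_matches "$now"; then
    echo "$now: a short self-test would be due this hour"
else
    echo "$now: no short self-test due this hour"
fi
```

With this expression a string such as S/10/24/1/03 (a Monday, 3am hour)
matches, while S/10/30/7/03 (a Sunday) does not.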

Good luck!

Bob
