
Bug#599203: More information



Oops, attaching the missing aptitude log and syslog of space1.



Original Text (with line-wrapping):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
An upgrade (os-prober 1.35 -> 1.39) corrupted 3.3 TB of data on our SAN.

I was upgrading the host space1, and the data corruption occurred on
space2. An os-prober install script tried to mount, read-only, a SAN
volume that was already mounted on space2. That volume (on space2)
was in production use, so EXT3-fs (on space1) concluded that the
journal was inconsistent, re-mounted the volume as writable, and
performed a "recovery".
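
The mechanism above can be illustrated with a minimal sketch. It assumes
the standard on-disk ext2/ext3 superblock layout (superblock at byte
1024, little-endian fields) and inspects the "needs recovery" feature
bit, which is normally set while a journaled filesystem is mounted
read-write. A volume in live use elsewhere would therefore usually look
like it needs journal replay, and replay writes to the device even when
the mount was requested read-only. The helper name journal_state is
hypothetical; it is not part of os-prober or any Debian tool.

```python
import struct

SUPERBLOCK_OFFSET = 1024      # primary superblock starts 1024 bytes into the volume
EXT_MAGIC = 0xEF53            # s_magic value shared by ext2, ext3 and ext4
COMPAT_HAS_JOURNAL = 0x0004   # s_feature_compat bit: filesystem has a journal
INCOMPAT_RECOVER = 0x0004     # s_feature_incompat bit: journal needs recovery


def journal_state(image: bytes) -> str:
    """Classify the first 2 KiB of a volume (hypothetical helper).

    Returns "not-ext", "no-journal", "needs-recovery" or "clean".
    """
    sb = image[SUPERBLOCK_OFFSET:SUPERBLOCK_OFFSET + 1024]
    if len(sb) < 104:
        return "not-ext"      # too short to hold the fields read below
    (magic,) = struct.unpack_from("<H", sb, 56)            # s_magic
    if magic != EXT_MAGIC:
        return "not-ext"
    compat, incompat = struct.unpack_from("<II", sb, 92)   # s_feature_compat, s_feature_incompat
    if not compat & COMPAT_HAS_JOURNAL:
        return "no-journal"
    if incompat & INCOMPAT_RECOVER:
        return "needs-recovery"   # mounting this, even read-only, may replay the journal
    return "clean"
```

To try it, read the first 2048 bytes of a device or image file opened
in "rb" mode and pass them to journal_state. The result is only a
snapshot of the flag: "needs-recovery" does not prove that another host
has the volume mounted, and "clean" does not prove that none does.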

The mount on space2 became unavailable, bringing the production host
down. Re-mounting failed. After rebooting space2, fsck was required on
the affected partition. It ran for many hours and found a huge number
of errors, probably more than 10,000. Then I was able to mount the
volume and saw that our data had been turned into gray goo: parts of
system Perl scripts were replaced by binary chunks, and databases and
web servers would not start. I had 30 containers in production. Some
actually booted despite major sporadic data corruption in them.

My fellow system administrator from another department on campus said
that their distribution (CentOS) does not run install scripts. As he
worded it: Debian ended up managing your SAN for you.

The reason I got os-prober was the change in Debian's policy to
install all recommended packages; os-prober was recommended by
GRUB. I am not sure why the data corruption did not happen when I
upgraded to Squeeze a month ago (grub-common 1.96+20080724-16 ->
1.98-1).


I'm attaching the aptitude log and syslog of space1.

The root of the problem is also described in bug #556739
(http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=556739). Its author
predicted filesystem corruption and data loss back in 2009.

Alex
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~



-- 
----------------------------------------------------------------
Aleksandr Levchuk
Bioinformatics Systems and Databases

http://facility.bioinformatics.ucr.edu/people/aleksandr-levchuk
Cell Phone: (951) 368-0004
Lab Phone: (951) 905-5232

Institute for Integrative Genome Biology
University of California, Riverside
---------------------------------------------------------------

Attachment: bug599203_syslog.log
Description: Binary data

Attachment: bug599203_aptitude.log
Description: Binary data

