Bug#599203: os-prober: Causes Data Corruption on a SAN setup; Mounts SAN volumes that are already mounted on a different host.
An upgrade (os-prober 1.35 -> 1.39) corrupted 3.3 TB of data on our SAN.
I was upgrading the host space1, and the data corruption occurred on space2. An install script of os-prober tried to mount, read-only, a SAN volume that was already mounted on space2. That volume was in production use on space2, so EXT3-fs on space1 concluded that the journal was inconsistent, re-mounted the volume as writable, and performed a "recovery".
The mount on space2 became unavailable, bringing the production host down. Re-mounting failed. After rebooting space2, fsck was required on the affected partition. It ran for many hours and found a huge number of errors, probably more than 10,000. Then I was able to mount the volume and saw that our data had been turned into gray goo: parts of system perl scripts were replaced by binary chunks, and databases and web servers would not start. I had 30 containers in production. Some actually booted despite major sporadic data corruption in them.
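To illustrate the failure mode above: an ext3 filesystem that is mounted read-write elsewhere carries the "needs recovery" flag in its superblock, so a second host that mounts it, even read-only, will try to replay the journal. The following is only a minimal sketch of a check that could be run before any mount attempt. It is not what os-prober does, and the function names (journal_state, read_superblock) are hypothetical. The offsets and flag values are the standard ext2/ext3 superblock fields as I understand them.

```python
import struct

SUPERBLOCK_OFFSET = 1024   # the ext2/3 superblock starts 1024 bytes into the device
SUPERBLOCK_SIZE = 1024
EXT_MAGIC = 0xEF53         # s_magic, 16-bit little-endian at byte 56 of the superblock
FEATURE_COMPAT_HAS_JOURNAL = 0x0004   # bit in s_feature_compat (byte 92)
FEATURE_INCOMPAT_RECOVER = 0x0004     # bit in s_feature_incompat (byte 96)


def journal_state(superblock: bytes) -> str:
    """Classify a raw superblock (hypothetical helper, not part of os-prober).

    Returns 'not-ext', 'no-journal', 'clean' or 'needs-recovery'.
    """
    if len(superblock) < 104:
        return "not-ext"
    (magic,) = struct.unpack_from("<H", superblock, 56)
    if magic != EXT_MAGIC:
        return "not-ext"
    compat, incompat = struct.unpack_from("<II", superblock, 92)
    if not compat & FEATURE_COMPAT_HAS_JOURNAL:
        return "no-journal"
    if incompat & FEATURE_INCOMPAT_RECOVER:
        # Set while the filesystem is mounted read-write somewhere,
        # or after an unclean shutdown: mounting would replay the journal.
        return "needs-recovery"
    return "clean"


def read_superblock(device_path: str) -> bytes:
    """Read the raw superblock from a block device or image without mounting it."""
    with open(device_path, "rb") as dev:
        dev.seek(SUPERBLOCK_OFFSET)
        return dev.read(SUPERBLOCK_SIZE)
```

A prober following this idea would skip any volume for which journal_state(read_superblock(path)) returns 'needs-recovery' instead of mounting it. A read-only superblock read like this does not by itself prove the volume is unused on another host, so it is a safeguard rather than a complete fix.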
My fellow system administrator from another department on campus said that their distribution (CentOS) does not run install scripts. As he worded it, Debian ended up managing your SAN for you.
The reason I got os-prober was the change in Debian's policy to install all recommended packages, and os-prober was recommended by GRUB. I am not sure why the data corruption did not happen when I upgraded to Squeeze a month ago (grub-common 1.96+20080724-16 -> 1.98-1).
I'm attaching the aptitude log and syslog of space1.
The root of the problem is also described in bug #556739 (http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=556739). Its author predicted filesystem corruption and data loss back in 2009.
-- System Information:
Debian Release: squeeze/sid
APT prefers stable
APT policy: (500, 'stable')
Architecture: amd64 (x86_64)
Kernel: Linux 184.108.40.206+openvz-budarin (SMP w/4 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Versions of packages os-prober depends on:
ii libc6 2.11.2-2 Embedded GNU C Library: Shared lib
os-prober recommends no packages.
os-prober suggests no packages.
-- no debconf information