Syncing during installation can prevent massive filesystem corruption
Possibly the worst time to have a system lockup is during a software
installation or upgrade. I found this out the hard way and wish to suggest a
way to make it a little safer - sync the disks frequently during the
installation.
If you select a bunch of packages with tasksel, or you do an update after not
having updated in a long time, you could be installing a couple hundred packages
that contain thousands, possibly tens of thousands of files. In my old
installation yesterday all my packages together consisted of 42,000 files, and I
didn't have a particularly large number of packages installed.
Think about what happens in the kernel when you write to a filesystem. Disk
writes go into the buffer cache and aren't committed to disk right away.
Suppose you delete a file and write a new one. The kernel creates a buffer
cache entry that reflects the state the disk should be in with the old file
deleted, and then copies the new file data into some cache blocks and creates an
in-memory inode that stores where those blocks will eventually be in the filesystem.
Eventually the cache fills up and the least-recently used blocks finally get
committed to disk. For a non-journaled filesystem like ext2, the order the data
gets committed won't bear any particular relation to the final data structure
desired. When this happens the filesystem that's actually on the drive is
temporarily in a corrupted state, and lots of your file data is missing - the
filesystem can only be considered correct if you take the buffer cache into
account, and that will be lost in the event of a crash or power failure.
If you're writing a lot of files, then until all of the buffer cache gets
committed, the byte pattern that's on the hard drive is temporarily in a highly
corrupted state.
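To make the buffer cache point concrete, here is a minimal sketch in Python (not anything dpkg does, as far as this post knows) of the difference between a plain write and a write that is explicitly pushed to disk. The function name write_durably is made up for illustration, and the directory fsync assumes a Linux-like system where a directory can be opened read-only and synced.

```python
import os


def write_durably(path, data):
    """Write data to path and ask the kernel to commit it before returning.

    Hypothetical helper for illustration only.
    """
    with open(path, "wb") as f:
        f.write(data)         # goes into Python's buffer, then the kernel's cache
        f.flush()             # push Python's userspace buffer into the kernel
        os.fsync(f.fileno())  # ask the kernel to write this file's blocks to disk

    # The new directory entry lives in the cache too, so sync the directory.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Without the fsync calls the function would return as soon as the data reached the cache, which is exactly the window in which a lockup leaves the on-disk filesystem inconsistent.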
Now my sad experience. I was previously running a kernel (2.4.14 for PowerPC) that
I guess must have been buggy, because I would get sudden lockups from time to
time. They seemed to come when there was a lot of filesystem activity, such as
when doing an update. Progress of the update would halt, X11 would become
unresponsive, and I'd have to power off the machine. I couldn't ssh in to sync
the disks first.
When this happened yesterday it caused such horrendous corruption to my /, /var
and /usr partitions that I eventually gave up on repairing the damage and just
reformatted and reinstalled from scratch. Running fsck -y found and fixed
hundreds, if not thousands of errors. Yes, my friends, fsck fixed my
filesystems but good.
The problem I found once I could run fsck without complaint was that the data
content of a lot of files was just plain wrong. For example, when I tried to
resume the update (having already downloaded all the files), logrotate wouldn't
reinstall because dpkg claimed that the file /usr/sbin/logrotate was part of
another package.
What I eventually found was that the file /var/lib/dpkg/info/mailx.list didn't
list the files that belonged to mailx anymore - it clearly contained logrotate's
list! There were several files like this. I found a couple of files in /etc
that erroneously contained some manner of typographical information instead of
their proper contents.
At first I tried to fix the mess manually, but after a while I realized that I
wouldn't be likely to find all the files with bad contents and just wiped and
reinstalled.
Now here's my suggestion:
When dpkg completes installing one package it should sync all the filesystems.
Also sync each time a file finishes downloading during the package downloads.
Frequently syncing the filesystems slows a machine down because you don't get
the benefit of the buffer cache. But fault-tolerance during an upgrade, when
you are making drastic changes to critical files, is of much more importance
than speed.
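The per-package sync suggested above could look roughly like this. It is only a sketch in Python of the control flow, not dpkg's actual code: install_with_sync and the install_one callable are hypothetical stand-ins for whatever dpkg does internally, and os.sync() (Python 3.3 or later, Unix only) plays the role of the sync command.

```python
import os


def install_with_sync(packages, install_one, sync=os.sync):
    """Install packages one at a time, flushing the buffer cache after each.

    install_one is a hypothetical callable that installs a single package.
    """
    installed = []
    for pkg in packages:
        install_one(pkg)
        sync()  # commit everything written so far before the next package
        installed.append(pkg)
    return installed
```

With this shape, a lockup in the middle of a long upgrade should leave at most the package currently being unpacked in doubt, rather than every package touched since the cache was last flushed.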
A simple way to approach this before dpkg gets modified (or if the Debian
developers choose not to accept my suggestion) would be to run a script like the
following in the background just before starting up dselect to do an install or
upgrade. Leave it running until the installation is complete:
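The script itself is not reproduced in this copy of the message. What follows is a minimal sketch of the same idea in Python, not the author's original: the name keep_syncing.py, the function keep_syncing, and its interval and rounds parameters are all assumptions made for illustration. It relies on os.sync(), which needs Python 3.3 or later on a Unix-like system.

```python
#!/usr/bin/env python3
import os
import sys
import time


def keep_syncing(interval=5.0, rounds=None):
    """Call sync() every `interval` seconds; loop forever if rounds is None."""
    done = 0
    while rounds is None or done < rounds:
        os.sync()             # flush the kernel's buffer cache to disk
        done += 1
        time.sleep(interval)  # give the installer some time to make progress
    return done


if __name__ == "__main__":
    # Hypothetical usage: keep_syncing.py SECONDS
    if len(sys.argv) != 2:
        print("usage: keep_syncing.py SECONDS", file=sys.stderr)
    else:
        try:
            keep_syncing(float(sys.argv[1]))
        except KeyboardInterrupt:
            pass  # stop quietly when interrupted after the upgrade
```

Something like `python3 keep_syncing.py 5 &` would start it in the background before the upgrade, and it can be killed once the installation is complete. A shorter interval narrows the window of unsynced data at the cost of more disk traffic.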
Well, I'm running kernel 2.4.18-5 now. Hopefully that won't crash on me anymore.
But I'm going to use my keep-syncing script during any future upgrades.
Michael D. Crawford
GoingWare Inc. - Expert Software Development and Consulting
Tilting at Windmills for a Better Tomorrow.
"I give you this one rule of conduct. Do what you will, but speak
out always. Be shunned, be hated, be ridiculed, be scared,
be in doubt, but don't be gagged."
-- John J. Chapman, "Make a Bonfire of Your Reputations"