What wrecked the system? (Was "Re: Help: Login Failure")
Dear List of Experts :)
I confess to having entered
(*) e2fsadm -L +10M /dev/vg_system/lv_var
without unmounting - and without getting the XXXXXXXX bar that usually
indicates progress / success. At that point, /var was 98% full.
When I tried to log in from a thin client two hours later, the effect
described above occurred (as if the password had been mistyped). Now I
remember that a colleague described the same problem having occurred before!
Checking the disk usage with df, /var was reported as 101% used, and the
absolute number of bytes used was a large negative number! However, we
could still browse /var in this state, although paging through some log files
took strikingly long, so there was already a feeling of corruption.
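To notice a filling /var before it reaches 100%, a small check along these lines might help. It is only a sketch: the function names and the 90% threshold are made up for illustration, and it assumes the POSIX output format of "df -P" (one line per filesystem, usage in the fifth column, mount point in the sixth).

```shell
#!/bin/sh
# Extract the usage percentage (as a bare number) for one mount point
# from `df -P` output read on stdin.
use_percent() {
    # $1 = mount point, e.g. /var
    awk -v mp="$1" 'NR > 1 && $6 == mp { sub(/%/, "", $5); print $5 }'
}

# Succeed (exit status 0) if the usage meets or exceeds the threshold.
over_threshold() {
    # $1 = usage in percent, $2 = threshold in percent
    [ "$1" -ge "$2" ]
}

# Example wiring; the threshold of 90 is an arbitrary assumption:
# pct=$(df -P /var | use_percent /var)
# over_threshold "$pct" 90 && echo "WARNING: /var is ${pct}% full"
```

Run from cron, the commented-out wiring would mail a warning while there is still room to move files away.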
Could it be that in this very state - even without the stupidity of (*) - the
overfull /var had led to corrupt ldap data, as there was no way left to
write to /var/ldap (or whatever the exact location is)?
Instead of backing up as much of /var as possible, we then unmounted /var
and ran "fsck -fy /dev/vg_system/lv_var" on it (I regret the 'y'). After
pages of fix messages, we mounted /var again - and found only a lost+found
directory there. We managed to restore most of the data - but could not get
ldap running again.
What do you think now:
[ ] The system got wrecked when /var ran out of space.
[ ] The system got wrecked when (*) was done.
[ ] fsck couldn't cope with the situation as there was no free space on the
drive, and that wrecked the system.
Now we have RC-3 running - and this is not too bad - but for future
situations, we should learn some lessons. Please comment on these:
(1) As a matter of fact, /var ran out of space. This was due to two facts:
(i) Considering that squid takes 100 MB of the 150 MB partitioned
for /var, there are only 50 MB left for logs AND ldap.
(ii) All logs go to tjener's /var - even logs from attached workstations
(this is what we believe, at least). Admittedly, our teacher's workstation is
quite old and once per second says "kernel: i8253 count too high! resetting"!
You can imagine that this message filled up /var/log/messages!
=> LESSON: Make /var larger, filter the above-mentioned message, trigger
logrotate on size rather than on time.
(2) A full /var/log corrupts ldap!
=> LESSON: Put those on different partitions, and add /var/ldap (or whatever
the path is) to the list of backup directories by default! (This was not the
case with our system, was it, Klaus?)
(3) LESSON: Never try to enlarge mounted partitions!
(4) LESSON: Never do fsck with -y option set on a full partition (rather
(re)move some files first and omit -y switch)!
(5) LESSON: Always back up your system.
(6) LESSON: Don't do administration under time pressure!
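Lessons (1), (3) and (4) can be sketched as three small shell helpers. This is a rough illustration, not a tested recipe for tjener: the helper names, the 5M size limit and the four kept rotations are assumptions, it presumes that logrotate (and not another rotation script) handles the log file, and that the LVM version in use provides lvextend and resize2fs as separate tools. By default the resize steps are only printed, nothing is executed.

```shell
#!/bin/sh
# Lesson (1): print a logrotate stanza that rotates on size, not on a schedule.
# The 5M limit and the 4 kept rotations are assumptions, not values from this thread.
# copytruncate truncates the log in place, so the syslog daemon need not be
# signalled (a few lines may be lost while the copy is made).
size_based_stanza() {
    # $1 = log file path
    cat <<EOF
$1 {
    size 5M
    rotate 4
    compress
    missingok
    copytruncate
}
EOF
}

# Lesson (3): grow the filesystem offline. Steps are only printed unless DO_IT=1.
run() {
    if [ "${DO_IT:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

grow_offline() {
    # $1 = logical volume, $2 = mount point, $3 = size increment (e.g. 10M)
    run umount "$2" &&
    run e2fsck -f "$1" &&          # the chain stops here if e2fsck exits non-zero
    run lvextend -L "+$3" "$1" &&  # enlarge the logical volume first
    run resize2fs "$1" &&          # then grow the filesystem to fill it
    run mount "$2"                 # assumes an /etc/fstab entry for the mount point
}

# Lesson (4): decode the common bits of the exit status documented in fsck(8).
explain_fsck_status() {
    if [ "$1" -eq 0 ]; then echo "no errors"; return 0; fi
    if [ $(( $1 & 1 )) -ne 0 ]; then echo "errors corrected"; fi
    if [ $(( $1 & 4 )) -ne 0 ]; then echo "errors left uncorrected"; fi
    if [ $(( $1 & 8 )) -ne 0 ]; then echo "operational error"; fi
}
```

For example, "grow_offline /dev/vg_system/lv_var /var 10M" prints the five steps in order, and on an unmounted /var "e2fsck -fn /dev/vg_system/lv_var; explain_fsck_status $?" gives a read-only report, since -n opens the filesystem read-only and answers "no" to every question. Note that logrotate only looks at the size when it runs, so a size trigger needs logrotate to be started more often than once a day.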
Please send me all your opinions: which of these aspects do you think should
be reported in bugzilla?
Regards
Ralf
On Friday, 04 June 2004 12:57, Frank Weißer wrote:
> The worst thing seems to be putting /usr on a too small LV, because to
> resize ext3, you need to umount it, but then you haven't got access to
> resize2fs any more :-(
As long as you refrain from installing X/KDE on your tjener, the given
800MB should suffice :)