
Bug#3055: filesystem corruption caused by process accounting




Package: acct
Architecture: i386
Version: 6.1-0

Short:
If the log file used by Linux process accounting grows beyond the
free disk space, it is not possible to cleanly free the space used
by this file. This is accompanied by broken "pstree" output that is
several hundred lines long.

This may not be a bug in Debian or a bug in "accton" but (likely) a
bug in the kernel.
[What relates to Debian: I suggest printing a big warning when turning
process accounting on.]
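
The suggested warning could look roughly like the sketch below. It is
only an illustration of the idea, not part of "accton" or of any Debian
script: the function name, the threshold and the fallback to the
nearest existing parent directory are all assumptions made for the
example.

```python
import os
import shutil


def accounting_warning(log_path, min_free_bytes):
    """Return a warning string if the filesystem that would hold
    log_path has less than min_free_bytes available, else None.

    Hypothetical helper; nothing in the acct package provides it.
    """
    directory = os.path.dirname(os.path.abspath(log_path))
    # Fall back to the nearest existing parent so the check also works
    # before the accounting directory has been created.
    while not os.path.isdir(directory):
        directory = os.path.dirname(directory)
    free = shutil.disk_usage(directory).free
    if free < min_free_bytes:
        return ("WARNING: only %d bytes free under %s; the process "
                "accounting log may fill this filesystem"
                % (free, directory))
    return None


if __name__ == "__main__":
    # Path taken from this report; the 100 MB threshold is arbitrary.
    message = accounting_warning("/var/account/pacct", 100 * 1024 * 1024)
    if message:
        print(message)
```

A check like this only warns at the moment accounting is switched on;
it does not stop the log from growing afterwards.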


Long:
I'm running a fairly large Linux box where the size of the file
`/var/account/pacct' easily exceeds the available space in `/var'
(in my case, there are usually 56MB free).

The first time the problem occurred was when I was trying out "userfs".
I thought it was related to this software, but in fact it was only
triggered by it (the included "ftpfs" calls "ftp" many times, so the
process accounting log file grows fast).

The second time the problem ("0 bytes free on /var") came up, I was
not using "userfs" and I got the same symptoms:

	I deleted the file `/var/account/pacct' but the space (around
	50MB) was not freed. `/var' remained "full" (0 bytes free
	even for root).
	Even worse, the space freed by deleting huge files in
	/var/log was consumed at a rate of several dozen kB/s. Bummer!

	I was able to "umount" the filesystem containing `/var' after
	switching off process accounting, which resulted in massive
	filesystem corruption. ("fsck" thought the filesystem was
	clean, so I had to force a check with '-f'.)
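
For comparison, ordinary Unix semantics already keep a deleted file's
blocks allocated while something still holds the file open. The sketch
below shows that on a scratch file; it assumes a POSIX system and that
the kernel holds the accounting file open in a comparable way while
accounting is on. It does not explain why the space stayed lost after
accounting was switched off, nor the corruption.

```python
import os
import tempfile


def unlink_while_open(size=1024 * 1024):
    """Write size bytes to a scratch file, remove its name while the
    file is still open, and report what the open handle still sees."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as handle:
        handle.write(b"x" * size)
        handle.flush()
        os.unlink(path)                    # comparable to "rm pacct"
        info = os.fstat(handle.fileno())   # the inode is still there
        return {
            "name_exists": os.path.exists(path),
            "link_count": info.st_nlink,
            "size_still_held": info.st_size,
        }
    # Leaving the "with" block closes the handle; only then can the
    # filesystem release the blocks.


if __name__ == "__main__":
    print(unlink_while_open())
```

In the normal case the space comes back as soon as the last open
handle is closed, which is what did not happen here.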

The third time it happened I thought I would be smart and switched off
process accounting _before_ deleting the file, but it didn't make any
difference (and that's the current state of my machine).


Two related bugs:
	- the output of "pstree" is totally messed up
	- wtmp is broken (this one is easy to explain and may be a
	  result of the full `/var' partition).

Output of "pstree":

init-+-afpd
     |-atalkd
     |-cron
     |-2*[getty]
     |-gpm
     |-inetd-+-3*[in.rshd---rimapd]
     |       `-in.telnetd---bash
     |-init-+-afpd
     |      |-atalkd
     |      |-cron
     |      |-getty
     |      |-gpm
     |      |-inetd-+-in.rshd---rimapd
     |      |       `-in.telnetd---bash
     |      |-init-+-afpd
     |      |      |-atalkd
     |      |      |-cron

The "recursion" (for want of a better word) is 19 levels deep. The
output of "ps" is not affected.

Winfried


Here are the version numbers of all (?) programs involved:

bash> uname -a
Linux ElFi 1.3.90 #1 Wed Apr 17 21:26:33 MET DST 1996 i586
Linux ElFi 1.99.2 #2-pre-2.0 Mon May 13 03:38:44 MET DST 1996 i586


bash> mount -V
mount: mount-2.5i
bash> ac -V
ac: GNU Accounting Utilities (beta release 6.1)

bash> rm --version
GNU fileutils 3.12
bash> mv --version
GNU fileutils 3.12

bash> tune2fs -V
tune2fs 1.02, 16-Jan-96 for EXT2 FS 0.5b, 95/08/09
bash> fsck -V
Parallelizing fsck version 1.02 (16-Jan-96)
bash> fsck.ext2 -V
e2fsck 1.02, 16-Jan-96 for EXT2 FS 0.5b, 95/08/09

bash> pstree -V
pstree from psmisc version 11



Here's a log of what I did:

bash> cd account/
bash> dir
total 58123
-rw-r--r--   1 root     adm      56116224 Apr 19 20:15 pacct
-rw-r--r--   1 root     adm        737100 Apr 19 06:15 pacct.0
-rw-r--r--   1 root     adm        131146 Apr 18 06:17 pacct.1.gz
-rw-r--r--   1 root     adm       1027577 Apr 17 06:18 pacct.2.gz
-rw-r--r--   1 root     adm       1018483 Apr 16 06:15 pacct.3.gz
-rw-r--r--   1 root     adm         41218 Apr 15 06:15 pacct.4.gz
-rw-r--r--   1 root     adm         42538 Apr 14 06:15 pacct.5.gz
-rw-r--r--   1 root     adm        160368 Apr 13 06:15 pacct.6.gz
bash> rm pacct
bash> touch pacct
bash> dir
total 3106
-rw-r--r--   1 root     adm             0 Apr 19 20:15 pacct
-rw-r--r--   1 root     adm        737100 Apr 19 06:15 pacct.0
-rw-r--r--   1 root     adm        131146 Apr 18 06:17 pacct.1.gz
-rw-r--r--   1 root     adm       1027577 Apr 17 06:18 pacct.2.gz
-rw-r--r--   1 root     adm       1018483 Apr 16 06:15 pacct.3.gz
-rw-r--r--   1 root     adm         41218 Apr 15 06:15 pacct.4.gz
-rw-r--r--   1 root     adm         42538 Apr 14 06:15 pacct.5.gz
-rw-r--r--   1 root     adm        160368 Apr 13 06:15 pacct.6.gz
bash> df
Filesystem         1024-blocks  Used Available Capacity Mounted on
/dev/sda2              15863   12678     2366     84%   /
/dev/sda5              47575     117    45002      0%   /tmp
/dev/sda6              95167   95167        0    100%   /var
/dev/sda3              31727    6400    23689     21%   /var/local/dos
/dev/sda7             753333  688470    25952     96%   /usr
/dev/sdb2             444135  226724   194474     54%   /homes/elfi
/dev/sdb3              95183   48580    41688     54%   /tftpboot
/dev/sdb4            7961135 4615646  2932766     61%   /usr/local/wais
/dev/sdc3             290448   77488   212960     27%   /vol/mi/www
/dev/sdc2              96168   12834    83334     13%   /vol/mi/www/spinner_cache
/dev/sdc8              36666   33351     3315     91%   /vol/mi/www/logs/spinner
/dev/sdc9               4939      34     4905      1%   /vol/mi/www/logs/counter
calvados:/homes/calvados
                      402091  122661   271124     31%   /homes/calvados
calvados:/var/local/public
                      247109  136979    97369     58%   /var/local/public
sun1:/homes/sun1      975507  840400    37557     96%   /homes/sun1
sun1:/ElFi              5675    2687     2421     53%   /usr/local/mi2stn/sun1/ElFi
sun1:/elfi              1895     701     1005     41%   /usr/local/mi2stn/sun1/elfi
sunkaw:/homes/sunkaw  819214  454473   282821     62%   /homes/sunkaw
osi:/afs             72000000       0  72000000      0%   /afs

bash> tune2fs -l /dev/sda6
tune2fs 1.02, 16-Jan-96 for EXT2 FS 0.5b, 95/08/09
Filesystem magic number:  0xEF53
Filesystem state:         not clean
Errors behavior:          Continue
Inode count:              24576
Block count:              98288
Reserved block count:     4914
Free blocks:              0
Free inodes:              21955
First block:              1
Block size:               1024
Fragment size:            1024
Blocks per group:         8192
Fragments per group:      8192
Inodes per group:         2048
Last mount time:          Wed Apr 17 21:56:05 1996
Last write time:          Fri Apr 19 20:26:01 1996
Mount count:              7
Maximum mount count:      20
Last checked:             Wed Apr 10 09:27:27 1996
Check interval:           0
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)

[switched to a text-console, killed nearly everything and unmounted
/var]

root@ElFi:~> fsck /dev/sda6

[fsck says it's OK!]

root@ElFi:~> fsck -f /dev/sda6

[do a `cat /proc/kcore' to get the same effect that I had]

90042 -90043 -90044 -90045 -90046 -90047 -90048 -90049 -90050 -90051
-90052 -90053 -90054 -90055 -90056 -90057 -90058 -90059 -90060 -90061
-90062 -90063 -90064 -90065 -90066 -90067 -90068 -90069 -90070 -90071
-90072 -90073 -90074 -90075 -90076 -90077 -90078 -90079 -90080 -90081
-90082 -90083 -90084 -90085 -90086 -90087 -90088 -90089 -90090 -90091
-90092 -90093 -90094 -90095 -90096 -90097 -90098 -90099 -90100 -90101
-90102 -90103 -90104 -90105 -90106 -90107 -90108 -90109 -90110 -90111
-90112 -90378 -90379 -90380 -90381 -90382 -90383 -90384 -90424 -90509
-90510 -90511 -90512.  FIXED
Free blocks count wrong for group 0 (0, counted=54).  FIXED
Free blocks count wrong for group 1 (0, counted=4542).  FIXED
Free blocks count wrong for group 2 (0, counted=6241).  FIXED
Free blocks count wrong for group 3 (0, counted=6202).  FIXED
Free blocks count wrong for group 4 (0, counted=6944).  FIXED
Free blocks count wrong for group 5 (16, counted=4253).  FIXED
Free blocks count wrong for group 6 (1, counted=7155).  FIXED
Free blocks count wrong for group 7 (0, counted=6194).  FIXED
Free blocks count wrong for group 8 (31, counted=6584).  FIXED
Free blocks count wrong for group 9 (7, counted=4004).  FIXED
Free blocks count wrong for group 10 (0, counted=2886).  FIXED
Free blocks count wrong for group 11 (0, counted=12).  FIXED
Free blocks count wrong (55, counted=55071).  FIXED

/dev/sda6: ***** FILE SYSTEM WAS MODIFIED *****
/dev/sda6: 2617/24576 files (4.6% non-contiguous), 43217/98288 blocks
root@ElFi:~>

[the nightmare ends here, phew!]
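
The numbers in the fsck output can be cross-checked with a little
arithmetic. The sketch below uses only values copied from the logs
above; the variable names are made up for the example. The comparison
with the size of the deleted pacct file is an inference, not something
fsck reports.

```python
# Per-group "counted" free blocks, copied from the fsck output above.
group_free = [54, 4542, 6241, 6202, 6944, 4253,
              7155, 6194, 6584, 4004, 2886, 12]

total_blocks = 98288      # "Block count" from tune2fs
used_blocks = 43217       # from the final fsck summary line
counted_free = 55071      # "Free blocks count wrong (55, counted=55071)"
stale_free = 55           # free count recorded before the fix
block_size = 1024         # "Block size" from tune2fs
pacct_bytes = 56116224    # size of pacct in the first "dir" listing

# The per-group counts add up to the filesystem-wide count ...
groups_total = sum(group_free)
# ... which also matches total minus used blocks.
free_from_summary = total_blocks - used_blocks

# Blocks that fsck had to give back, next to the data blocks of the
# deleted pacct file (indirect blocks not included).
recovered_blocks = counted_free - stale_free
pacct_blocks = pacct_bytes // block_size

if __name__ == "__main__":
    print("sum of groups:     ", groups_total)
    print("total - used:      ", free_from_summary)
    print("recovered by fsck: ", recovered_blocks)
    print("pacct data blocks: ", pacct_blocks)
```

The recovered block count is close to the size of the deleted pacct
file, which would be consistent with its blocks never having been
marked free after the "rm".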


