Re: / 100% used
On Sun, 5 Jul 2015 16:19:36 -0300
Beco <rcb@beco.cc> wrote:
> Hi guys,
>
> I need some help regarding this problem.
>
> Yesterday I upgraded from Wheezy to Jessie. Today I got an email
> saying the user could not create a "tmp" file to do anything.
>
> I checked the filesystem with:
>
> ------------
> # df -h
> Filesystem      Size  Used Avail Use% Mounted on
> /dev/sda1        46G   46G     0 100% /
> udev             10M     0   10M   0% /dev
> tmpfs           789M   82M  708M  11% /run
> tmpfs           2.0G  4.0K  2.0G   1% /dev/shm
> tmpfs           5.0M  4.0K  5.0M   1% /run/lock
> tmpfs           2.0G     0  2.0G   0% /sys/fs/cgroup
> /dev/sda3       864G  4.0G  816G   1% /home
> tmpfs           395M     0  395M   0% /run/user/1000
> tmpfs           395M     0  395M   0% /run/user/1340
> tmpfs           395M     0  395M   0% /run/user/1328
> tmpfs           395M     0  395M   0% /run/user/1360
> ------------
>
> I also tried to find what was using so much space with:
>
> ------------
> # find / -xdev -type f -size +200M -exec ls -lh {} \;
> -rw-r----- 1 root adm 4.2G Jul 5 16:05 /var/log/messages
> -rw-r----- 1 root adm 5.6G Jul 5 16:05 /var/log/kern.log
> -rw-r----- 1 root adm 16G Jul 5 16:05 /var/log/syslog
> -rw-r----- 1 root adm 9.6G Jul 5 16:05 /var/log/daemon.log
> ------------
>
> I'm not convinced that only 16G of syslog is my whole problem. But I'd
> start with that, if I can make the system usable again. So I checked
> the log, only to see these messages over and over:
>
> ------------
> Jul 5 15:35:58 beco kernel: [66691.859352] ieee80211 phy0:
> rt2x00lib_request_firmware: Info - Loading firmware file 'rt2860.bin'
> Jul 5 15:35:58 beco kernel: [66691.859358] rt2800pci
> 0000:03:00.0: firmware: failed to load rt2860.bin (-2)
> Jul 5 15:35:58 beco kernel: [66691.859359] rt2800pci
> 0000:03:00.0: Direct firmware load failed with error -2
> Jul 5 15:35:58 beco kernel: [66691.859361] rt2800pci
> 0000:03:00.0: Falling back to user helper
> ...
>
> Jul 5 15:35:58 beco systemd-udevd[16867]: failed to execute
> '/lib/udev/socket:@/org/freedesktop/hal/udev_event'
> 'socket:@/org/freedesktop/hal/udev_event': No such file or directory
> ...
> Jul 5 15:35:58 beco wpa_supplicant[2357]: Could not set
> interface wlan0 flags (UP): Cannot allocate memory
> Jul 5 15:35:58 beco wpa_supplicant[2357]: nl80211: Could not set
> interface 'wlan0' UP
> Jul 5 15:35:58 beco wpa_supplicant[2357]: Could not set
> interface wlan0 flags (UP): Cannot allocate memory
> Jul 5 15:35:58 beco wpa_supplicant[2357]: WEXT: Could not set
> interface 'wlan0' UP
> Jul 5 15:35:58 beco wpa_supplicant[2357]: wlan0: Failed to
> initialize driver interface
> ...
> Jul 5 15:35:58 beco NetworkManager[1812]: <error>
> [1436121358.001526]
> [supplicant-manager/nm-supplicant-interface.c:856]
> interface_add_cb(): (wlan0): error adding interface: wpa_supplicant
> couldn't grab this interface.
> Jul 5 15:35:58 beco NetworkManager[1812]: <info> (wlan0): supplicant
> interface state: starting -> down
> Jul 5 15:35:58 beco systemd-udevd[16870]: failed to execute
> '/lib/udev/socket:@/org/freedesktop/hal/udev_event'
> 'socket:@/org/freedesktop/hal/udev_event': No such file or directory
> ...
> ------------
>
> This keeps going. As you can see above, it's all from 15h35min58s
> alone. In one single minute, more than 7000 lines were added to
> syslog.
>
> It looks like a variety of problems. How can I get out of this
> EMERGENCY (the system is not usable) and then, with more calm and
> time, figure out what to do in the long term?
>
Hopefully you will get several suggestions.

My first approach would be to get /var out of /. As you say, you need a
quick way to get the system bootable again, so that you can try to make
sense of what is happening.

Your problem is the reason servers generally have a separate /var
partition. Workstations usually do not, as they generate far fewer logs
when working normally. If a separate /var fills, the logging system
should be able to deal with it; it is other parts of the OS which cannot
deal with a full /.

I would boot with a rescue/live distribution, then:
- back up /home to at least two locations, e.g. optical disc and an
  external hard drive.
- shrink /home to free 50GB or more of unpartitioned space. This can be
  returned to /home once the problem is fixed. Create a partition in
  the freed space, create an appropriate filesystem on it (presumably
  ext4), and copy the old /var across minus the oversized logs, since
  an empty /var would hide /var/lib and with it the package database.
  Then alter the installed distribution's /etc/fstab to mount the new
  partition on /var.
- delete as much as you feel necessary from the existing /var, but keep
  enough information to return to if you run into trouble and want to
  examine more logs later. Once rebooted, the old /var will not be
  accessible, but it will not be overwritten. Free at least a gigabyte
  of space, which will be more than enough for a / without /var on a
  temporary basis.
- cross your fingers and reboot. Hopefully the system will run well
  enough for you to identify the problem(s) and fix them.
- once fixed, comment out the new /var entry in /etc/fstab and reboot.
  The old /var should now be used again, and the temporary /var
  partition can be deleted and /home resized to recover the space. Or
  leave the space unpartitioned, and don't bother reincorporating it
  into /home until /home runs out of space, when you will be looking
  for a new drive anyway. Or stay with a separate /var...
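
For the "delete as much as you feel necessary" step, here is a minimal
sketch of one way to empty a runaway log while keeping its most recent
lines for later inspection. The function name trim_log and the mount
points in the usage line are my own assumptions, not anything from your
system; adjust them to wherever the rescue system mounts things.

```shell
#!/bin/sh
# trim_log FILE KEEP ARCHIVE_DIR   (hypothetical helper, POSIX sh)
# Save the last KEEP lines of FILE under ARCHIVE_DIR, then empty FILE.
trim_log() {
    file=$1
    keep=$2
    archive_dir=$3

    [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }
    mkdir -p "$archive_dir" || return 1

    # Write the tail to a different filesystem first: the full / has no
    # room for a copy. Stop here if the copy fails, so nothing is lost.
    tail -n "$keep" "$file" > "$archive_dir/$(basename "$file").tail" || return 1

    # Truncate in place rather than deleting, so the file keeps its
    # owner and mode.
    : > "$file"
}
```

Assuming the old / is mounted on /mnt/root and /home on /mnt/home, usage
would be something like:

    for f in messages kern.log syslog daemon.log; do
        trim_log "/mnt/root/var/log/$f" 100000 /mnt/home/log-rescue
    done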
I've possibly forgotten a few details, as I haven't done this kind of
job for some years, but the outline is sound. It might be possible to
just plug in a USB stick and alter /etc/fstab to mount /var there
temporarily, but I can foresee many little gotchas. I'd stick with a
real hard drive partition.
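
Whichever device ends up holding the temporary /var, the /etc/fstab
line would look roughly like the fragment below. The device name
/dev/sda4 is only a guess at what the new partition will be called;
check with blkid or fdisk -l before relying on it.

```
# temporary /var; comment this line out again once the logging is fixed
# (/dev/sda4 is an assumed device name)
/dev/sda4  /var  ext4  defaults  0  2
```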
--
Joe