
Re: / 100% used



On Sun, 5 Jul 2015 16:19:36 -0300
Beco <rcb@beco.cc> wrote:

> Hi guys,
> 
> I need some help regarding this problem.
> 
> Yesterday I upgraded from Wheezy to Jessie. Today I got an email
> saying the user could not create a "tmp" file to do anything.
> 
> I checked the filesystem with:
> 
> ------------
> # df -h
> Filesystem      Size  Used Avail Use% Mounted on
> /dev/sda1        46G   46G     0 100% /
> udev             10M     0   10M   0% /dev
> tmpfs           789M   82M  708M  11% /run
> tmpfs           2.0G  4.0K  2.0G   1% /dev/shm
> tmpfs           5.0M  4.0K  5.0M   1% /run/lock
> tmpfs           2.0G     0  2.0G   0% /sys/fs/cgroup
> /dev/sda3       864G  4.0G  816G   1% /home
> tmpfs           395M     0  395M   0% /run/user/1000
> tmpfs           395M     0  395M   0% /run/user/1340
> tmpfs           395M     0  395M   0% /run/user/1328
> tmpfs           395M     0  395M   0% /run/user/1360
> ------------
> 
> Also tried to find what was using such huge space with:
> 
> 
> ------------
> # find / -xdev -type f -size +200M -exec ls -lh {} \;
> -rw-r----- 1 root adm 4.2G Jul  5 16:05 /var/log/messages
> -rw-r----- 1 root adm 5.6G Jul  5 16:05 /var/log/kern.log
> -rw-r----- 1 root adm 16G Jul  5 16:05 /var/log/syslog
> -rw-r----- 1 root adm 9.6G Jul  5 16:05 /var/log/daemon.log
> ------------
> 
> I'm not convinced that the 16G of syslog alone is my whole problem, but
> I'd start with that if it makes the system usable again. So I checked
> the log and saw these messages, over and over:
> 
> ------------
> Jul  5 15:35:58 beco kernel: [66691.859352] ieee80211 phy0: rt2x00lib_request_firmware: Info - Loading firmware file 'rt2860.bin'
> Jul  5 15:35:58 beco kernel: [66691.859358] rt2800pci 0000:03:00.0: firmware: failed to load rt2860.bin (-2)
> Jul  5 15:35:58 beco kernel: [66691.859359] rt2800pci 0000:03:00.0: Direct firmware load failed with error -2
> Jul  5 15:35:58 beco kernel: [66691.859361] rt2800pci 0000:03:00.0: Falling back to user helper
> ...
> Jul  5 15:35:58 beco systemd-udevd[16867]: failed to execute '/lib/udev/socket:@/org/freedesktop/hal/udev_event' 'socket:@/org/freedesktop/hal/udev_event': No such file or directory
> ...
> Jul  5 15:35:58 beco wpa_supplicant[2357]: Could not set interface wlan0 flags (UP): Cannot allocate memory
> Jul  5 15:35:58 beco wpa_supplicant[2357]: nl80211: Could not set interface 'wlan0' UP
> Jul  5 15:35:58 beco wpa_supplicant[2357]: Could not set interface wlan0 flags (UP): Cannot allocate memory
> Jul  5 15:35:58 beco wpa_supplicant[2357]: WEXT: Could not set interface 'wlan0' UP
> Jul  5 15:35:58 beco wpa_supplicant[2357]: wlan0: Failed to initialize driver interface
> ...
> Jul  5 15:35:58 beco NetworkManager[1812]: <error> [1436121358.001526] [supplicant-manager/nm-supplicant-interface.c:856] interface_add_cb(): (wlan0): error adding interface: wpa_supplicant couldn't grab this interface.
> Jul  5 15:35:58 beco NetworkManager[1812]: <info> (wlan0): supplicant interface state: starting -> down
> Jul  5 15:35:58 beco systemd-udevd[16870]: failed to execute '/lib/udev/socket:@/org/freedesktop/hal/udev_event' 'socket:@/org/freedesktop/hal/udev_event': No such file or directory
> ...
> ------------
> 
> This keeps going. As you can see above, all of it is from 15:35:58.
> In a single minute, more than 7000 lines were added to
> syslog.
> 
> It seems to be a variety of problems. How can I get out of this EMERGENCY
> (the system is not usable) and then, with more calm and time, figure out
> what to do in the long term?
> 

Hopefully you will get several suggestions.

My first approach would be to get /var out of /. As you say, you
need a quick way to make the system bootable, so that you can try to
make sense of what is happening.

Your problem is the reason servers generally have a separate /var
partition. Workstations usually do not, as they generate far fewer logs
when working normally. If a separate /var fills, the logging system
should be able to cope; it is other parts of the OS that cannot deal
with a full /.
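
A quick way to check whether /var currently shares a filesystem with /,
as a minimal sketch. It assumes GNU coreutils stat, and the function name
same_fs is only my own label for illustration:

```shell
# Succeeds (exit status 0) when both paths live on the same filesystem.
# Assumes GNU coreutils stat; %d prints the device number of the
# filesystem that holds the path.
same_fs() {
    [ "$(stat -c %d "$1")" = "$(stat -c %d "$2")" ]
}

if same_fs / /var; then
    echo "/var shares a filesystem with /"
else
    echo "/var is on its own filesystem"
fi
```

On your machine as described, I would expect the first message, since df
shows no separate mount for /var.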

I would boot with a rescue/live distribution, then:

- back up /home to at least two locations, e.g. optical disc and an
  external hard drive.
- shrink /home to free 50GB or more of unpartitioned space. This can be
  returned to /home once the problem is fixed. In the freed space, create
  a partition, create an appropriate filesystem (presumably ext4) and
  alter the installed distribution's /etc/fstab to mount this on /var.
  A new filesystem starts empty, so you will probably need to copy the
  rest of the old /var (everything but the oversized logs) onto it, as
  the system expects /var/lib and friends to be there.
- delete as much as you feel necessary from the existing /var, i.e. keep
  enough information to return to if you run into trouble and want to
  examine more logs later. Once rebooted, the old /var will not be
  accessible, but it will not be overwritten. Free at least a gigabyte
  of space, which will be more than enough for a / without /var on a
  temporary basis.
- cross your fingers and reboot. Hopefully the system will run well
  enough for you to identify the problem(s) and fix it/them.
- once fixed, comment out the new /var entry in /etc/fstab and reboot.
  The old /var should now be used again, and the temporary /var partition
  can be deleted, and /home resized to recover the space. Or leave the
  space unpartitioned, and don't bother reincorporating it into /home
  until /home runs out of space, when you will be looking for a new
  drive anyway. Or stay with a separate /var...
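
To make the log-trimming and /etc/fstab steps above a little more
concrete, here is a rough sketch, not a tested recipe. The names are my
own assumptions: the installed root is taken to be mounted at /mnt from
the live system, /dev/sda4 stands in for whatever partition you actually
create, and trim_log, add_var_entry and disable_var_entry are just
illustrative helper names. It only defines functions; nothing runs until
you call them.

```shell
#!/bin/sh
# Sketch only. FSTAB and NEW_VAR_DEV are placeholders: from a live system
# the installed root is assumed to be mounted at /mnt, and /dev/sda4
# stands in for the partition you actually created.
FSTAB="${FSTAB:-/mnt/etc/fstab}"
NEW_VAR_DEV="${NEW_VAR_DEV:-/dev/sda4}"

# Keep only the last $2 lines (default 10000) of log file $1.
# The temporary copy goes wherever mktemp puts it (normally /tmp), so
# that location needs a little free space.
trim_log() {
    log="$1"
    keep="${2:-10000}"
    tmp="$(mktemp)" || return 1
    # Writing back with cat keeps the log's owner and permissions.
    tail -n "$keep" "$log" > "$tmp" && cat "$tmp" > "$log"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Append a mount line for the temporary /var partition to $FSTAB.
add_var_entry() {
    printf '%s\t/var\text4\tdefaults\t0\t2\n' "$NEW_VAR_DEV" >> "$FSTAB"
}

# Comment out every active /var line in $FSTAB (GNU sed assumed for -i).
disable_var_entry() {
    sed -i 's|^\([^#].*[[:space:]]/var[[:space:]].*\)$|#\1|' "$FSTAB"
}
```

From the live system you might run something like
"trim_log /mnt/var/log/syslog" for each of the oversized logs, call
add_var_entry once the new filesystem exists, and call disable_var_entry
after the real problem is fixed. Read the resulting fstab by eye before
rebooting.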

I've possibly forgotten a few details, as I haven't done this kind of
job for some years, but the outline is sound. It might be possible to
just plug in a USB stick and alter /etc/fstab to mount /var there
temporarily, but I can foresee many little gotchas. I'd stick with a
real hard drive partition.

-- 
Joe


