
Re: / 100% used



Quoting Beco (rcb@beco.cc):

> Hi guys,
> I'll report actions in order now:
> 
> - Upgraded from wheezy to jessie last night. No problems during the upgrade.

I would question the wisdom of performing a distribution upgrade
remotely when you normally have physical access to the machine.

"No problems during upgrade" just means the software installed,
not that it's all going to work together faultlessly.

> - Today, emails from users saying the system was unusable (though it was online
> and you could log in)

In another thread, I reported a fresh jessie installation that would
boot into a system with no console access 50% of the time, but which
was partially usable through ssh.

> - After inspecting the logs, I saw syslog entries showing what seem to be 3
> problems, spamming the logs:
>   * networkmanager reporting wpa_supplicant 
>   * wpa_supplicant trying to setup wlan0
>   * kernel attempting to load rt2860.bin

In my case, for whatever reason, the binfmt_misc kernel module didn't
get loaded automatically. Because the machine is a laptop, it was never
up long enough for the logs to fill up with
    host systemd[1]: Looping too fast. Throttling execution a little.
messages, which arrived every 3 seconds. (I now load it from /etc/modules.)
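For reference, a minimal /etc/modules that loads binfmt_misc at boot would look like the sketch below. The file simply lists one module name per line, and lines starting with # are ignored; any other modules already listed there should of course be kept.

```
# /etc/modules: kernel modules to load at boot time, one per line.
binfmt_misc
```

Running "modprobe binfmt_misc" as root loads the module immediately, without waiting for a reboot.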

> Status: the system could not create any files. I could not apt-get install
> lshw, for instance. Users (students) could not run "gcc" to compile, due to lack
> of disk space.
> 
> The system is remote; I was using ssh to solve problems.
> 
> 1st action: a loop created with a bash command:
> # while true; do echo clean syslog; cat /dev/null > syslog ; sleep 10; done
> 
> This let me see what was happening, and I could install lshw.

[snipped diagnostics determining it has a wireless device]

Presumably wheezy wasn't using the wireless device, but you (and other
users) were always connecting through eth0. Presumably you were also
configuring eth0 with network-manager, hence its inclusion in jessie
after the upgrade.

> 2nd action:
> 
>  # apt-get remove wpasupplicant
> 
> System stabilized. I left all my ssh sessions open and went to grab a bite (I
> hadn't had lunch yet today).

By "stabilized", I assume you mean that the logs were no longer
filling up and so your syslog cleaner was no longer necessary.
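When / hits 100%, a quick way to see which logs are responsible is to sort the entries under /var/log by size. A minimal bash sketch, assuming GNU du (as on Debian) for the -b option, which reports apparent size in bytes; largest_logs is a hypothetical helper name, not an existing command:

```bash
#!/bin/bash
# largest_logs [DIR] [N]
# Print the N largest entries directly under DIR (default: /var/log, N=5),
# smallest first, each with its apparent size in bytes (GNU du -b).
# Unreadable entries are skipped silently (errors go to /dev/null).
largest_logs() {
    local dir=${1:-/var/log} n=${2:-5}
    du -sb -- "$dir"/* 2>/dev/null | sort -n | tail -n "$n"
}
```

For example, "largest_logs /var/log 3" run as root would list the three biggest entries, with the worst offender last.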

At this point, you had destroyed your network configuration, but you
wouldn't notice until you tried to establish new connections. Your
existing ssh connections were still running on their original file
descriptors.

> ----------------- OH BOY SECTION!!
> 
> Just to come back and see all sessions kicked out. System not accessible
> anymore.
> 
> Ping was OK, but there was no connection. Ping was probably OK because a
> server in front of mine answers the pings.
> 
> Every ssh attempt timed out! Oh, boy! So the worst came true: I needed
> physical access to a server in my room on a Sunday night. There I went. Drove
> there, all dark and empty. A sysadmin's life.
> 
> There I saw that the Network-Manager icon (KDE) was not active. So I
> downloaded the wpasupplicant package on my notebook, copied it to the server
> on a pendrive, and reinstalled it.

As Pascal explained, network-manager can't be installed without
wpasupplicant, so removing wpasupplicant removed network-manager too.
Not having used network-manager, I wouldn't know the steps necessary
to bring it back up, but at the very least it would need reinstalling.
The config files should still be there.

> 3rd action:
> 
> Still nothing was working. I tried ifup, but the interface was not recognized.
> Then I remembered I had earlier commented out some lines in
> /etc/network/interfaces. I uncommented this line:
> 
> ---
> iface eth0 inet dhcp
> ---
> 
> Then ran: # ifup eth0
> 
> Everything was running. I needed to get out of there, because the gatekeeper
> was not happy.

So now, network-manager and wpasupplicant are no longer required for
networking to run.
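A minimal /etc/network/interfaces for this setup might look like the sketch below. It assumes the wired interface is eth0 and that the file doesn't already contain an "auto" or "allow-hotplug" line for it; without one of those, ifup eth0 has to be run by hand after every boot.

```
# /etc/network/interfaces (sketch)
# The loopback interface
auto lo
iface lo inet loopback

# Bring eth0 up at boot and configure it via DHCP
auto eth0
iface eth0 inet dhcp
```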

> Back home now, I think the server is running "ok".
> 
> I need to figure out what went wrong, and whether there is a better solution,
> because I can't make sense of what just happened as reported above.
> 
> My understanding was that Network-Manager was supposed to run the show, not
> wpa_supplicant, and not "ifup".

Well, it's your choice whether you use network-manager or not. My
"server" at home runs eth0 through wicd, but that's just for
uniformity across all my machines: laptops and wired and wireless
desktops. If the ethernet breaks down, I can just plug in a
wifi USB.

> Now I don't know where Network-Manager is. wpa_supplicant was gone and is back
> again. The system is stable, and "ifup" is configured.

I would now expect dpkg -l to report
    rc  network-manager ...
    ii  wpasupplicant ...
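The "rc" state means the package has been removed but its configuration files are still on disk; those are the packages that apt-get purge would clean up. A small bash sketch that picks them out of dpkg -l output (residual_configs is a hypothetical helper name; it only looks at the first two columns):

```bash
#!/bin/bash
# residual_configs: read `dpkg -l` output on stdin and print the names of
# packages in state "rc" (removed, config files remain).
# Header lines never have "rc" as their first field, so they are skipped.
residual_configs() {
    awk '$1 == "rc" { print $2 }'
}
```

Usage: "dpkg -l | residual_configs".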

> Last action (Pascal's suggestion):
> I added the module that requests that firmware to the blacklist.
> 
> Just in case.

Fair enough.
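For the record, a blacklist entry is a one-line file under /etc/modprobe.d/ (the file name must end in .conf). The sketch below assumes the driver requesting rt2860.bin is rt2800pci, the Ralink PCI wireless driver (lsmod shows what is actually loaded), and uses a hypothetical file name. Note that "blacklist" only stops automatic loading by device alias; the module can still be loaded by hand or as a dependency of another module.

```
# /etc/modprobe.d/blacklist-rt2800.conf (hypothetical file name)
# Don't auto-load the Ralink driver that requests rt2860.bin.
blacklist rt2800pci
```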

> So, where am I now after all these changes?!
> 
> Tomorrow I'll have full access to the server, and hopefully no gatekeeper
> breathing down my neck, so I'll have some time.
> 
> Should I look into Network-Manager? Comment that "ifup" line out again?

I would purge network-manager and wpasupplicant so you don't confuse
the system or yourself, and rely on /etc/network/interfaces (/e/n/i)
to run your wired network. Make sure you can establish new ssh
connections before going home.

> daemon.log:
> 
>   Jul  3 07:00:24 beco dhclient: send_packet: Operation not permitted
>   Jul  3 07:00:44 beco dhclient: DHCPREQUEST on eth0 to 10.0.0.1 port 67
>   Jul  3 07:00:44 beco dhclient: send_packet: Operation not permitted
>   Jul  3 07:00:53 beco dhclient: DHCPREQUEST on eth0 to 255.255.255.255 port 67
>   Jul  3 07:00:53 beco dhclient: DHCPACK from 10.0.0.1
>   Jul  3 07:00:53 beco dhclient: bound to 10.0.3.2 -- renewal in 3534 seconds.
>   Jul  3 07:59:47 beco dhclient: DHCPREQUEST on eth0 to 10.0.0.1 port 67
> 
> 15 GB of this madness
> 
> I wonder if this program really needs to report such a failure so many times,
> and in 4 logs simultaneously. I know redundancy is good, but:
> 16 GB syslog + 5 GB kern.log + 5 GB messages + 15 GB daemon.log = 41 GB!
> 
> That's one way to say: Hey! I can't install this firmware! (I'm afraid to open
> root's email and find a message about it there too!)
> 
> I was walking down the street to grab a bite and a stranger yelled at me: Hey!
> Install firmware rt2860.bin!
> (True fact! Believe me!)

Unfortunately, computers have no common sense! Email transfers are
expected to fail, so subsequent attempts are made after ever-increasing
periods of time. This is very frustrating on a laptop, and is why I have
a sudo command to remove the lockfiles and kick exim. But for servers,
it's obviously the Right Way.
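The retry-with-increasing-delay idea described above can be sketched in bash as follows. This is a generic doubling back-off for illustration only, not exim's actual retry algorithm (exim's retry rules are configurable); retry_with_backoff is a hypothetical helper name.

```bash
#!/bin/bash
# retry_with_backoff MAX_TRIES INITIAL_DELAY COMMAND [ARGS...]
# Run COMMAND until it succeeds. After the first failure wait INITIAL_DELAY
# seconds, then double the wait after each further failure.
# Returns 0 on success, or COMMAND's last exit status after MAX_TRIES attempts.
retry_with_backoff() {
    local max_tries=$1 delay=$2 attempt=1 status
    shift 2
    while true; do
        "$@" && return 0
        status=$?                      # exit status of the failed COMMAND
        if (( attempt >= max_tries )); then
            return "$status"
        fi
        sleep "$delay"
        delay=$(( delay * 2 ))         # ever-increasing wait between attempts
        attempt=$(( attempt + 1 ))
    done
}
```

For example, "retry_with_backoff 5 60 ping -c 1 10.0.0.1" would try up to five times, waiting 60, 120, 240 and 480 seconds between attempts.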

However, I'd be even more frustrated if a failure to make an airport
wifi connection meant that the system would wait for any length of
time before retrying.

Cheers,
David.

