
Re: help needed to manage s390x host for ci.debian.net



Hi Phil,

On 13-02-2023 08:57, Philipp Kern wrote:
On 12.02.23 22:38, Paul Gevers wrote:
I have munin [1], but as said, I'm not a trained sysadmin. I don't know what I'm looking for if you ask "statistics on the network".

This is more of a software development / devops question than a sysadmin question, but alas.

I acknowledge that my outreach was broad and didn't only cover s390x.

What I am interested in is *application-level* logging on reconnects. Presumably the connection to RabbitMQ is outbound?

Our configuration can be seen here:
https://salsa.debian.org/ci-team/debian-ci-config/-/blob/master/cookbooks/rabbitmq/templates/rabbitmq.conf.erb

Is it tunneled? Does your application log somewhere when a reconnect happens? Does it say when it successfully connected?

I'd expect good software to log something like this:

[10:00:00] Connecting to broker "rabbitmq.debci.debian.net:12345"...
[10:00:05] Connected to broker "rabbitmq.debci.debian.net:12345".

And also:

[10:00:00] Connecting to broker "rabbitmq.debci.debian.net:12345"...
[10:00:01] Connection to broker "rabbitmq.debci.debian.net:12345" failed: Connection refused
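
To make the expectation concrete, here is a minimal Python sketch of connection logging in that style. It is only an illustration at the TCP level: the function name `connect_with_logging` and the logger name are made up for this example, and it does not reflect how debci or its AMQP client actually connects.

```python
import logging
import socket

# Timestamped output similar to the example log lines above.
logging.basicConfig(
    format="[%(asctime)s] %(message)s", datefmt="%H:%M:%S", level=logging.INFO
)
log = logging.getLogger("broker-connect")  # hypothetical logger name


def connect_with_logging(host, port, timeout=5.0):
    """Open a TCP connection, logging the attempt and its outcome.

    Returns the connected socket, or None if the connection failed.
    """
    target = f"{host}:{port}"
    log.info('Connecting to broker "%s"...', target)
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:
        # Covers refused connections, timeouts, and DNS failures.
        log.error('Connection to broker "%s" failed: %s', target, exc)
        return None
    log.info('Connected to broker "%s".', target)
    return sock
```

A real client would wrap its AMQP library's connect call the same way, so that every reconnect leaves an attempt line followed by either a success or a failure line.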

@terceiro: I haven't seen this kind of log on the worker hosts. Do you know if they exist or if we can generate them?

I think I'm seeing something on the main host.
admin@ci-master:/var/log/rabbitmq$ sudo grep 148.100.88.163 rabbit@ci-master.log | grep -v '\[info\]' | grep -v '\[warning\]'
2023-02-14 00:00:37.522 [error] <0.30951.85> closing AMQP connection <0.30951.85> (148.100.88.163:49540 -> 10.1.14.198:5671):
2023-02-14 02:27:56.050 [error] <0.15184.87> closing AMQP connection <0.15184.87> (148.100.88.163:49988 -> 10.1.14.198:5671):
2023-02-14 02:36:05.496 [error] <0.17479.87> closing AMQP connection <0.17479.87> (148.100.88.163:57098 -> 10.1.14.198:5671):
2023-02-14 04:06:13.869 [error] <0.16105.88> closing AMQP connection <0.16105.88> (148.100.88.163:42984 -> 10.1.14.198:5671):
2023-02-14 04:15:27.696 [error] <0.19038.88> closing AMQP connection <0.19038.88> (148.100.88.163:56650 -> 10.1.14.198:5671):
2023-02-14 20:05:38.702 [error] <0.23586.97> closing AMQP connection <0.23586.97> (148.100.88.163:34278 -> 10.1.14.198:5671):

and a lot more warnings (220 in 20 hours) as well, like:
2023-02-14 20:05:09.011 [warning] <0.20860.97> closing AMQP connection <0.20860.97> (148.100.88.163:45624 -> 10.1.14.198:5671, vhost: '/', user: 'guest'):

And a lot of these (around 544; obviously I don't know whether that is only, or even includes, the s390x host):
client unexpectedly closed TCP connection
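
To get per-level totals for one peer instead of chaining greps, something like the following Python sketch could work. It assumes every relevant line has the "closing AMQP connection" shape quoted above; the regular expression and the function name `count_closures` are my own and have not been checked against other RabbitMQ log formats.

```python
import re
from collections import Counter

# Matches lines such as:
# 2023-02-14 00:00:37.522 [error] <0.30951.85> closing AMQP connection
#   <0.30951.85> (148.100.88.163:49540 -> 10.1.14.198:5671):
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"\[(?P<level>\w+)\] <[\d.]+> closing AMQP connection <[\d.]+> "
    r"\((?P<peer>[\d.]+):(?P<port>\d+) -> "
)


def count_closures(lines, peer_ip):
    """Count 'closing AMQP connection' lines from peer_ip, grouped by log level."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group("peer") == peer_ip:
            counts[match.group("level")] += 1
    return counts
```

On the main host this could be run as `count_closures(open("rabbit@ci-master.log"), "148.100.88.163")`. Note that it only counts the first line of each entry, so the "client unexpectedly closed TCP connection" detail lines would need separate handling.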

Paul
