
Bug#971686: (no subject)



Hi Ben,

Thanks for your reply. My swap was on a USB stick. I added it because I
had something large to crunch and wanted to be sure I had enough
memory. Today I had more errors: the server froze solid while waiting
(I believe) for swap to respond. I have now removed swap completely.

Errors I encountered today:

[Tue Oct  6 19:24:06 2020] INFO: task Compositor:1413596 blocked for
more than 120 seconds.
[Tue Oct  6 19:24:06 2020]       Tainted: P    B      OE
5.7.0-0.bpo.2-amd64 #1 Debian 5.7.10-1~bpo10+1
[Tue Oct  6 19:24:06 2020] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Tue Oct  6 19:24:06 2020] Compositor      D    0 1413596 3662482 0x00000000
[Tue Oct  6 19:24:06 2020] Call Trace:
[Tue Oct  6 19:24:06 2020]  __schedule+0x2dd/0x710
[Tue Oct  6 19:24:06 2020]  ? do_futex+0xca/0xb60
[Tue Oct  6 19:24:06 2020]  schedule+0x40/0xb0
[Tue Oct  6 19:24:06 2020]  rwsem_down_read_slowpath+0x3e3/0x510
[Tue Oct  6 19:24:06 2020]  do_page_fault+0x4da/0x5d0
[Tue Oct  6 19:24:06 2020]  page_fault+0x34/0x40
[Tue Oct  6 19:24:06 2020] RIP: 0033:0x556b1dbd9174
[Tue Oct  6 19:24:06 2020] Code: Bad RIP value.
[Tue Oct  6 19:24:06 2020] RSP: 002b:00007f12f57bd600 EFLAGS: 00010206
[Tue Oct  6 19:24:06 2020] RAX: 00007f12cd627000 RBX: 0000000000000038
RCX: 00007f12cd601040
[Tue Oct  6 19:24:06 2020] RDX: 00007f12cd6003c8 RSI: 00007f12cd600bf0
RDI: 00007f1309300180
[Tue Oct  6 19:24:06 2020] RBP: 00007f1309300000 R08: 00007f1309300018
R09: 00007f12cc1472e0
[Tue Oct  6 19:24:06 2020] R10: 00007f12cc1476f0 R11: 00007f12f57bd710
R12: 0000000000000040
[Tue Oct  6 19:24:06 2020] R13: 00007f1309300178 R14: 00007f1309300018
R15: 00007f12c945a220
[Tue Oct  6 19:24:06 2020] INFO: task electrumx_serve:3849071 blocked
for more than 120 seconds.
[Tue Oct  6 19:24:06 2020]       Tainted: P    B      OE
5.7.0-0.bpo.2-amd64 #1 Debian 5.7.10-1~bpo10+1
[Tue Oct  6 19:24:06 2020] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Tue Oct  6 19:24:06 2020] electrumx_serve D    0 3849071    768 0x00004000
[Tue Oct  6 19:24:06 2020] Call Trace:
[Tue Oct  6 19:24:06 2020]  __schedule+0x2dd/0x710
[Tue Oct  6 19:24:06 2020]  schedule+0x40/0xb0
[Tue Oct  6 19:24:06 2020]  rwsem_down_read_slowpath+0x3e3/0x510
[Tue Oct  6 19:24:06 2020]  do_page_fault+0x4da/0x5d0
[Tue Oct  6 19:24:06 2020]  page_fault+0x34/0x40
[Tue Oct  6 19:24:06 2020] RIP: 0033:0x7fd5ed581f43
[Tue Oct  6 19:24:06 2020] Code: Bad RIP value.
[Tue Oct  6 19:24:06 2020] RSP: 002b:00007fd4ee5d5df0 EFLAGS: 00010246
[Tue Oct  6 19:24:06 2020] RAX: 00007fd4cc0a2958 RBX: 00007fd4ee5d5f00
RCX: 0000000000000d80
[Tue Oct  6 19:24:06 2020] RDX: 00007fd45a7ce6da RSI: 00007fd4a8045140
RDI: 00007fd4ee5d5e00
[Tue Oct  6 19:24:06 2020] RBP: 00007fd4ee5d5eb0 R08: 00007fd4ee5d5e20
R09: 00007fd4cc06d0e0
[Tue Oct  6 19:24:06 2020] R10: 00007fd4cc0008d0 R11: 00007fd4cc132df0
R12: 00007fd4cc06d0e0
[Tue Oct  6 19:24:06 2020] R13: 00007fd45a7ce6da R14: 0000000000000d7b
R15: 00007fd4ee5d5ec0
[Tue Oct  6 19:24:06 2020] INFO: task electrumx_serve:3849085 blocked
for more than 120 seconds.
[Tue Oct  6 19:24:06 2020]       Tainted: P    B      OE
5.7.0-0.bpo.2-amd64 #1 Debian 5.7.10-1~bpo10+1
[Tue Oct  6 19:24:06 2020] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Tue Oct  6 19:24:06 2020] electrumx_serve D    0 3849085    768 0x00004000
[Tue Oct  6 19:24:06 2020] Call Trace:
[Tue Oct  6 19:24:06 2020]  __schedule+0x2dd/0x710
[Tue Oct  6 19:24:06 2020]  schedule+0x40/0xb0
[Tue Oct  6 19:24:06 2020]  rwsem_down_read_slowpath+0x3e3/0x510
[Tue Oct  6 19:24:06 2020]  do_page_fault+0x4da/0x5d0
[Tue Oct  6 19:24:06 2020]  page_fault+0x34/0x40
[Tue Oct  6 19:24:06 2020] RIP: 0033:0x7fd5ed581c43
[Tue Oct  6 19:24:06 2020] Code: Bad RIP value.
[Tue Oct  6 19:24:06 2020] RSP: 002b:00007fd49e7facb0 EFLAGS: 00010202
[Tue Oct  6 19:24:06 2020] RAX: 00007fd49e7fad60 RBX: 00007fd49e7fad60
RCX: 0000000000000030
[Tue Oct  6 19:24:06 2020] RDX: 00007fd49e7fad50 RSI: 00007fd49e7fad90
RDI: 00007fd49e7fad60
[Tue Oct  6 19:24:06 2020] RBP: 00007fd45bb3f9bf R08: 00007fd49e7fad50
R09: 00007fd49e7fadb0
[Tue Oct  6 19:24:06 2020] R10: 0000000000000001 R11: 0000000000000000
R12: 00007fd49e7fad50
[Tue Oct  6 19:24:06 2020] R13: 00007fd49e7fad50 R14: 0000000001a8f4e0
R15: 00007fd49e7fad90
[Tue Oct  6 19:24:31 2020] systemd[1]: kresd@11.service: State
'final-sigterm' timed out. Killing.
[Tue Oct  6 19:24:31 2020] systemd[1]: kresd@11.service: Killing process
3741645 (kresd) with signal SIGKILL.
[Tue Oct  6 19:24:31 2020] systemd[1]: kresd@11.service: Killing process
1335180 (kresd) with signal SIGKILL.
[Tue Oct  6 19:24:31 2020] systemd[1]: kresd@12.service: State
'final-sigterm' timed out. Killing.
[Tue Oct  6 19:24:31 2020] systemd[1]: kresd@12.service: Killing process
3741646 (kresd) with signal SIGKILL.
[Tue Oct  6 19:24:31 2020] systemd[1]: kresd@12.service: Killing process
1335155 (kresd) with signal SIGKILL.
[Tue Oct  6 19:24:35 2020] systemd[1]: systemd-journald.service: Main
process exited, code=killed, status=6/ABRT
[Tue Oct  6 19:24:35 2020] systemd[1]: systemd-journald.service: Failed
with result 'watchdog'.
[Tue Oct  6 19:24:35 2020] systemd[1]: systemd-journald.service:
Scheduled restart job, restart counter is at 1.
[Tue Oct  6 19:24:35 2020] systemd[1]: Stopping Flush Journal to
Persistent Storage...
[Tue Oct  6 19:24:35 2020] systemd[1]: systemd-journal-flush.service:
Succeeded.
[Tue Oct  6 19:24:35 2020] systemd[1]: Stopped Flush Journal to
Persistent Storage.
[Tue Oct  6 19:24:35 2020] systemd[1]: Stopped Journal Service.
[Tue Oct  6 19:24:35 2020] systemd[1]: Starting Journal Service...
[Tue Oct  6 19:24:41 2020] systemd[1]: kresd@11.service: Processes still
around after final SIGKILL. Entering failed mode.
[Tue Oct  6 19:24:41 2020] systemd[1]: kresd@11.service: Failed with
result 'timeout'.
[Tue Oct  6 19:24:41 2020] systemd[1]: kresd@11.service: Unit process
1335180 (kresd) remains running after unit stopped.
[Tue Oct  6 19:24:41 2020] systemd[1]: kresd@11.service: Unit process
3741645 (kresd) remains running after unit stopped.
[Tue Oct  6 19:24:41 2020] systemd[1]: Failed to start Knot Resolver daemon.
[Tue Oct  6 19:24:41 2020] systemd[1]: kresd@12.service: Processes still
around after final SIGKILL. Entering failed mode.
[Tue Oct  6 19:24:41 2020] systemd[1]: kresd@12.service: Failed with
result 'timeout'.
[Tue Oct  6 19:24:41 2020] systemd[1]: kresd@12.service: Unit process
1335155 (kresd) remains running after unit stopped.
[Tue Oct  6 19:24:41 2020] systemd[1]: kresd@12.service: Unit process
3741646 (kresd) remains running after unit stopped.
[Tue Oct  6 19:24:41 2020] systemd[1]: Failed to start Knot Resolver daemon.
[Tue Oct  6 19:24:41 2020] systemd[1]: kresd@11.service: Scheduled
restart job, restart counter is at 2.
[Tue Oct  6 19:24:41 2020] systemd[1]: kresd@12.service: Scheduled
restart job, restart counter is at 2.
[Tue Oct  6 19:24:41 2020] systemd[1]: Stopped Knot Resolver daemon.
[Tue Oct  6 19:24:41 2020] systemd[1]: kresd@11.service: Found left-over
process 1335180 (kresd) in control group while starting unit. Ignoring.
[Tue Oct  6 19:24:41 2020] systemd[1]: This usually indicates unclean
termination of a previous run, or service implementation deficiencies.
[Tue Oct  6 19:24:41 2020] systemd[1]: kresd@11.service: Found left-over
process 3741645 (kresd) in control group while starting unit. Ignoring.
[Tue Oct  6 19:24:41 2020] systemd[1]: This usually indicates unclean
termination of a previous run, or service implementation deficiencies.
[Tue Oct  6 19:24:41 2020] systemd[1]: Starting Knot Resolver daemon...
[Tue Oct  6 19:24:41 2020] systemd[1]: Stopped Knot Resolver daemon.
[Tue Oct  6 19:24:41 2020] systemd[1]: kresd@12.service: Found left-over
process 1335155 (kresd) in control group while starting unit. Ignoring.
[Tue Oct  6 19:24:41 2020] systemd[1]: This usually indicates unclean
termination of a previous run, or service implementation deficiencies.
[Tue Oct  6 19:24:41 2020] systemd[1]: kresd@12.service: Found left-over
process 3741646 (kresd) in control group while starting unit. Ignoring.
[Tue Oct  6 19:24:41 2020] systemd[1]: This usually indicates unclean
termination of a previous run, or service implementation deficiencies.
[Tue Oct  6 19:24:41 2020] systemd[1]: Starting Knot Resolver daemon...
[Tue Oct  6 19:25:26 2020] systemd[1]: kresd@13.service: start operation
timed out. Terminating.
[Tue Oct  6 19:25:26 2020] systemd[1]: Started Knot Resolver daemon.
[Tue Oct  6 19:25:26 2020] systemd[1]: Started Knot Resolver daemon.


I believe all of this was caused by the swap issue: even the DNS
resolver's watchdog killed its processes and failed to restart them for
a minute. Everything went back to normal after that, and the server
continues to operate without swap; uptime is 31 days. I tested the RAM
extensively with memtest a few months ago and everything was fine.
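
For anyone wanting to confirm that no swap is active on a similar
setup, here is a minimal sketch, assuming the usual /proc/swaps layout
(one header line, then one line per swap area). The function name
active_swap_devices is only illustrative and is not part of any tool
mentioned in this report.

```python
from pathlib import Path
from typing import List


def active_swap_devices(swaps_text: str) -> List[str]:
    """Return the device or file names listed in /proc/swaps-style text."""
    lines = swaps_text.strip().splitlines()
    # The first line is the column header (Filename Type Size Used Priority),
    # so only the remaining lines describe active swap areas.
    return [line.split()[0] for line in lines[1:] if line.strip()]


if __name__ == "__main__":
    swaps = Path("/proc/swaps")
    # /proc/swaps only exists on Linux; skip quietly elsewhere.
    if swaps.exists():
        devices = active_swap_devices(swaps.read_text())
        print("active swap:", devices or "none")
```

An empty list means the kernel has no swap area enabled, which is the
state the server is in now.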

Kind regards,
Piotr

