Re: Bookworm system randomly not responding (was Re: Bookworm system not responding on high memory usage)
- To: Xiyue Deng <xiyueden@debian-hx90.lan>
- Cc: Debian Users <debian-user@lists.debian.org>
- Subject: Re: Bookworm system randomly not responding (was Re: Bookworm system not responding on high memory usage)
- From: Xiyue Deng <manphiz@gmail.com>
- Date: Mon, 10 Apr 2023 04:00:51 -0700
- Message-id: <87bkjwuhe6.fsf@debian-hx90.lan>
- In-reply-to: <87355ldi8h.fsf@debian-hx90.lan>
- References: <874jqsunon.fsf@debian-hx90> <CAO6YxPxOWyv7AXT8E7twBqW4vS=EgNN_vFj=P_DBCmFuyLb7ZQ@mail.gmail.com> <87zg8ju1um.fsf@debian-hx90> <CAO6YxPxWopEv_m1YOWNrj0zzn2v_n=sw8NppYkfHxCmaOe0nNw@mail.gmail.com> <87y1o1mqlu.fsf@debian-hx90> <f63eafa27520916560ada31dd3f7b44545a5abfd7dc321952f698f9d59781e05@mu.id> <87355ldi8h.fsf@debian-hx90.lan>
Xiyue Deng <manphiz@gmail.com> writes:
> Xiyue Deng <xiyueden@debian-hx90.lan> writes:
>
>> Xiyue Deng <manphiz@gmail.com> writes:
>>
>>> So after some more tries it looks like this issue is not directly memory
>>> usage related. I've tried the following:
>>>
>>> * Using the older kernel version from when I was on Bullseye.
>>> * Running a cronjob to drop memory caches every minute.
>>> * Using Gnome on Wayland by default or Xorg.
>>>
>>> And this can still happen when I am running a qemu-based Win11 VM using
>>> virtual manager. So this rules out the possibility of a kernel issue
>>> or an OOM killer issue. All that is certain is that this issue can be
>>> reproduced by running my qemu-based Win11 VM: within a few hours it
>>> will trigger this lockup.
>>>
>>> As this system has been running Bullseye for a few years with zero
>>> problems, I'm hopeful it should work with Bookworm as well. If you have
>>> anything in mind that may be worth a try, please feel free to share. The
>>> more ideas the better.
>>>
>>> Thanks in advance!
>>
>> So, to rule out possible software issues, I've done a clean install of
>> both Bookworm and Bullseye, and this issue still happens. I guess this
>> largely lowers the possibility of a software cause. I've also done a
>> 10-hour memtest session and it passed, so I guess the memory has been
>> proven to be clean as well.
>>
>> For the next step, I'll look into the hardware aspect. I want to thank
>> the various people on the #debian{,-next} IRC channels for their help,
>> suggestions, and brainstorming! Will try to get to the bottom of this.
>>
>
> Actually after I decided to contact the customer service of my box[1],
> after a few rounds of suggestions (reset CMOS, reinstall system, etc.),
> they provided a BIOS update that is supposed to fix Windows 10/11
> freezing when accessing the fTPM module. After flashing the new BIOS,
> I've been running the system on high load for 12+ hours without issue.
> Though a much longer testing period is needed to make sure the fix is
> sufficient, I think this is looking very promising! Will report back
> after a week.
>
> Hope this is useful for anyone having similar issues.
It has been over a week since I applied the BIOS update to my Minisforum
Elitemini HX90[1] and, apart from one manual reboot, my system has been
running totally fine! So I consider this issue resolved. If you are
using a similar system from the same vendor and experiencing similar
system freezes, please contact their customer support for a similar
BIOS update.

I'd like to thank the wonderful people at #debian{,-next} on IRC again
for their help and suggestions during the debugging!
>
> [1] https://store.minisforum.com/products/hx90
>
>>>
>>> (Replies to Timothy below inline.)
>>>
>>> Timothy M Butterworth <timothy.m.butterworth@gmail.com> writes:
>>>
>>>> On Sat, Mar 11, 2023 at 3:30 AM Xiyue Deng <manphiz@gmail.com> wrote:
>>>>
>>>> Timothy M Butterworth <timothy.m.butterworth@gmail.com> writes:
>>>>
>>>> > On Fri, Mar 10, 2023 at 7:57 PM Xiyue Deng <manphiz@gmail.com> wrote:
>>>> >
>>>> > Hi,
>>>> >
>>>> > I have an AMD64 system[1] that has been running fine on Bullseye for a
>>>> > few years, and recently following the soft freeze on Bookworm I upgraded
>>>> > my system to try it out, and the system has frequently been becoming
>>>> > unresponsive. Initially I thought it was caused by some issue with my
>>>> > qemu-based Win11 virtual machine, as it happens most frequently when
>>>> > the VM is running, and filed a bug report[2]. But then it happened again
>>>> > without the VM running because some other program had slowly used up most
>>>> > of the memory, though not as frequently as when the VM was running.
>>>> >
>>>> > Now in retrospect, when I was using Bullseye the total memory was also
>>>> > mostly used up most of the time, with a few hundreds of megabytes
>>>> > reported as free and a few Gigs reported as cache, and it has been
>>>> > running fine. I'm not sure what has changed in Bookworm and having to
>>>> > manually restart the machine is a pretty annoying and unpleasant
>>>> > experience.
>>>> >
>>>> > Is anyone seeing a similar problem as well? What can I do to avoid
>>>> > this? Any suggestion is welcome.
>>>> >
>>>> > Thanks in advance.
>>>> >
>>>> > Open the command prompt and run `su` to switch user to root. Then run
>>>> > `sync && echo 1 > /proc/sys/vm/drop_caches` as root. This will write dirty
>>>> > pages to the hard drive and drop the page cache to free up memory. You have
>>>> > to run this as root because with sudo, my preferred method, it returns a
>>>> > permission denied error.
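The permission error with `sudo` mentioned above is most likely because the `>` redirection is performed by the calling, unprivileged shell rather than by the command run under `sudo`. A minimal sketch of one workaround, assuming a POSIX shell; the function name `drop_caches` and the `SUDO` override variable are hypothetical names introduced here for illustration and do not come from the thread:

```shell
#!/bin/sh
# Sketch: drop the clean page cache without opening a root shell.
# "drop_caches" and the SUDO override are illustrative names (assumptions).
drop_caches() {
    # $1: file to write to; defaults to the kernel knob.
    target="${1:-/proc/sys/vm/drop_caches}"
    # Flush dirty pages first so that more of the cache can be dropped.
    sync
    # tee runs under sudo, so the privileged process opens the file itself.
    # "sudo echo 1 > file" fails because the unprivileged shell opens it.
    printf '1\n' | ${SUDO-sudo} tee "$target" > /dev/null
}
```

Calling `drop_caches` with no argument writes `1` to `/proc/sys/vm/drop_caches`, which drops the page cache only; the kernel also accepts `2` (dentries and inodes) and `3` (both).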
>>>>
>>>> Thanks for the tip! I'll try it out.
>>>
>>> So unfortunately this doesn't help either, as it happens again with very
>>> low cache usage.
>>>
>>> `free -h`:
>>>
>>>                total        used        free      shared  buff/cache   available
>>> Mem:            30Gi        13Gi        16Gi       206Mi       1.4Gi        17Gi
>>> Swap:          979Mi          0B       979Mi
>>>
>>> `top` excerpt:
>>>
>>> top - 14:55:05 up 18 min, 11 users, load average: 1.77, 1.65, 1.09
>>> Tasks: 504 total, 1 running, 503 sleeping, 0 stopped, 0 zombie
>>> %Cpu(s): 12.5 us, 0.0 sy, 0.0 ni, 68.8 id, 0.0 wa, 0.0 hi, 6.2 si, 0.0 st
>>> MiB Mem : 31519.9 total, 16972.6 free, 13759.0 used, 1447.6 buff/cache
>>> MiB Swap: 980.0 total, 980.0 free, 0.0 used. 17760.8 avail Mem
>>>
>>>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>>>    8886 libvirt+  20   0   11.1g   8.1g  26580 S  87.5  26.4  17:38.47 qemu-sy+
>>>    5434 xiyueden  20   0 4047004   1.2g 170036 S   0.0   4.0   0:41.00 thunder+
>>>    5143 xiyueden  20   0 7056664 526296 191152 S   0.0   1.6   2:19.65 gnome-s+
>>> ...
>>>
>>>>
>>>> >
>>>> >
>>>> > [1] System info from inxi:
>>>> > CPU: 8-core AMD Ryzen 9 5900HX with Radeon Graphics (-MT MCP-)
>>>> > speed/min/max: 1199/1200/4679 MHz Kernel: 6.1.0-5-amd64 x86_64 Up: 7m
>>>> > Mem: 4844.4/31521.3 MiB (15.4%) Storage: 476.94 GiB (54.5% used) Procs: 535
>>>> > Shell: Bash inxi: 3.3.25
>>>> >
>>>> > Your system has 32 GB of RAM; it should not be getting used up. Run `free -h`.
>>>> > What desktop are you using: KDE, GNOME, LXQt, etc.? Are you using Wayland or
>>>> > X11? It looks like you have a memory leak in one of your applications. Try
>>>> > running `top` and pressing `M` (Shift+m) to sort by memory utilization.
>>>>
>>>> I actually have a cronjob that runs every 5 minutes and collects memory
>>>> usage. As I mentioned, it usually happens when I use qemu (see [1] for
>>>> free and [2] for top). Another time it happened when deluge was
>>>> leaking memory (see [3] for free and [4] for top).
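A periodic snapshot like the one described above could be sketched as follows. This is not the author's actual script: the function name `log_memory_snapshot`, the default log path, and the crontab path are all assumptions made for illustration. It relies on `free` and on `top` from procps-ng (batch mode with `-o %MEM`).

```shell
#!/bin/sh
# Sketch of a periodic memory snapshot; names and paths are assumptions.
log_memory_snapshot() {
    # $1: log file to append to (hypothetical default path).
    log="${1:-$HOME/memory-usage.log}"
    {
        # Timestamp header so each snapshot can be located later.
        printf '===== %s =====\n' "$(date '+%Y-%m-%d %H:%M:%S')"
        free -h
        # Batch mode, one iteration, sorted by resident memory share.
        top -b -n 1 -o %MEM | head -n 15
    } >> "$log" 2>&1
}

# Example crontab entry (every 5 minutes), assuming the function is saved
# in a script at this hypothetical path and called from it:
# */5 * * * * /home/user/bin/log-memory.sh
```

Because each snapshot is appended before the next one is taken, the last entry in the log after a forced restart shows roughly what memory looked like shortly before the freeze.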
>>>>
>>>> Interestingly as you can see, in all such cases, even though the free
>>>> amount is low, the buff/cache is still pretty large so the system is not
>>>> really overloaded. Plus, on Bullseye such memory usage also happens all
>>>> the time and this never happened. I was suspecting that maybe the
>>>> kernel is panicking when memory hits a certain limit, but I don't see
>>>> anything like that in kern.log or syslog.
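The point about a low "free" figure alongside a large buff/cache can be checked against the kernel's own estimate: `MemAvailable` in `/proc/meminfo` counts memory that can be reclaimed from caches, which is what the `available` column of `free` reports. A minimal sketch, where `mem_summary` is a hypothetical helper name introduced for illustration:

```shell
#!/bin/sh
# Sketch: compare "free" with "available" memory.
# mem_summary is a hypothetical helper name (assumption).
mem_summary() {
    # $1: meminfo-format file; defaults to the live kernel view.
    awk '
        $1 == "MemTotal:"     { total = $2 }
        $1 == "MemFree:"      { free = $2 }
        $1 == "MemAvailable:" { avail = $2 }
        END {
            # /proc/meminfo reports kiB; print MiB.
            printf "total=%d free=%d available=%d MiB\n", total / 1024, free / 1024, avail / 1024
        }
    ' "${1:-/proc/meminfo}"
}
```

If `available` stays in the gigabytes while `free` is only a few hundred megabytes, as in the `free -h` output quoted in this thread, the system is not actually short of memory.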
>>>>
>>>> Any suggestion for restoring the Bullseye behavior is appreciated. Thanks in
>>>> advance!
>>>>
>>>> [1] `free -h` when using qemu:
>>>>                total        used        free      shared  buff/cache   available
>>>> Mem:            30Gi        14Gi       258Mi       216Mi        17Gi        16Gi
>>>> Swap:          979Mi        80Mi       899Mi
>>>>
>>>> I have an AMD Ryzen 7 4700U with Radeon Graphics and the only time I see my RAM
>>>> used up is when I am transcoding video files.
>>>>
>>>> System Idle running KDE 5.27.2, Google Chrome and Dolphin:
>>>>
>>>>                total        used        free      shared  buff/cache   available
>>>> Mem:            14Gi       3.8Gi       9.4Gi        91Mi       2.2Gi        11Gi
>>>>
>>>> System with VirtualBox running Kali Linux
>>>>                total        used        free      shared  buff/cache   available
>>>> Mem:            14Gi       8.9Gi       4.2Gi       110Mi       2.3Gi       6.1Gi
>>>> Swap:           14Gi          0B        14Gi
>>>
>>> Thanks for sharing. I've allocated 8GB of memory to the Win11 VM, so on
>>> startup it brings the system's memory usage to around 15GB (~50%), and I
>>> should still have more than enough free memory. As I mentioned at the
>>> beginning of this message, a memory-related issue now looks less
>>> likely.
>>>
>>>>
>>>>
>>>> [2] `top` sorted by memory when using qemu:
>>>> top - 16:10:05 up 1:29, 11 users, load average: 1.83, 1.86, 2.06
>>>> Tasks: 494 total, 1 running, 493 sleeping, 0 stopped, 0 zombie
>>>> %Cpu(s): 8.3 us, 8.3 sy, 0.0 ni, 75.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
>>>> MiB Mem : 31522.7 total, 257.2 free, 14430.8 used, 17504.1 buff/cache
>>>> MiB Swap: 980.0 total, 899.5 free, 80.5 used. 17091.9 avail Mem
>>>>
>>>>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>>>>   10131 libvirt+  20   0   11.2g   8.1g  26140 S 213.3  26.2  75:08.67 qemu-sy+
>>>>    6547 xiyueden  20   0 4432172   1.4g 207312 S   0.0   4.5   1:53.44 thunder+
>>>> ...
>>>>
>>>> [3] `free -h` when using deluge:
>>>>                total        used        free      shared  buff/cache   available
>>>> Mem:            30Gi        12Gi       1.9Gi       219Mi        17Gi        18Gi
>>>> Swap:          979Mi       2.2Mi       977Mi
>>>>
>>>> [4] `top` sorted by memory when using deluge:
>>>> top - 10:40:05 up 3 days, 17:11, 11 users, load average: 1.25, 1.22, 1.20
>>>> Tasks: 492 total, 1 running, 490 sleeping, 0 stopped, 1 zombie
>>>> %Cpu(s): 25.0 us, 0.0 sy, 0.0 ni, 75.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
>>>> MiB Mem : 31521.3 total, 1909.2 free, 12762.9 used, 17529.7 buff/cache
>>>> MiB Swap: 980.0 total, 977.7 free, 2.2 used. 18758.4 avail Mem
>>>>
>>>>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>>>>    7287 xiyueden  20   0 9030940   6.6g 503076 S   0.0  21.3  97:11.62 deluge-+
>>>>    5271 xiyueden  20   0 4581328   1.6g 191000 S   6.7   5.2 108:23.57 thunder+
>>>> ...
>>>>
>>>> >
>>>> > Tim
>>>> >
>>>> >
>>>> > [2] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1032400
>>>> >
>>>> > --
>>>> > Manphiz
>>>>
>>>> --
>>>> Manphiz
--
Manphiz