Re: Bookworm system randomly not responding (was Re: Bookworm system not responding on high memory usage)
Xiyue Deng <xiyueden@debian-hx90.lan> writes:
> Xiyue Deng <manphiz@gmail.com> writes:
>
>> So after some more tries, it looks like this issue is not directly
>> related to memory usage. I've tried the following:
>>
>> * Using the older kernel version from when I was on Bullseye.
>> * Running a cronjob to drop memory caches every minute.
>> * Using GNOME on Wayland (the default) or on Xorg.
>>
>> And this can still happen when I am running a qemu-based Win11 VM using
>> Virtual Machine Manager (virt-manager). So this seems to rule out a
>> kernel issue and an OOM killer issue. All that is certain is that the
>> issue can be reproduced by running my qemu-based Win11 VM: within a few
>> hours it will trigger this lockup.
>>
>> As this system has been running Bullseye for a few years with zero
>> problems, I'm hopeful this should work for Bookworm as well. If you have
>> anything in mind that may be worth a try, please feel free to share. The
>> more ideas the better.
>>
>> Thanks in advance!
>
> So, to rule out possible software issues, I've done clean installs of
> Bookworm and Bullseye, and this issue still happens. I guess this
> largely lowers the possibility of a software cause. I've also done a
> 10-hour memtest session and it passed, so I guess the memory is
> clean as well.
>
> For the next step, I'll look at the hardware side. I want to thank the
> various people on the #debian{,-next} IRC channels for their help,
> suggestions, and brainstorming! Will try to get to the bottom of this.
>
Actually, I decided to contact the customer service for my box[1], and
after a few rounds of suggestions (reset CMOS, reinstall system, etc.),
they provided a BIOS update that is supposed to fix Windows 10/11
freezing when accessing the fTPM module. After flashing the new BIOS,
I've been running the system on high load for 12+ hours without issue.
Though a much longer testing period is needed to make sure the fix is
sufficient, I think this is looking very promising! Will report back
after a week.
Hope this is useful for anyone having similar issues.
[1] https://store.minisforum.com/products/hx90
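
For anyone comparing firmware before and after such an update, the BIOS version that the kernel sees can be read from sysfs. This is only a minimal sketch, assuming the usual DMI directory /sys/class/dmi/id is present. The function name `bios_info` and its optional directory argument are hypothetical and only there for illustration:

```shell
#!/bin/sh
# Print the BIOS vendor, version and date as exposed by the kernel's DMI sysfs.
# The optional first argument (a directory) is a hypothetical override that
# makes dry runs possible; by default the standard sysfs location is used.
bios_info() {
    dmi_dir="${1:-/sys/class/dmi/id}"
    for field in bios_vendor bios_version bios_date; do
        # Fall back to "unknown" when a field is missing or unreadable.
        printf '%s: %s\n' "$field" "$(cat "$dmi_dir/$field" 2>/dev/null || echo unknown)"
    done
}
```

Calling `bios_info` once before flashing and once after should show whether the new firmware actually took effect.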
>>
>> (Replies to Timothy below inline.)
>>
>> Timothy M Butterworth <timothy.m.butterworth@gmail.com> writes:
>>
>>> On Sat, Mar 11, 2023 at 3:30 AM Xiyue Deng <manphiz@gmail.com> wrote:
>>>
>>> Timothy M Butterworth <timothy.m.butterworth@gmail.com> writes:
>>>
>>> > On Fri, Mar 10, 2023 at 7:57 PM Xiyue Deng <manphiz@gmail.com> wrote:
>>> >
>>> > Hi,
>>> >
>>> > I have an AMD64 system[1] that has been running fine on Bullseye for a
>>> > few years. Recently, following the soft freeze of Bookworm, I upgraded
>>> > my system to try it out, and the system has frequently been becoming
>>> > unresponsive. Initially I thought it was because of some issue with my
>>> > qemu-based Win11 virtual machine, as it happens most frequently when the
>>> > VM is running, and I filed a bug report[2]. But then it happened again
>>> > without the VM running, because some other program had slowly used up
>>> > most of the memory, though not as frequently as when the VM was running.
>>> >
>>> > Now in retrospect, when I was using Bullseye the total memory was also
>>> > mostly used up most of the time, with a few hundred megabytes
>>> > reported as free and a few gigabytes reported as cache, and it has been
>>> > running fine. I'm not sure what has changed in Bookworm and having to
>>> > manually restart the machine is a pretty annoying and unpleasant
>>> > experience.
>>> >
>>> > Is anyone else seeing a similar problem? What can I do to avoid
>>> > this? Any suggestion is welcome.
>>> >
>>> > Thanks in advance.
>>> >
>>> > Open a terminal and run `su` to switch user to root. Then run
>>> > `sync && echo 1 > /proc/sys/vm/drop_caches` as root. This will write RAM
>>> > caches to the hard drive and drop the page cache to free up memory. You
>>> > have to run this as root because sudo, my preferred method, returns a
>>> > permission denied error.
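
The sudo failure described above is most likely because the `>` redirection is performed by the calling, unprivileged shell rather than by the command that sudo runs. One common workaround is to pipe through `tee` under sudo. This is a minimal sketch; the helper name `drop_page_cache` and the override variables `DROP_CACHES_FILE` and `SUDO` are hypothetical and exist only to allow a dry run:

```shell
#!/bin/sh
# Drop the page cache without needing a root shell.
# DROP_CACHES_FILE and SUDO are hypothetical override variables for dry runs;
# left unset, the real control file and sudo are used.
drop_page_cache() {
    target="${DROP_CACHES_FILE:-/proc/sys/vm/drop_caches}"
    sync    # write dirty pages to disk first so more cache can be dropped
    # tee performs the write, so the write itself runs with sudo's privileges.
    echo 1 | ${SUDO-sudo} tee "$target" > /dev/null
}
```

An equivalent one-liner is `sudo sh -c 'sync && echo 1 > /proc/sys/vm/drop_caches'`, where the whole command line, redirection included, runs in a root shell.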
>>>
>>> Thanks for the tip! I'll try it out.
>>
>> So unfortunately this doesn't help either, as it happens again with very
>> low cache usage.
>>
>> `free -h`:
>>
>>             total        used        free      shared  buff/cache   available
>> Mem:         30Gi        13Gi        16Gi       206Mi       1.4Gi        17Gi
>> Swap:       979Mi          0B       979Mi
>>
>> `top` excerpt:
>>
>> top - 14:55:05 up 18 min, 11 users, load average: 1.77, 1.65, 1.09
>> Tasks: 504 total, 1 running, 503 sleeping, 0 stopped, 0 zombie
>> %Cpu(s): 12.5 us, 0.0 sy, 0.0 ni, 68.8 id, 0.0 wa, 0.0 hi, 6.2 si, 0.0 st
>> MiB Mem : 31519.9 total, 16972.6 free, 13759.0 used, 1447.6 buff/cache
>> MiB Swap: 980.0 total, 980.0 free, 0.0 used. 17760.8 avail Mem
>>
>>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>>    8886 libvirt+  20   0   11.1g   8.1g  26580 S  87.5  26.4  17:38.47 qemu-sy+
>>    5434 xiyueden  20   0 4047004   1.2g 170036 S   0.0   4.0   0:41.00 thunder+
>>    5143 xiyueden  20   0 7056664 526296 191152 S   0.0   1.6   2:19.65 gnome-s+
>> ...
>>
>>>
>>> >
>>> >
>>> > [1] System info from inxi:
>>> > CPU: 8-core AMD Ryzen 9 5900HX with Radeon Graphics (-MT MCP-)
>>> > speed/min/max: 1199/1200/4679 MHz Kernel: 6.1.0-5-amd64 x86_64 Up: 7m
>>> > Mem: 4844.4/31521.3 MiB (15.4%) Storage: 476.94 GiB (54.5% used) Procs: 535
>>> > Shell: Bash inxi: 3.3.25
>>> >
>>> > Your system has 32 GB of RAM; it should not be getting used up. Run
>>> > `free -h`. What desktop are you using: KDE, GNOME, LXQt, etc.? Are you
>>> > using Wayland or X11? It looks like you have a memory leak in one of
>>> > your applications. Try running `top` and press `M` (Shift+m) to sort by
>>> > memory utilization.
>>>
>>> I actually have a cronjob that runs every 5 minutes and collects memory
>>> usage. As I mentioned, it usually happens when I use qemu (see [1] for
>>> free and [2] for top). Another time it happened when deluge was
>>> leaking memory (see [3] for free and [4] for top).
>>>
>>> Interestingly as you can see, in all such cases, even though the free
>>> amount is low, the buff/cache is still pretty large so the system is not
>>> really overloaded. Plus, on Bullseye such memory usage also happens all
>>> the time and this never happened. I suspected that maybe the
>>> kernel is panicking when memory hits a certain limit, but I don't see
>>> anything in kern.log or syslog.
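
The gap between "free" and "available" memory discussed above can be logged straight from /proc/meminfo. This is a minimal sketch of what a periodic job could record, not the actual cronjob used in this thread. The function name `mem_snapshot` and its optional path argument are hypothetical:

```shell
#!/bin/sh
# Print one line: a timestamp, then MemFree and MemAvailable in MiB.
# The optional first argument (a meminfo-style file) is a hypothetical
# override for dry runs; by default /proc/meminfo is read.
mem_snapshot() {
    meminfo="${1:-/proc/meminfo}"
    awk -v ts="$(date '+%F %T')" '
        /^MemFree:/      { free  = int($2 / 1024) }   # values are in kB
        /^MemAvailable:/ { avail = int($2 / 1024) }
        END { printf "%s free=%dMiB available=%dMiB\n", ts, free, avail }
    ' "$meminfo"
}
```

Appending the output of `mem_snapshot` to a log file every few minutes shows whether "available" (which counts reclaimable cache) really runs out before a lockup, or only "free" does.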
>>>
>>> Any suggestion on how to restore the Bullseye behavior is appreciated.
>>> Thanks in advance!
>>>
>>> [1] `free -h` when using qemu:
>>>             total        used        free      shared  buff/cache   available
>>> Mem:         30Gi        14Gi       258Mi       216Mi        17Gi        16Gi
>>> Swap:       979Mi        80Mi       899Mi
>>>
>>> I have an AMD Ryzen 7 4700U with Radeon Graphics and the only time I see
>>> my RAM used up is when I am transcoding video files.
>>>
>>> System Idle running KDE 5.27.2, Google Chrome and Dolphin:
>>>
>>>             total        used        free      shared  buff/cache   available
>>> Mem:         14Gi       3.8Gi       9.4Gi        91Mi       2.2Gi        11Gi
>>>
>>> System with VirtualBox running Kali Linux
>>>             total        used        free      shared  buff/cache   available
>>> Mem:         14Gi       8.9Gi       4.2Gi       110Mi       2.3Gi       6.1Gi
>>> Swap:        14Gi          0B        14Gi
>>
>> Thanks for sharing. I've allocated 8GB of memory for the Win11 VM, so on
>> startup it will use around 15GB of memory (~50%) from the system, and I
>> should still have more than enough free memory. As I mentioned at
>> the beginning of this message, it now looks less likely to be a
>> memory-related issue.
>>
>>>
>>>
>>> [2] `top` sorted by memory when using qemu:
>>> top - 16:10:05 up 1:29, 11 users, load average: 1.83, 1.86, 2.06
>>> Tasks: 494 total, 1 running, 493 sleeping, 0 stopped, 0 zombie
>>> %Cpu(s): 8.3 us, 8.3 sy, 0.0 ni, 75.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
>>> MiB Mem : 31522.7 total, 257.2 free, 14430.8 used, 17504.1 buff/cache
>>> MiB Swap: 980.0 total, 899.5 free, 80.5 used. 17091.9 avail Mem
>>>
>>>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>>>   10131 libvirt+  20   0   11.2g   8.1g  26140 S 213.3  26.2  75:08.67 qemu-sy+
>>>    6547 xiyueden  20   0 4432172   1.4g 207312 S   0.0   4.5   1:53.44 thunder+
>>> ...
>>>
>>> [3] `free -h` when using deluge:
>>>             total        used        free      shared  buff/cache   available
>>> Mem:         30Gi        12Gi       1.9Gi       219Mi        17Gi        18Gi
>>> Swap:       979Mi       2.2Mi       977Mi
>>>
>>> [4] `top` sorted by memory when using deluge:
>>> top - 10:40:05 up 3 days, 17:11, 11 users, load average: 1.25, 1.22, 1.20
>>> Tasks: 492 total, 1 running, 490 sleeping, 0 stopped, 1 zombie
>>> %Cpu(s): 25.0 us, 0.0 sy, 0.0 ni, 75.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
>>> MiB Mem : 31521.3 total, 1909.2 free, 12762.9 used, 17529.7 buff/cache
>>> MiB Swap: 980.0 total, 977.7 free, 2.2 used. 18758.4 avail Mem
>>>
>>>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>>>    7287 xiyueden  20   0 9030940   6.6g 503076 S   0.0  21.3  97:11.62 deluge-+
>>>    5271 xiyueden  20   0 4581328   1.6g 191000 S   6.7   5.2 108:23.57 thunder+
>>> ...
>>>
>>> >
>>> > Tim
>>> >
>>> >
>>> > [2] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1032400
>>> >
>>> > --
>>> > Manphiz
>>>
>>> --
>>> Manphiz
--
Manphiz