
Re: Bookworm system randomly not responding (was Re: Bookworm system not responding on high memory usage)



Xiyue Deng <xiyueden@debian-hx90.lan> writes:

> Xiyue Deng <manphiz@gmail.com> writes:
>
>> So after some more attempts, it looks like this issue is not directly
>> related to memory usage.  I've tried the following:
>>
>> * Using the older kernel version I had on Bullseye.
>> * Running a cronjob to drop memory caches every minute.
>> * Using GNOME on either Wayland (the default) or Xorg.
>>
>> And the lockup still happens when I run a qemu-based Win11 VM using
>> virt-manager.  So this rules out a kernel issue or an OOM killer
>> issue.  All that is certain is that the issue can be reproduced by
>> running my qemu-based Win11 VM, which triggers the lockup within a few
>> hours.
>>
>> As this system ran Bullseye for a few years with zero problems, I'm
>> hopeful it can work with Bookworm as well.  If you have anything in
>> mind that may be worth a try, please feel free to share.  The more
>> ideas the better.
>>
>> Thanks in advance!
>
> So, to rule out possible software issues, I've done clean installs of
> both Bookworm and Bullseye, and the issue still happens.  I guess this
> makes a software cause much less likely.  I've also run a 10-hour
> memtest session and it passed, so I guess the RAM is fine as well.
>
> For the next step, I'll look into the hardware side.  I want to thank
> the various people on the #debian{,-next} IRC channels for their help,
> suggestions, and brainstorming!  I'll try to get to the bottom of this.
>

Actually, after I contacted the customer service for my box[1] and went
through a few rounds of suggestions (reset CMOS, reinstall the system,
etc.), they provided a BIOS update that is supposed to fix Windows 10/11
freezing when accessing the fTPM module.  After flashing the new BIOS,
I've been running the system under high load for 12+ hours without
issue.  Though a much longer testing period is needed to make sure the
fix is sufficient, this is looking very promising!  I'll report back
after a week.

Hope this is useful for anyone having similar issues.

[1] https://store.minisforum.com/products/hx90

>>
>> (Replies to Timothy below inline.)
>>
>> Timothy M Butterworth <timothy.m.butterworth@gmail.com> writes:
>>
>>> On Sat, Mar 11, 2023 at 3:30 AM Xiyue Deng <manphiz@gmail.com> wrote:
>>>
>>>  Timothy M Butterworth <timothy.m.butterworth@gmail.com> writes:
>>>
>>>  > On Fri, Mar 10, 2023 at 7:57 PM Xiyue Deng <manphiz@gmail.com> wrote:
>>>  >
>>>  >  Hi,
>>>  >
>>>  >  I have an AMD64 system[1] that has been running fine on Bullseye for a
>>>  >  few years.  Recently, following the soft freeze of Bookworm, I upgraded
>>>  >  my system to try it out, and since then it has frequently become
>>>  >  unresponsive.  Initially I thought it was caused by an issue with my
>>>  >  qemu-based Win11 virtual machine, as it happened most frequently while
>>>  >  the VM was running, so I filed a bug report[2].  But then it happened
>>>  >  again without the VM running, because some other program had slowly
>>>  >  used up most of the memory, though less frequently than with the VM.
>>>  >
>>>  >  Now in retrospect, when I was using Bullseye the memory was also nearly
>>>  >  full most of the time, with a few hundred megabytes reported as free
>>>  >  and a few gigabytes reported as cache, and it ran fine.  I'm not sure
>>>  >  what has changed in Bookworm, and having to manually restart the
>>>  >  machine is a pretty annoying and unpleasant experience.
>>>  >
>>>  >  Is anyone else seeing a similar problem?  What can I do to avoid
>>>  >  this?  Any suggestion is welcome.
>>>  >
>>>  >  Thanks in advance.
>>>  >
>>>  > Open a terminal and run `su` to switch user to root. Then run
>>>  > `sync && echo 1 > /proc/sys/vm/drop_caches` as root. This flushes
>>>  > pending writes to disk and then drops the page cache to free up memory.
>>>  > You have to run this as root, because sudo, my preferred method,
>>>  > returns a "permission denied" error.
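Why `sudo` fails here: in `sudo echo 1 > /proc/sys/vm/drop_caches` the `>` redirection is performed by the calling, unprivileged shell, so only `echo` gets root rights. Piping into `sudo tee`, e.g. `sync && echo 1 | sudo tee /proc/sys/vm/drop_caches`, or wrapping the whole command in `sudo sh -c '...'`, avoids this. For the every-minute cache drop mentioned at the top of this thread, a root entry in a cron.d file is one way to do it; a minimal sketch follows, where the file name is only a hypothetical example:

```
# /etc/cron.d/drop-caches  (hypothetical file name)
# Fields: minute hour day-of-month month day-of-week user command
# Runs every minute as root; writing 1 drops only the clean page cache
# (2 would drop reclaimable dentries and inodes, 3 both).
* * * * * root sync && echo 1 > /proc/sys/vm/drop_caches
```

As this thread shows, dropping caches did not prevent the lockup, so this is mainly useful for ruling the page cache out.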
>>>
>>>  Thanks for the tip!  I'll try it out.
>>
>> Unfortunately this doesn't help either, as the lockup happened again
>> with very low cache usage.
>>
>> `free -h`:
>>
>>                total        used        free      shared  buff/cache   available
>> Mem:            30Gi        13Gi        16Gi       206Mi       1.4Gi        17Gi
>> Swap:          979Mi          0B       979Mi
>>
>> `top` excerpt:
>>
>> top - 14:55:05 up 18 min, 11 users,  load average: 1.77, 1.65, 1.09
>> Tasks: 504 total,   1 running, 503 sleeping,   0 stopped,   0 zombie
>> %Cpu(s): 12.5 us,  0.0 sy,  0.0 ni, 68.8 id,  0.0 wa,  0.0 hi,  6.2 si,  0.0 st 
>> MiB Mem :  31519.9 total,  16972.6 free,  13759.0 used,   1447.6 buff/cache     
>> MiB Swap:    980.0 total,    980.0 free,      0.0 used.  17760.8 avail Mem 
>>
>>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>>    8886 libvirt+  20   0   11.1g   8.1g  26580 S  87.5  26.4  17:38.47 qemu-sy+
>>    5434 xiyueden  20   0 4047004   1.2g 170036 S   0.0   4.0   0:41.00 thunder+
>>    5143 xiyueden  20   0 7056664 526296 191152 S   0.0   1.6   2:19.65 gnome-s+
>> ...
>>
>>>
>>>  >  
>>>  >  
>>>  >  [1] System info from inxi:
>>>  >  CPU: 8-core AMD Ryzen 9 5900HX with Radeon Graphics (-MT MCP-)
>>>  >  speed/min/max: 1199/1200/4679 MHz Kernel: 6.1.0-5-amd64 x86_64 Up: 7m
>>>  >  Mem: 4844.4/31521.3 MiB (15.4%) Storage: 476.94 GiB (54.5% used) Procs: 535
>>>  >  Shell: Bash inxi: 3.3.25
>>>  >
>>>  > Your system has 32 GB of RAM; it should not be getting used up. Run
>>>  > `free -h`. What desktop are you using: KDE, GNOME, LXQt, etc.? Are you
>>>  > using Wayland or X11? It looks like you have a memory leak in one of
>>>  > your applications. Try running `top` and pressing `M` to sort by memory
>>>  > usage.
>>>
>>>  I actually have a cronjob that runs every 5 minutes and collects memory
>>>  usage.  As I mentioned, it usually happens when I use qemu (see [1] for
>>>  free and [2] for top).  Another time it happened while deluge was
>>>  leaking memory (see [3] for free and [4] for top).
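A minimal sketch of such a memory-logging cronjob, assuming Linux's /proc/meminfo and a POSIX sh; the script path, the MEMLOG variable and the default log location are hypothetical choices for illustration, not the actual job used here. Keeping the log on persistent storage rather than /tmp, which may be cleared at boot, means the samples before a lockup are still there after the forced restart, though the very last write may be lost if the machine freezes before it reaches disk.

```shell
#!/bin/sh
# Hypothetical memory logger, sketching the 5-minute cronjob described above.
# Example crontab line (the script path is an assumption):
#   */5 * * * * /usr/local/bin/memlog.sh
# MEMLOG and its default location are also assumptions.
LOG="${MEMLOG:-$HOME/memlog.txt}"

log_memory() {
  # Timestamp each sample so the last entry before a freeze is easy to find.
  date '+%Y-%m-%d %H:%M:%S'
  # Raw counters behind the free, buff/cache and available columns of `free`.
  grep -E '^(MemTotal|MemFree|MemAvailable|Buffers|Cached|SwapTotal|SwapFree):' /proc/meminfo
  # Header plus the five largest processes by resident memory, if procps ps exists.
  if command -v ps >/dev/null 2>&1; then
    ps -eo pid,user,rss,comm --sort=-rss | head -n 6
  fi
  # Blank line between samples.
  echo
}

log_memory >> "$LOG"
```

Running it once by hand and checking the log file is a quick way to confirm the cron entry will produce useful output.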
>>>
>>>  Interestingly, as you can see, in all such cases, even though the free
>>>  amount is low, the buff/cache is still pretty large, so the system is
>>>  not really short of memory.  Plus, on Bullseye this kind of memory usage
>>>  happened all the time and the lockup never occurred.  I suspected that
>>>  maybe the kernel was panicking when memory hit a certain limit, but I
>>>  don't see anything in kern.log or syslog.
>>>
>>>  Any suggestion for getting back to the Bullseye behavior is
>>>  appreciated.  Thanks in advance!
>>>
>>>  [1] `free -h` when using qemu:
>>>                 total        used        free      shared  buff/cache   available
>>>  Mem:            30Gi        14Gi       258Mi       216Mi        17Gi        16Gi
>>>  Swap:          979Mi        80Mi       899Mi
>>>
>>> I have an AMD Ryzen 7 4700U with Radeon Graphics, and the only time I
>>> see my RAM used up is when I am transcoding video files.
>>>
>>> System idle, running KDE 5.27.2, Google Chrome and Dolphin:
>>>
>>>                total        used        free      shared  buff/cache   available 
>>> Mem:            14Gi       3.8Gi       9.4Gi        91Mi       2.2Gi        11Gi
>>>
>>> System with VirtualBox running Kali Linux:
>>>                total        used        free      shared  buff/cache   available 
>>> Mem:            14Gi       8.9Gi       4.2Gi       110Mi       2.3Gi       6.1Gi 
>>> Swap:           14Gi          0B        14Gi
>>
>> Thanks for sharing.  I've allocated 8GB of memory to the Win11 VM, so
>> after it starts the system uses around 15GB of memory (~50%), which
>> should still leave more than enough free memory.  As I mentioned at the
>> beginning of this email, it now looks less likely to be a memory-related
>> issue.
>>
>>>
>>>  
>>>  [2] `top` sorted by memory when using qemu:
>>>  top - 16:10:05 up  1:29, 11 users,  load average: 1.83, 1.86, 2.06
>>>  Tasks: 494 total,   1 running, 493 sleeping,   0 stopped,   0 zombie
>>>  %Cpu(s):  8.3 us,  8.3 sy,  0.0 ni, 75.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st 
>>>  MiB Mem :  31522.7 total,    257.2 free,  14430.8 used,  17504.1 buff/cache     
>>>  MiB Swap:    980.0 total,    899.5 free,     80.5 used.  17091.9 avail Mem 
>>>
>>>      PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>>>    10131 libvirt+  20   0   11.2g   8.1g  26140 S 213.3  26.2  75:08.67 qemu-sy+
>>>     6547 xiyueden  20   0 4432172   1.4g 207312 S   0.0   4.5   1:53.44 thunder+
>>>  ...
>>>
>>>  [3] `free -h` when using deluge:
>>>                 total        used        free      shared  buff/cache   available
>>>  Mem:            30Gi        12Gi       1.9Gi       219Mi        17Gi        18Gi
>>>  Swap:          979Mi       2.2Mi       977Mi
>>>
>>>  [4] `top` sorted by memory when using deluge:
>>>  top - 10:40:05 up 3 days, 17:11, 11 users,  load average: 1.25, 1.22, 1.20
>>>  Tasks: 492 total,   1 running, 490 sleeping,   0 stopped,   1 zombie
>>>  %Cpu(s): 25.0 us,  0.0 sy,  0.0 ni, 75.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st 
>>>  MiB Mem :  31521.3 total,   1909.2 free,  12762.9 used,  17529.7 buff/cache     
>>>  MiB Swap:    980.0 total,    977.7 free,      2.2 used.  18758.4 avail Mem 
>>>
>>>      PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>>>     7287 xiyueden  20   0 9030940   6.6g 503076 S   0.0  21.3  97:11.62 deluge-+
>>>     5271 xiyueden  20   0 4581328   1.6g 191000 S   6.7   5.2 108:23.57 thunder+
>>>  ...
>>>
>>>  >
>>>  > Tim
>>>  >
>>>  >  
>>>  >  [2] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1032400
>>>  >
>>>  >  -- 
>>>  >  Manphiz
>>>
>>>  -- 
>>>  Manphiz


-- 
Manphiz

