[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: Pageout not succeeding



Hello,

Michael Kelly, le mar. 27 mai 2025 18:01:05 +0100, a ecrit:
> Anyway, here are the contextual diffs as requested:

Thanks!

> diff -u git/gnumach/kern/slab.c gnumach/kern/slab.c
> --- ./git/gnumach/kern/slab.c   2025-05-27 10:13:48.422454969 +0100
> +++ gnumach/kern/slab.c 2025-05-27 10:31:48.257755655 +0100
> @@ -366,13 +366,20 @@
>  static vm_offset_t
>  kmem_pagealloc_physmem(vm_size_t size)
>  {
> +    thread_t thread = current_thread();
>      struct vm_page *page;
> 
>      assert(size == PAGE_SIZE);
> 
>      for (;;) {
> +       if (thread)
> +           thread->vm_privilege++;
> +
>          page = vm_page_grab(VM_PAGE_DIRECTMAP);
> 
> +       if (thread)
> +           thread->vm_privilege--;
> +
>          if (page != NULL)
>              break;

We don't want to make all kmem_pagealloc_physmem calls blindly elevate
privileges, that can make a lot of harm as well, because it would allow
all processes (and notably non-root) eat all memory without caring to
leave room for the few paths which really need vm privilege.

> In all cases the underlying cause was several threads awaiting a
> memory page to allow kmem_cache_alloc() to return memory for the
> many aspects of ipc, threads, pmap and so on. One of the pager tasks
> (mach-defpager and ext2fs) would also be stuck waiting for memory for
> page in to progress.

So it's mach-defpager which need to be given vm_privilege. If that's
not already so, that is what really needs to be fixed. Normally that's
already done by mach-defpager's wire_thread() which uses thread_wire(),
but possibly a bug makes it miss some mach-defpager thread, and that's
what needs to be fixed.

Concerning ext2fs, in principle it should not need to be vm-privileged:
mach-defpager should be able to double-page ext2fs pages in case ext2fs
is not managing to free memory.

If we wanted to give ext2fs/libdiskfs vm_privilege (which is against
the "external pagers are not vm-privileged" comment), we'd need to make
sure that it's able to recognize when we are short on memory and aim
for releasing memory rather than processing yet more write requests
etc. I don't think we are there yet. But that's indeed a path we should
probably aim for, to be able to throttle processes that are trying to
eat a lot of memory through page caching.

> It seems from a number of comments throughout the mach code that there is a
> long running worry about memory being sufficiently available in order to allow
> page out to operate and release memory. Has there ever been any tentative
> proposals for how to address this issue with more certainty?

It's essentially an unsolved problem in general. Linux "solves" it
thanks to an OOM-killer. Possibly we'd want to introduce one.

Normally what we achieve in gnumach is making sure mach-defpager is
vm_privileged, so it's able to use the memory allocation margin to
achive swapping out. The margin is essentially a guess of what is really
needed, but it's probably enough. The point is making sure it's able to
use it and not give it to something else.

> diff -u git/gnumach/vm/vm_page.c gnumach/vm/vm_page.c
> --- ./git/gnumach/vm/vm_page.c  2025-05-27 10:13:48.518453877 +0100
> +++ gnumach/vm/vm_page.c        2025-05-27 14:37:19.675842175 +0100
> @@ -368,10 +368,7 @@
> 
>      assert(order < VM_PAGE_NR_FREE_LISTS);
> 
> -    if (vm_page_alloc_paused && current_thread()
> -        && !current_thread()->vm_privilege) {
> -        return NULL;
> -    } else if (seg->nr_free_pages <= seg->low_free_pages) {
> +    if (seg->nr_free_pages <= seg->low_free_pages) {
>          vm_pageout_start();
> 
>          if ((seg->nr_free_pages <= seg->min_free_pages)

> The final suggestion contained in the attached patch is to allow
> memory allocation within a memory segment that would normally be
> paused (vm_page_alloc_paused==1) but actually has nr_free_pages high
> enough to permit the allocation within that segment. In such cases the
> pausing was triggered by a different memory segment.

I don't think we want that: if we let other segments fill up as well,
then we won't have room to push data from the filled segment to
other segments.  I'll however agree if the segment has room beyond
seg->high_free_pages to accomodate the data that we want to migrate
(essentially the sum of max(seg->high_free_pages - seg->nr_free_pages,
0) for seg among the segments below)

> @@ -1136,6 +1133,7 @@
>          current_task()->reactivations++;
>          vm_page_unlock_queues();
>          page = NULL;
> +       object = NULL;
>          goto restart;
>      }
> 

Applied, thanks!

> @@ -1210,15 +1208,15 @@
>       * one unnecessarily.
>       */
> 
> -    if (double_paging && !object->pager_initialized) {
> +    if (!object->pager_initialized) {
>          vm_object_collapse(object);
>      }
> 
> -    if (double_paging && !object->pager_initialized) {
> +    if (!object->pager_initialized) {
>          vm_object_pager_create(object);
>      }
> 
> -    if (double_paging && !object->pager_initialized) {
> +    if (!object->pager_initialized) {
>          panic("vm_page_seg_evict");
>      }

So this would be my recent fd63a4bbf6f2201846f37afba348c5db88364c44

The point of the patch was to cope with the case where there is no DMM.
I indeed got the condition from for the internal objects which of course
will never have double_paging even if we have a DMM. I have changed the
condition, could you try it?

Thanks again for the investigation, this will be really useful to fix
building large packages etc.
Samuel


Reply to: