
Why writing files eats up memory: Part II: The Solution



Oops, some of you saw Part I with the Part II header; no matter.
Here's Part II:

We need to make it so that dd does not eat up all the memory when it
fills the hosage object faster than the filesystem can page it out.
Two things are happening here to produce suboptimal behavior, both on
systems with enough memory that no paging even happens in this case,
and on systems without enough.

***

First problem:

As memory gets tight, eventually the kernel forces dd to stop
allocating pages.  But it does so way too late.  First off, we can
afford to increase the free memory pool a lot more than we do now.
The 15 page threshold at which dd gets clamped was set in 1985 (or
maybe earlier), and our machines have rather more memory than then.

But even then, the real problem is that dd and the filesystem both get
clamped *at the same time*.  What we need is for dd to get clamped
*before* the filesystem, so that dd's page allocations block, and the
filesystem's continue.  In fact, that's what happens with the default
pager: below 15 pages, the users of the default pager can't allocate
pages, but it still can.  That is set up through the special
thread_wire kernel call.
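
For concreteness, the wiring looks roughly like this (a sketch, not
the actual default-pager code; host_priv stands for the privileged
host port, which only the default pager is really in a position to
hold):

    /* Sketch: a pager thread marks itself privileged so that its page
       allocations can dip below the normal free-page threshold.  This
       requires the privileged host port, handed to the default pager
       at boot.  */
    #include <mach.h>
    #include <mach/mach_host.h>
    #include <stdio.h>

    extern mach_port_t host_priv;   /* assumed: obtained at bootstrap */

    static void
    wire_this_thread (void)
    {
      kern_return_t err;

      err = thread_wire (host_priv, mach_thread_self (), TRUE);
      if (err != KERN_SUCCESS)
        fprintf (stderr, "thread_wire failed: %d\n", err);
    }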

Now, we cannot thread_wire the filesystem's threads, attractive though
that might be.  But there is a solution that I think is reasonably
good.  When an external (==non-default) pager requires memory in order
to execute a pageout, I propose the assumption that it is either
allocating anonymous (==default-paged) memory in reasonably small
amounts in order to write the pages, or it is storing the pages
themselves in some *other* externally paged object.  (The case I am
assuming to be rare enough to ignore is the one where the pager
allocates memory in external objects in order to page the pages
somewhere outside the paging system.)  Given that assumption, I
propose the following solutions to help alleviate the First Problem.
They are not mutually exclusive; I would like to see each of them
implemented.

Solution 1.1: Add a new paging threshold (above the current 15 pages)
below which page allocations for external objects block, but
allocations for internal objects succeed.  This will keep dd or ld
from continuing to allocate pages, but allow the filesystem to make
forward progress.
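
In outline, the check would look like this; vm_page_free_count and
vm_page_free_reserved exist today, while vm_page_free_reserved_external
and the function itself are made up for illustration:

    #include <mach/boolean.h>

    extern int vm_page_free_count;              /* existing counter */
    extern int vm_page_free_reserved;           /* existing: the 15 pages */
    extern int vm_page_free_reserved_external;  /* new, higher threshold */

    /* Sketch: should this page allocation wait?  Allocations that fill
       external (filesystem-backed) objects block at the new, higher
       threshold; internal (default-paged) allocations, which is what
       the filesystem itself uses while paging out, block only at the
       old, lower one.  */
    boolean_t
    page_alloc_must_wait (boolean_t external_object)
    {
      if (external_object)
        return vm_page_free_count <= vm_page_free_reserved_external;
      return vm_page_free_count <= vm_page_free_reserved;
    }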

Solution 1.2: We should limit the amount of memory the filesystem
consumes in processing pageout requests.  The kernel already contains
mechanisms for limiting how much paging goes to the filesystem at
once, with a dynamically tuned bursting control, which helps a lot.
But it could be improved by trying to gather adjacent pages for
pageout and issuing multi-page pageout requests.  (Right now this is
never done and libpager knows it is never done.)  This is a fair
amount of work to implement.
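
The gathering step itself is not the hard part; it is something like
the following standalone illustration (the real version would walk
the object's page list inside vm/vm_pageout.c and then issue a single
multi-page request to the pager):

    #include <stddef.h>

    #define CLUSTER_MAX 16      /* illustrative burst limit, in pages */

    /* Given a dirty bitmap for an object's pages and the page the
       pageout daemon picked (dirty[seed] must be nonzero), find the
       largest run [*first, *last] of adjacent dirty pages, capped at
       CLUSTER_MAX, so they can go out in one request.  */
    void
    pageout_cluster (const char *dirty, size_t npages, size_t seed,
                     size_t *first, size_t *last)
    {
      size_t lo = seed, hi = seed;

      while (lo > 0 && dirty[lo - 1] && hi - lo + 1 < CLUSTER_MAX)
        lo--;
      while (hi + 1 < npages && dirty[hi + 1] && hi - lo + 1 < CLUSTER_MAX)
        hi++;

      *first = lo;
      *last = hi;
    }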

Solution 1.3: We could clamp the amount of memory the filesystem
spends on paging through various mechanisms.  I would like to avoid
having to do this; the existing burstiness controls manage it OK.
But the threads the filesystem spawns to service paging requests do
consume resources, and when they sit idle, those resources are wasted.
The fix here is twofold: make the thread timeout code in
hurd/libports/manage-multithread.c work, and make cthreads really
delete thread stacks and such when threads terminate.
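
The behavior we want from those worker threads is roughly the
following (written with POSIX threads purely for illustration; the
real code is cthreads in libports, and the 120-second timeout is a
number pulled out of the air):

    #include <pthread.h>
    #include <stddef.h>
    #include <time.h>
    #include <errno.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t wakeup = PTHREAD_COND_INITIALIZER;
    static int pending;         /* queued requests, illustrative */

    static void *
    worker (void *arg)
    {
      (void) arg;
      pthread_mutex_lock (&lock);
      for (;;)
        {
          while (pending == 0)
            {
              struct timespec deadline;
              clock_gettime (CLOCK_REALTIME, &deadline);
              deadline.tv_sec += 120;   /* idle timeout */
              if (pthread_cond_timedwait (&wakeup, &lock, &deadline)
                    == ETIMEDOUT
                  && pending == 0)
                {
                  /* Nothing arrived: exit so the stack can be freed.  */
                  pthread_mutex_unlock (&lock);
                  return NULL;
                }
            }
          pending--;
          pthread_mutex_unlock (&lock);
          /* ... service one request ... */
          pthread_mutex_lock (&lock);
        }
    }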

Solution 1.4: The existing paging thresholds and timers were set in
1985, and they need to be increased to match modern machine
characteristics more closely.
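
To make the direction concrete, here is the flavor of change I mean;
the replacement numbers are guesses for discussion only, and the
current names and values should be read out of vm/vm_pageout.c rather
than out of this sketch:

    /* Illustrative only: scale the reserves with the amount of memory
       instead of using small fixed constants sized for a 1985 machine.
       vm_page_free_avail stands in for "pages available at boot" and
       is not an existing variable; the real macro definitions may also
       differ in form from what is shown here.  */
    #define VM_PAGE_FREE_RESERVED      (15 + vm_page_free_avail / 256)
    #define VM_PAGE_FREE_MIN(free)     (30 + (free) / 50)
    #define VM_PAGE_FREE_TARGET(free)  (45 + (free) / 40)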

***

Second problem: 

Filesystem writes are observed to be bursty, and the reason is as follows:

The filesystem writes to disk when one of two things happens:  either
memory is tight, and the pageout thread is working, or something has
synced the file.

Sync happens every thirty seconds (by default) and so we get slightly
pessimal behavior by waiting for all writes to happen at sync-time.
(Sync also happens when you close a file, or when various other things
trigger it, but the moral of the story is the same.)

This produces burstiness.  We fix it thus:

Solution 2: The kernel already notices sequential access, and when it
sees it, it takes the "previous" pages and marks them inactive right
away.  That goes a long way; it means that when memory gets tight,
these will be the first pages cleaned (==written to disk).  But why
wait that long--the pages have to get written eventually *anyway*.  So
my solution here is to assume that external objects hold "important",
long-lived data, and that when it is inactive we should spend a little
effort continually trying to clean it.  My proposal is to create a new
paging threshold as a fraction of the number of inactive
externally-managed pages, and have the pageout thread try to keep at
least that many pages clean all the time, even if memory is plentiful.
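
In pageout-daemon terms the change would look something like the
sketch below, with made-up names and a made-up fraction (the real
loop is vm_pageout_scan in vm/vm_pageout.c):

    #include <stddef.h>

    struct vm_page;

    extern int vm_page_inactive_external_count;       /* dirty + clean */
    extern int vm_page_inactive_external_clean_count;
    extern struct vm_page *pick_inactive_dirty_external_page (void);
    extern void queue_pageout_request (struct vm_page *);

    #define EXTERNAL_CLEAN_DIVISOR 4    /* keep ~1/4 of them clean */

    /* Sketch: even when free memory is plentiful, keep some fraction
       of the inactive, externally-managed pages clean.  */
    void
    pageout_background_clean (void)
    {
      int target = vm_page_inactive_external_count / EXTERNAL_CLEAN_DIVISOR;
      int queued = 0;

      while (vm_page_inactive_external_clean_count + queued < target)
        {
          struct vm_page *p = pick_inactive_dirty_external_page ();
          if (p == NULL)
            break;                    /* nothing left to clean */
          queue_pageout_request (p);  /* write it back; it stays resident */
          queued++;
        }
    }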


I can implement numbers 1.1, 1.4, and 2 right away.  What I would like
is concrete advice, backed up by reasoning, for how to set or adjust
these paging parameters.  The existing ones are specified at the front
of mach/vm/vm_pageout.c.

I have added 1.2 and 1.3 to the task list.


Thomas

