
Re: minimizing buffer size (i.e. page cache) for bulk copy/rsync -- Re: Swappiness in Buster



On Wed, Jul 08, 2020 at 08:05:23AM -0400, Greg Wooledge wrote:
> On Wed, Jul 08, 2020 at 08:00:38AM -0400, Dan Ritter wrote:
> > Zenaan Harkness wrote: 
> > > 
> > > Seriously, is there a way to stop bulk copies from eternally flushing my
> > > $Desktop's cached pages down the drain minute after blessful minute,
> > > hour after gratitude filled hour?
> > 
> > softlimit is packaged in daemontools.
> > 
> > NAME
> >        softlimit - runs another program with new resource
> > limits.
> 
> I haven't been following this thread because I don't know anything
> about the kernel's "swappiness" settings.
> 
> But I'm a little bit confused about the intent here -- setting a
> resource limit on memory is just going to make the program crash
> with an "out of memory" error when it tries to exceed the limit.
> It's not going to discipline a memory hog program into using
> different algorithms.

I also know nothing about swappiness.

But this soft limit tool sounds exceedingly powerful (<busily rubs
hands together in anticipation of heavy use of limits to gleefully bring
misbehaving programs to screeching core dumping halts without remorse
MWUAHAHAHAHAHHAHAHHHHAHAAAAAAAAAAAAAA!!!!!>)

Hmm. Excuse me ... where was I?

Oh yes, powERRR!

What I imagine, and surely hope, is that this softlimit program works as
advertised.  You see, the copy program (normally invoked as `cp`), when
copying say 900 gigabytes, does not, or certainly should not, consume
more than say one megabyte of memory as a read buffer.  Of course _some_
buffer is needed, since otherwise each read call would fetch only a byte
or two, which would mean a ridiculous number of system calls and the
performance that entails.  But half a MiB being written to the
destination, whilst another half MiB is being read from the source,
"should" be more than enough, since, roughly speaking, the two buffers
can simply swap roles when both are ready for another round.
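
Just to make the picture concrete, here's a rough sketch of the kind of
copy loop I have in mind, with a fixed half-MiB buffer.  It's only an
illustration of the buffering argument above - not what GNU cp actually
does internally - and the buffer size is my own made-up number:

    /* copy.c -- minimal fixed-buffer copy loop (illustration only) */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    #define BUFSZ (512 * 1024)          /* half a MiB, as argued above */

    int main(int argc, char **argv)
    {
        if (argc != 3) {
            fprintf(stderr, "usage: %s SRC DST\n", argv[0]);
            return 1;
        }
        int in  = open(argv[1], O_RDONLY);
        int out = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (in < 0 || out < 0) { perror("open"); return 1; }

        static char buf[BUFSZ];
        ssize_t n;
        while ((n = read(in, buf, sizeof buf)) > 0) {
            ssize_t done = 0;
            while (done < n) {          /* write(2) may be partial */
                ssize_t w = write(out, buf + done, n - done);
                if (w < 0) { perror("write"); return 1; }
                done += w;
            }
        }
        if (n < 0) { perror("read"); return 1; }
        close(in);
        close(out);
        return 0;
    }

Notice the program itself never holds more than that one buffer.  The
gigabytes pile up somewhere else entirely, which brings me to the real
culprit.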

Since, unless it's written in a totally insane way, cp (and to a lesser
extent rsync) should work like this already, what we really need to
limit is that pig with the strange "Linux" name (I think it's called
fake lipstick on an even-toed ungulate of the family Suidae, I mean
kernel).

You see, Linux, in its eternal and fruitless desire to make users happy,
gladly tells cp's read side that reading is finished, and also tells its
write side that writing is finished, and so cp races ahead as fast as it
possibly can even if, and especially if, one drive is a different speed
from the other (almost always the case) - and the bigger the difference,
the quicker RAM fills up and Firefox's tabs get evicted from memory.
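
One way to stop the writer racing arbitrarily far ahead of the slower
disk - a sketch only, Linux-specific, and nothing cp does by default as
far as I know - is to force write-back of each chunk with
sync_file_range(2) and wait for the previous chunk before queueing the
next.  The helper name and the 8 MiB chunk size below are my own
inventions:

    /* Sketch: after write()ing CHUNK bytes at 'offset' into fd 'out',
     * start write-back of this chunk and wait for the previous one,
     * so dirty pages never pile up more than about two chunks deep.
     * Linux-specific; needs _GNU_SOURCE for sync_file_range(). */
    #define _GNU_SOURCE
    #include <fcntl.h>

    #define CHUNK (8 * 1024 * 1024)     /* arbitrary choice */

    static void throttle_writeback(int out, off_t offset)
    {
        /* kick off asynchronous write-back of the chunk just written */
        sync_file_range(out, offset, CHUNK, SYNC_FILE_RANGE_WRITE);

        if (offset >= CHUNK) {
            /* block until the *previous* chunk has actually hit disk */
            sync_file_range(out, offset - CHUNK, CHUNK,
                            SYNC_FILE_RANGE_WAIT_BEFORE |
                            SYNC_FILE_RANGE_WRITE |
                            SYNC_FILE_RANGE_WAIT_AFTER);
        }
    }

Whether softlimit can achieve anything like this from the outside I
honestly don't know - a resource limit on cp's address space won't touch
dirty pages that belong to the kernel, not to cp.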

What has been happening since the dawn of time is that this kernel
caches _every_ disk page read by a program such as cp (I know, I know,
this is previously unheard-of and amazing information I'm leaking) into
something Linux calls the 'page cache' (totally bizarre thing to call an
in-memory cache of disk pages, but hey, what do I know...).

And Linux, ever generous with other people's resources, hands out your
RAM, ballooning the page cache as though the world will end if it does
not do this to the greatest extent possible, "because you might refer to
one of those pages in the near future" - and of course you do, when you
write it to the destination disk, at which point, like a pregnant sow on
meth, Linux almost faints with excitement as its prediction that you
would use that page again comes true.

And so now Linux is ecstatically hyping the utility of all these "read
once, write once" pages, in the vain hope that evicting all your Firefox
tabs' cached memory pages is The Right Thing To Do ™©®.  (And Firefox,
unlike cp and rsync, is helpful enough to tag most of its RAM pages with
"Totally ephemeral dude, feel free to dump to disk or even completely
destroy at any time".)

Firefox is super helpful like that :(

And cp and rsync are evidently nowhere near as helpful as Firefox is to
Linux's desire to please the user, and so your desktop experience goes
straight down the drain.
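
For what it's worth, a program that knows it's doing read-once,
write-once I/O can tell the kernel exactly that with posix_fadvise(2).
A sketch - the helper names and the per-chunk calling convention are my
assumptions, not anything cp or rsync is documented to do:

    /* Sketch: drop pages we know we'll never touch again, so the kernel
     * can evict them instead of Firefox.  POSIX_FADV_DONTNEED is only
     * advisory, and dirty pages need to be written back first (e.g. via
     * fdatasync or sync_file_range) for it to do much on the write side. */
    #include <fcntl.h>

    static void drop_cached_range(int fd, off_t offset, off_t len)
    {
        posix_fadvise(fd, offset, len, POSIX_FADV_DONTNEED);
    }

    /* On the read side, one can also declare the access pattern up
     * front so the kernel doesn't bother hoarding the pages: */
    static void declare_streaming_read(int fd)
    {
        posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
    }

As far as I understand it, this is roughly the trick the 'nocache'
package in Debian plays from the outside, by preloading a library that
sprinkles these calls over an unsuspecting cp or rsync.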

This has been happening since I discovered Linux.  Twenty years ago the
symptom was the entire desktop stuttering, the X cursor jumping with a
latency of 10 seconds - if you were lucky.

Alas, every one of the 4000+ kernel developers works for Google or
Amazon ECS, and throughput is the only thing the datacenter needs - that
and the ability to compile Linux kernels.

