
Re: swap



On Mon, 2007-05-07 at 15:39 -0400, Douglas Allan Tutty wrote:
> On Mon, May 07, 2007 at 01:07:23PM -0400, Greg Folkert wrote:
> > On Mon, 2007-05-07 at 12:15 -0400, Douglas Allan Tutty wrote:
> > > On Mon, May 07, 2007 at 11:55:56AM -0400, Greg Folkert wrote:
> > Cluster? HA! Bigger Single computer? HA!
> > 
> > They have 8-processor machines with 64GB of memory already. The batch
> > process can only utilize one processor; the other seven are basically
> > idle. I've trended the entire machine for them. If they could LPAR the
> > machine(s) out, they'd be marvelously happy. But they would need to
> > get the memory up to 512GB or better and then multi-path I/O for the
> > swap... sheesh. It would be cheaper to just buy another machine and
> > add it, but then they only have 3 hours at worst, 4 hours at best, of
> > growth left.
> > 
> > In any case, a "pre-batch" program assigns jobs to each machine; it
> > takes nearly an hour to estimate loads. Again, single-processor usage.
> > 
> > This whole package was never meant to scale, but it has been forced
> > to. It was also meant to be a temporary fix until a new system was
> > spec'd and written. Nothing ever came of that effort in the '70s, and
> > it was dropped when this was "good enough".
> 
> I suppose the holy-grail would be something that does for CPUs in boxes
> what LVM does for disks:  Allow a single-threaded process to utilize
> multiple CPUs for more speed, those CPUs able to be both within one box,
> and spread: a CPU pool and a memory pool.

That is pretty much what IBM's LPARs on the AS/400 and AIX (and other
hypervisor setups) do. Unfortunately, this company was "sold" a
sooper-dooper machine in a deal for two of them; they would have spent
more on smaller machines. But IBM has sales quotas and deal deadlines
for its sales people. Forklift upgrades and full-cabinet deals are
pretty much the norm from the sales department. IOW: push more hardware,
period; they'll eventually be suckers for upgrades.

> The focus for a while seems to have been how to divide up a big computer
> into several smaller virtual servers (a la Xen or IBM's LPARs).  I
> haven't kept up on efforts to solve a massively sequential problem.
> However, my interest is aroused.

Massively sequential problems are very, very, very difficult to
parallelize. Even vector-processor systems balk and fail badly at
massively sequential problems.

> If you have a box with 8 processors and your process can only use one,
> can you use something like Xen, designate one whole processor and its
> memory to your main process and use the other processors as helpers?
> (maybe you don't need Xen for that, I don't know).  

Yes, you could, but that is why I mentioned they would need 512GB+ of
RAM and serious multi-pathing on the I/O to get sufficient bottleneck
reduction. It would be cheaper to just add smaller, more "commoditized"
systems, aka Linux, which the software vendor is still resisting. Or
even just to add 2-processor AIX systems with 64GB of memory.

> For example, if the process needs more memory and therefore uses swap,
> and the MB is maxed out for memory, could another processor be used by
> the OS to manage a multi-disk swap farm?  Put another way, if a linux
> box can serve data to saturate a gigabit ethernet, and it is possible to
> create a block device that looks like a disk that really gets its data
> over ethernet from another computer, can an 8-way MB take that input
> and present a virtual swap device to one processor so that swap
> functions at the same speed as memory?

Wow, you have three ideas in one paragraph there.

First, let me tell you a bit more about the processing that goes on.
The "primary" machine goes through and estimates the number of records
pulled for each "billing job". It then assigns (in the DB) which
defined machine will do which jobs. Each of these machines is defined
with "capacity" info built into the script to determine the "amount" of
work possible. It's a huge SWAG, tweaked until right over the course of
a few weeks.

This then kicks off the processing of the previous day's records. Each
machine loads ALL of its daily info into memory, which of course
creates a big problem... the S/36 and AS/400 systems did not do this;
they used piecemeal methods. This behavior was changed during the
conversion to AIX. The static data in memory of course swaps out. This
was causing the machines to become lethargic and unable to complete the
processing in DAYS, falling further and further behind.
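To illustrate the difference, here is a minimal sketch of piecemeal
versus load-everything processing. The file name, record contents, and
the per-record step are made-up examples, not the vendor's code:

```shell
# Made-up sample data standing in for a day's records.
printf 'rec1\nrec2\nrec3\n' > /tmp/daily_records.txt

# Load-everything: the whole day's data sits in memory at once, so
# working-set size grows with the day's volume (what the AIX port does).
all=$(cat /tmp/daily_records.txt)

# Piecemeal, as the S/36 and AS/400 versions did: one record in memory
# at a time, so the working set stays roughly constant.
count=0
while IFS= read -r rec; do
    count=$((count + 1))    # stand-in for real per-record work
done < /tmp/daily_records.txt
echo "$count"               # -> 3
```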

Adding enough swap to "cover" the anemic amount of working room fixed
the lethargy. Though a bit of a hack, it does work. Still, adding
machines would be easier on the amount of memory required per machine;
remember, they were "sold" expensive systems, too much to allow for
smaller machines that could actually work better.

Now, idea number one: a multi-disk swap farm. Already done. I've spread
the swap out over multiple disks on the SSA channels/chains using
logical extents, with a policy of striping across multiple disks with
contiguous allocations per disk. This gives near-instant response,
though it is still many orders of magnitude slower than memory.
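For anyone wanting the rough Linux equivalent of that striping (the AIX
side uses LVM policies as described above), the kernel will round-robin
swap pages across devices that share the same priority. A hedged
sketch, with made-up device names:

```shell
# Hypothetical Linux analogue of the striped swap farm above.
# /dev/sdb1, /dev/sdc1, /dev/sdd1 are made-up names; substitute real
# partitions on separate spindles/channels.
mkswap /dev/sdb1
mkswap /dev/sdc1
mkswap /dev/sdd1

# Equal priorities make the kernel stripe swap pages across all three
# devices instead of filling one before touching the next.
swapon -p 1 /dev/sdb1
swapon -p 1 /dev/sdc1
swapon -p 1 /dev/sdd1

# Or persistently, in /etc/fstab:
# /dev/sdb1  none  swap  sw,pri=1  0 0
# /dev/sdc1  none  swap  sw,pri=1  0 0
# /dev/sdd1  none  swap  sw,pri=1  0 0
```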

Idea number two: using Gig-E with ATA-over-Ethernet (AoE) or iSCSI only
gets you "about" 90-95MB/sec, provided you have TCP Offload Engine
(TOE) NICs on both ends. And to get anywhere close to swap as quick as
I have it configured on the AIX machines, you'd need 8 or more bonded
TOE NICs on each end... assuming either end can even transfer the data
over the "data bus". For commodity Linux machines, that is the peered
PCI bus. For the AIX machine... I'd have to verify, but I don't think
it has the capacity either. Basically, both ends have to be able to
move the data to get the speeds. On top of that, bonding TOE NICs is
not really a good idea, as it defeats many of the advantages of TOE in
the first place.
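The NIC count falls out of simple arithmetic. The ~95MB/sec figure is
from above; the aggregate bandwidth of the striped local swap is an
assumed round number for illustration, not a measurement:

```shell
# Back-of-the-envelope: how many bonded Gig-E links to match local swap?
gige_mbs=95      # usable MB/s per Gig-E link with TOE (from the text)
target_mbs=800   # ASSUMED aggregate MB/s of the striped SSA swap farm
nics=$(( (target_mbs + gige_mbs - 1) / gige_mbs ))   # round up
echo "$nics"     # -> 9, in line with "8 or more" above
```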

And finally, idea number three: using memory as swap. It is a good
idea... but then, the whole purpose of swap was that memory was not
sufficient to provide enough working room. Going above 64GB of memory
on ANY machine is not cheap. Not cheap at all. If I were going to add
that much, I'd just as soon use it as REAL working memory. AIX really
doesn't have a set-in-stone maximum amount of memory it can support, so
if I used REALLY expensive memory as swap, I'd fire myself for doing
that. Nice idea on paper, but in reality... not viable.

> I guess that's called a mainframe :)

No, mainframes are not really that "capable" a holy grail to set your
sights on. Yes, they operate on a different set of standards, but
overall they were designed to handle large amounts of input and output
in a very reasonable way. They really don't do computing any better or
worse (subjective, yes, I know) than any other systems. The only real
thing mainframes do better than many other systems, that I know of, is
COST a lot of money to maintain and upgrade.

> Greg, I'm just babbling on this.  If you have links for reading I could
> do, I'd appreciate it.  Then I may at least know what I'm babbling
> about.

IBM has a lot of goodies on publib:

http://publib.boulder.ibm.com/eserver/

Have fun reading *FOREVER* -- I mean that literally. And if you have
enough time, you can also check out:

http://www.elink.ibmlink.ibm.com/publications/servlet/pbi.wss

for even more IBM publications.
-- 
greg, greg@gregfolkert.net
PGP key: 1024D/B524687C  2003-08-05
Fingerprint: E1D3 E3D7 5850 957E FED0  2B3A ED66 6971 B524 687C
Alternate Fingerprint: 09F9 1102 9D74  E35B D841 56C5 6356 88C0
