
Re: Glibc-based Debian GNU/KNetBSD

On Fri, Dec 12, 2003 at 04:41:39PM +0200, Momchil Velikov wrote:
> >>>>> "Jimmy" == Jimmy Kaplowitz <jimmy@debian.org> writes:
> Jimmy> On Fri, Dec 12, 2003 at 02:40:05PM +0200, Momchil Velikov wrote:
> >> >>>>> "Jimmy" == Jimmy Kaplowitz <jimmy@debian.org> writes:
> >> 
> Jimmy> I can't find the exact messages for some of these examples of their
> Jimmy> experience, but one post mentioned that the poster had implemented
> Jimmy> applications using hundreds or thousands of threads; 
> >> 
> >> How can this be considered anything else than an evidence of mental
> >> illness ? (I purposedly avoid attributing it to malice).
> >> 
> >> Or was this simply a pointless "benchmark" ?
> Jimmy> What I meant by mentioning it was that this poster actually
> Jimmy> seemed to have a legitimately useful (to him) application that
> Jimmy> legitimately needed lots of threads, and getting it to work well
> Jimmy> on his development or test system must have required a fair amount
> Jimmy> of familiarity with threads and/or his OS's implementation of
> Jimmy> threads.
>   And what I meant is that anyone writing an application with hundreds
> or thousands of threads should either choose another field or seek
> professional help.  (Well, he/she might be the first one to make the
> breakthrough of finding such legitimately useful application, but I'd
> rather take the risk of wrongfully accusing him/her in incompetence).
>   That's, of course, wrt. to the usefulness of the scheduler
> activations idea or, for that matter, of any N:M (N != 1 && M != 1)
> threading architecture.
> Jimmy> I believe the poster was
> Jimmy> offering it to Robert as a way to test his eventual port of a threads
> Jimmy> library to glibc-on-BSD to see if it performs well and is thread-safe
> Jimmy> for thread-intensive applications such as his. (To give you an idea of
> Jimmy> this poster's standards, he stated that he considered all versions of
> Jimmy> Linux prior to the existence of NPTL not to be thread-safe for his
> Jimmy> purposes.)
> What's the point in demonstrating how fast can you do nothing ?

There are a few cross-pollinated conceptions here. Since I believe I'm the
person involved in all of them, let me attempt to clarify.

1) The hundred-thousand-plus thread benchmark was done as one of the first
proofs of the NPTL setup. This was significant because *any* POSIX threading
implementation worthy of the name should be able to accomplish this; the
whole point of a thread is as a lightweight process, meaning that the
creation of them should not be a heavy operation (certainly not significantly
heavier than the creation of a normal process, in the worst case). Since
Linux prior to NPTL starts to choke and die around 30-50 threads, and is
usually completely dead in the water well before 100, being able to create
and destroy over a hundred thousand threads within a few seconds, without
leaking memory or spinlocking into oblivion, is a fairly dramatic proof of
a fundamental capability.
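
For anyone who wants to try a create-and-destroy test of this kind, here is
a minimal sketch in Python. It is not the original NPTL benchmark; the
function name churn_threads and its batch parameter are made up for
illustration, and each thread does nothing except bump a counter so that
the run can be checked afterwards.

```python
import threading
import time


def churn_threads(total, batch=100):
    """Create and join `total` short-lived threads, `batch` at a time.

    Returns (number of threads that actually ran, elapsed seconds).
    """
    ran = 0
    lock = threading.Lock()

    def body():
        # Trivial thread body: record that this thread ran, then exit.
        nonlocal ran
        with lock:
            ran += 1

    start = time.perf_counter()
    remaining = total
    while remaining > 0:
        size = min(batch, remaining)
        threads = [threading.Thread(target=body) for _ in range(size)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()  # wait for the whole batch before creating the next
        remaining -= size
    return ran, time.perf_counter() - start
```

Calling churn_threads(100_000) exercises the scale described above. Only
`batch` threads exist at any moment, so this measures creation and teardown
cost rather than how many threads can be alive at once.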

2) I co-author software that is intended to run with several *hundred*
threads running at once. The particular example in question is a network
server, which spawns one thread per connection (connections regularly last
several hours, and have been known to last months, exchanging thousands
of request/response pairs during that timeframe), plus a dozen database
threads, various housekeeping threads, bytecode compilers and executors,
and so on.

It's written this way because it makes the architecture of the program
extremely simple and straightforward, and because it models the actual
computing needs of the program very well. Only a handful of threads are
ever likely to be runnable during a particular timeslice; the rest are in
disk-wait or network-wait states. Giving everything its own thread means
the program uses the OS scheduler (which is assumed to have had people
working on it far more extensively than anything we could write to do
internal scheduling).
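
The thread-per-connection shape can be sketched roughly as follows. This is
an illustration only, not the server discussed above: the names
handle_connection, serve and start_server are hypothetical, and an echo
loop stands in for the real request/response handling.

```python
import socket
import threading


def handle_connection(conn):
    """Serve one client for the life of its connection, then exit."""
    with conn:
        while True:
            data = conn.recv(4096)  # blocks: the thread sits in network-wait
            if not data:            # empty read means the client closed
                break
            conn.sendall(data)      # stand-in for real request handling


def serve(listener):
    """Accept connections and hand each one to its own thread."""
    while True:
        try:
            conn, _addr = listener.accept()
        except OSError:             # accept() fails once the listener is closed
            break
        threading.Thread(
            target=handle_connection, args=(conn,), daemon=True
        ).start()


def start_server(host="127.0.0.1", port=0):
    """Bind, listen, and run the accept loop in a background thread."""
    listener = socket.create_server((host, port))  # port 0: OS picks a free port
    threading.Thread(target=serve, args=(listener,), daemon=True).start()
    return listener
```

There is no scheduling code anywhere in the sketch. Each handler is written
as straight-line blocking code, and deciding which of the threads gets to
run next is left entirely to the OS.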

3) I've written scheduling algorithms and implementations in C++ for a toy
OS. Nothing terribly fancy, nor did I last in my CS degree long enough to
take a formal course on it, but it does give me some general idea of just
how complex things get - especially if you start doing things like, oh,
adding HyperThreading processors-which-aren't-really-processors to the mix.

4) I offered to modify one of my existing, data-intensive applications
to run a real, serious load test on a machine for anyone who wanted to
brutalize their thread libraries. Trying to cope with a hundred thousand
*active* threads, with memory contention and other critical sections, at a
single time is probably enough to bring just about any PC to its knees,
whatever OS it runs. But it would certainly prove whether the threading
implementation is capable of handling it meaningfully. (This application
normally runs with approximately 2x to 3x threads, where 'x'
is the number of physical CPUs available, so that even with contention
there should always be at least 1 thread runnable for each CPU, without
completely overloading the system).
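
The 2x-to-3x sizing rule might look like this in Python. The helper names
worker_count and run_all are hypothetical, and os.cpu_count() reports
logical CPUs rather than physical ones, so it only approximates 'x' on
HyperThreading-style hardware.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def worker_count(multiplier=2, cpus=None):
    """Return multiplier * CPU count, never less than 1."""
    if cpus is None:
        # Assumption: logical CPU count is a good enough stand-in for
        # the physical count. os.cpu_count() may also return None.
        cpus = os.cpu_count() or 1
    return max(1, multiplier * cpus)


def run_all(tasks, multiplier=2):
    """Run zero-argument callables on a pool sized by worker_count()."""
    with ThreadPoolExecutor(max_workers=worker_count(multiplier)) as pool:
        futures = [pool.submit(task) for task in tasks]
        # Collect results in submission order.
        return [f.result() for f in futures]
```

In CPython the global interpreter lock keeps CPU-bound threads from running
in parallel, so this shows the sizing rule rather than the throughput a
native-threaded application would get from it.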

The application in question is intended to generate datasets that could,
in theory, range in size up into the terabyte range, possibly the petabyte
range. Granted, I really don't expect anyone to ever care enough to give it
the hardware capacity to actually do so (though given Moore's law, and the
similar effect on disk capacity, I suppose I might be doing it myself in
another decade or two). Being able to run a hundred thousand threads would
actually be useful if you had enough hardware to run it meaningfully.
(Though far more useful, by that stage, would be some way to measure any
period of time when nothing was runnable, and fire off another handful of
threads to fill the void; adaptive multiprocessing).

Threads are not scary. Threads are, on OSes with good threading
implementations, ways of dealing very effectively with the problems of
symmetric semi-dependent processing, by offloading the task of figuring out
how to coordinate all of the pieces to the OS. If you really know enough
about scheduling, and have a problem complex enough to warrant writing an
entire scheduler for it, you also know enough, and have enough resources,
to customize an existing OS (or even write one) with that scheduler.

(But then, I view processes as heavyweight threads, rather than threads as
lightweight processes; this causes certain things to be viewed in a very
different light).

As for pre-2.5.(mumble) not being threadsafe: See above, about the 30-50
threads. Go read the MySQL list rants about Linux threading. Go read
*anything* about signal handling under POSIX threads, versus LinuxThreads.
Then find a drill to remove the horror from your brain.

--
Joel Baker <fenton@debian.org>                                        ,''`.
Debian GNU/KLNetBSD(i386) porter                                     : :' :
                                                                     `. `'
