[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Learning from Eratosthenes



-----BEGIN PGP SIGNED MESSAGE-----

Things to be learned from the Sieve of Eratosthenes
- ---------------------------------------------------

Code is at the bottom of this article for reference.

It was a dark and windy day...  Well, it was, but thats not apropos our
story.  Our story is darker and deeper than any windy day, starker than
any gothic mystery. It is a tale of compiler misdeeds, of errant code
and uncooperative systems.

On aforesaid dark and windy day, I decided to whip up a little seive of
eratosthenes to enumerate all the prime numbers less than 0xFFFFFFFF,
which is the largest quantity a 32 bit entity can designate.

Oh, I was going to be clever.  I decided to use a technique I'd learnt
from Bentleys "Programming Perls", wherein each bit in sequence stands
in for a whole number.  After an hour or two of coding, I was done.
But I didn't stop there.  I added a trick to pre-eliminate all multiples
of 2, 3, and 5 from having to be knocked out of the prime race in the
algorithms main loop.  Yes, I was clever.

Oh, I was going to be robust too.  I put in file operations so the
algorithm would operate directly on a file to substitute for things like
malloc() and free(); then I used mmap() to speed things up dramatically
by using memory access operations.  By having two different methods, I
figured debugging would be a pinch.  Should one fail, the other would
surely work; Therein was the first of my pitfalls.  Yes, you guessed it,
neither worked.

Firstly, mmap caused a segfault.  I stared at my code, saw nothing
wrong, compiled and ran it again.  It ran this time; I breathed a sigh
of relief and chalked things up to phase of the moon.  Now is the time
to introduce the hardware my doughty little program was to run on.  It
is an AMD K7 running at 500Mhz, containing 128M of RAM, and as much disk
as I could wish.  I knew 128M of memory wasn't enough to hold the 550M
demanded by my algorithm; no worries though, the operating systems
virtual memory subsystem would use swap when it ran out of ram; and I
imagined if it ran out of swap it would just make up some more.  Speaking
of the operating system, it was OpenBSD 2.7.  After a few minutes running,
I noticed the box acting sluggish.  Then I couldn't log in.  A couple
hours later I couldn't even ping it.  Time to reboot.

What went wrong?  Discussion with some of the FreeBSD kernel developers
revealed there is a known bug with mmap() that affects all the BSD's.  For
all I know it may also affect Linux; I haven't tested it there.  The bug
is thus: the limits set on a user do not get counted when mmap() is called.
So even a lowly user with no privileges whatsoever can still cause so much
memory to be allocated to themselves they bring the system to its knees.
Which I had just done.  Sadly, I abandoned the mmap technique.

Now I turned to the technique of doing all accesses via the filesystem,
with calls to lseek(), read(), and write().  They are easy to invoke;
I thought I knew them like the back of my hand.  The code was logically
perfect.  But for several days I got incredibly wierd results from
my little seive.  How could such a simple algorithm go so wrong, I
wondered?  The first byte was alright; the second byte got all its bits
turned to 1; and every subsequent byte got turned to the sequence 10101010.
Close instrumentation and observation made the behavior all the more puzzling.
I used a debugging function called try() to track the malfeasant
bit of code to its lair.  try() works like this: you give it an expression;
if errno is set when its evaluated, it does a perror and bombs out.  Perfect.

I couldn't believe what my impeccable debugging was telling me: there was an
illegal argument being handed to some function call.  Eventually I even found
out it was lseek() that was the problem.  Returning to the FreeBSD developers
didn't help this time.  One, Zippe, listed the two usual mistakes people make
in using lseek().  Mine wasn't either of them.  3 days after having started to
code, a glance at the manpage for lseek one last time made the connection.
The arguments were reversed!  One needs to invoke it as lseek(fd, offset, SEEK_SET),
not lseek(fd, SEEK_SET, offset).  Since some experienced developers hadn't noticed
that error, I didn't feel so bad about making it myself.

Well, that was all well and good. I got my little seive trundling away in
the background, figuring that while using file access would take a bit
longer than the mmap method, it couldn't take that much longer, since,
after all, mmap also relied on synchronization with an underlying file.

Two days later, the seive still chugging away,  I changed my mind.  I wanted
to try the mmap method again, on a machine that had enough RAM to give my
program its head.  A fellow Debian developer casually mentioned the name of
a machine that suited the purpose perfectly, a TI UltraSparc II.  Well,
that was nice.

Following the time honored procedure, I started my seive with higher and
higher limits.  All was good until I hit 0xFFFFFFFF.  Puzzled.  Backed
down again; it worked.  Hmm, I know, it must be that mmap!  Yeah! Its
buggy!  Theres lotsa RAM here; why don't I just malloc() the memory I
need and operate on it directly?  Surely that can't be buggy.  Betcha its
faster too.  Codecodecode.  Compile.  Doh! Same error!  Backed down again,
to 0x7FFFFFFF.  Worked.  What was happening?  It seemed bogus; without
diagrams to show how the algorithm is supposed to work its hard to describe,
but basically after striking out all multiples of a prime, one then goes
back to that prime and searches the numbers after it until one finds
the NEXT prime, one squares that, then knocks off all multiples of the
new prime.  It started off swimmingly... But then went wrong.  It started
MISSING primes.  There seemed no reason for the code to do this.  To
this day I still don't know why.  But by now I noticed the problem only
occured when the very highest bit was turned on.

This seemed too nasty to be coincidence.  I changed some variables from
unsigned 32 bit integers to unsigned 64 bit integers, and reran
the algorithm. Disaster!  Now not only was the algorithm skipping
primes; it was saying all primes were the number zero!  bod from IRC
mentioned that using printf("%Ld", (unsigned long long)foo) might do the
trick.  He was right; the numbers magically reappeared in all the right
spots.  Even better, the performance suddenly came within the bounds I
had estimated; it found all 4 billion plus primes in 24 minutes.  Vastly
better than the hour and 1/2 it had taken using 32 bit quantities as the
indexes.

This leaves a question: WHY does the algorithm screw up when the value of
MAX is 0xFFFFFFFF?  Why does it screw up in the particular way it does,
producing the sequence of numbers 7 11 13 19 37 107 373 449, and missing
all the primes in between?  Why is the screwed up version so much SLOWER
than the one I ended up with?  I don't have answers, but I do have working
code.  May it puzzle and befuddle you as much as it did me.


/*
 * generate all possible 32 bit primes
 * all multiples of 5, 3 and 2 are already excluded
 * by using the following bitmask to speed things up.
 */

#include <errno.h>
#include <fcntl.h>
#include <inttypes.h>
#include <math.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <time.h>
#include <unistd.h>

/* #define MAX 0xFFFFFFFF */
#define MAX 0xFFFFFFFF

typedef unsigned char byte;

void            try(int, byte *);
byte           *byte2str(byte, byte *);

byte            seed[15] = {
	0x35, 0x14, 0x51, 0x05, 0x04, 0x51, 0x44, 0x14, 0x11, 0x45, 0x10, 0x50, 0x45, 0x14, 0x41
};

byte            fa[960] = {
	0x40, 0x14, 0x51, 0x05, 0x04, 0x51, 0x44, 0x14, 0x11, 0x45, 0x10, 0x50, 0x45, 0x14, 0x41,
	0x40, 0x14, 0x51, 0x05, 0x04, 0x51, 0x44, 0x14, 0x11, 0x45, 0x10, 0x50, 0x45, 0x14, 0x41,
	0x40, 0x14, 0x51, 0x05, 0x04, 0x51, 0x44, 0x14, 0x11, 0x45, 0x10, 0x50, 0x45, 0x14, 0x41,
	0x40, 0x14, 0x51, 0x05, 0x04, 0x51, 0x44, 0x14, 0x11, 0x45, 0x10, 0x50, 0x45, 0x14, 0x41,
	0x40, 0x14, 0x51, 0x05, 0x04, 0x51, 0x44, 0x14, 0x11, 0x45, 0x10, 0x50, 0x45, 0x14, 0x41,
	0x40, 0x14, 0x51, 0x05, 0x04, 0x51, 0x44, 0x14, 0x11, 0x45, 0x10, 0x50, 0x45, 0x14, 0x41,
	0x40, 0x14, 0x51, 0x05, 0x04, 0x51, 0x44, 0x14, 0x11, 0x45, 0x10, 0x50, 0x45, 0x14, 0x41,
	0x40, 0x14, 0x51, 0x05, 0x04, 0x51, 0x44, 0x14, 0x11, 0x45, 0x10, 0x50, 0x45, 0x14, 0x41,
	0x40, 0x14, 0x51, 0x05, 0x04, 0x51, 0x44, 0x14, 0x11, 0x45, 0x10, 0x50, 0x45, 0x14, 0x41,
	0x40, 0x14, 0x51, 0x05, 0x04, 0x51, 0x44, 0x14, 0x11, 0x45, 0x10, 0x50, 0x45, 0x14, 0x41,
	0x40, 0x14, 0x51, 0x05, 0x04, 0x51, 0x44, 0x14, 0x11, 0x45, 0x10, 0x50, 0x45, 0x14, 0x41,
	0x40, 0x14, 0x51, 0x05, 0x04, 0x51, 0x44, 0x14, 0x11, 0x45, 0x10, 0x50, 0x45, 0x14, 0x41,
	0x40, 0x14, 0x51, 0x05, 0x04, 0x51, 0x44, 0x14, 0x11, 0x45, 0x10, 0x50, 0x45, 0x14, 0x41,
	0x40, 0x14, 0x51, 0x05, 0x04, 0x51, 0x44, 0x14, 0x11, 0x45, 0x10, 0x50, 0x45, 0x14, 0x41,
	0x40, 0x14, 0x51, 0x05, 0x04, 0x51, 0x44, 0x14, 0x11, 0x45, 0x10, 0x50, 0x45, 0x14, 0x41,
	0x40, 0x14, 0x51, 0x05, 0x04, 0x51, 0x44, 0x14, 0x11, 0x45, 0x10, 0x50, 0x45, 0x14, 0x41,
	0x40, 0x14, 0x51, 0x05, 0x04, 0x51, 0x44, 0x14, 0x11, 0x45, 0x10, 0x50, 0x45, 0x14, 0x41,
	0x40, 0x14, 0x51, 0x05, 0x04, 0x51, 0x44, 0x14, 0x11, 0x45, 0x10, 0x50, 0x45, 0x14, 0x41,
	0x40, 0x14, 0x51, 0x05, 0x04, 0x51, 0x44, 0x14, 0x11, 0x45, 0x10, 0x50, 0x45, 0x14, 0x41,
	0x40, 0x14, 0x51, 0x05, 0x04, 0x51, 0x44, 0x14, 0x11, 0x45, 0x10, 0x50, 0x45, 0x14, 0x41,
	0x40, 0x14, 0x51, 0x05, 0x04, 0x51, 0x44, 0x14, 0x11, 0x45, 0x10, 0x50, 0x45, 0x14, 0x41,
	0x40, 0x14, 0x51, 0x05, 0x04, 0x51, 0x44, 0x14, 0x11, 0x45, 0x10, 0x50, 0x45, 0x14, 0x41,
	0x40, 0x14, 0x51, 0x05, 0x04, 0x51, 0x44, 0x14, 0x11, 0x45, 0x10, 0x50, 0x45, 0x14, 0x41,
	0x40, 0x14, 0x51, 0x05, 0x04, 0x51, 0x44, 0x14, 0x11, 0x45, 0x10, 0x50, 0x45, 0x14, 0x41,
	0x40, 0x14, 0x51, 0x05, 0x04, 0x51, 0x44, 0x14, 0x11, 0x45, 0x10, 0x50, 0x45, 0x14, 0x41,
	0x40, 0x14, 0x51, 0x05, 0x04, 0x51, 0x44, 0x14, 0x11, 0x45, 0x10, 0x50, 0x45, 0x14, 0x41,
	0x40, 0x14, 0x51, 0x05, 0x04, 0x51, 0x44, 0x14, 0x11, 0x45, 0x10, 0x50, 0x45, 0x14, 0x41,
	0x40, 0x14, 0x51, 0x05, 0x04, 0x51, 0x44, 0x14, 0x11, 0x45, 0x10, 0x50, 0x45, 0x14, 0x41,
	0x40, 0x14, 0x51, 0x05, 0x04, 0x51, 0x44, 0x14, 0x11, 0x45, 0x10, 0x50, 0x45, 0x14, 0x41,
	0x40, 0x14, 0x51, 0x05, 0x04, 0x51, 0x44, 0x14, 0x11, 0x45, 0x10, 0x50, 0x45, 0x14, 0x41,
	0x40, 0x14, 0x51, 0x05, 0x04, 0x51, 0x44, 0x14, 0x11, 0x45, 0x10, 0x50, 0x45, 0x14, 0x41,
	0x40, 0x14, 0x51, 0x05, 0x04, 0x51, 0x44, 0x14, 0x11, 0x45, 0x10, 0x50, 0x45, 0x14, 0x41,
	0x40, 0x14, 0x51, 0x05, 0x04, 0x51, 0x44, 0x14, 0x11, 0x45, 0x10, 0x50, 0x45, 0x14, 0x41,
	0x40, 0x14, 0x51, 0x05, 0x04, 0x51, 0x44, 0x14, 0x11, 0x45, 0x10, 0x50, 0x45, 0x14, 0x41,
	0x40, 0x14, 0x51, 0x05, 0x04, 0x51, 0x44, 0x14, 0x11, 0x45, 0x10, 0x50, 0x45, 0x14, 0x41,
	0x40, 0x14, 0x51, 0x05, 0x04, 0x51, 0x44, 0x14, 0x11, 0x45, 0x10, 0x50, 0x45, 0x14, 0x41,
	0x40, 0x14, 0x51, 0x05, 0x04, 0x51, 0x44, 0x14, 0x11, 0x45, 0x10, 0x50, 0x45, 0x14, 0x41,
	0x40, 0x14, 0x51, 0x05, 0x04, 0x51, 0x44, 0x14, 0x11, 0x45, 0x10, 0x50, 0x45, 0x14, 0x41,
	0x40, 0x14, 0x51, 0x05, 0x04, 0x51, 0x44, 0x14, 0x11, 0x45, 0x10, 0x50, 0x45, 0x14, 0x41,
	0x40, 0x14, 0x51, 0x05, 0x04, 0x51, 0x44, 0x14, 0x11, 0x45, 0x10, 0x50, 0x45, 0x14, 0x41,
	0x40, 0x14, 0x51, 0x05, 0x04, 0x51, 0x44, 0x14, 0x11, 0x45, 0x10, 0x50, 0x45, 0x14, 0x41,
	0x40, 0x14, 0x51, 0x05, 0x04, 0x51, 0x44, 0x14, 0x11, 0x45, 0x10, 0x50, 0x45, 0x14, 0x41,
	0x40, 0x14, 0x51, 0x05, 0x04, 0x51, 0x44, 0x14, 0x11, 0x45, 0x10, 0x50, 0x45, 0x14, 0x41,
	0x40, 0x14, 0x51, 0x05, 0x04, 0x51, 0x44, 0x14, 0x11, 0x45, 0x10, 0x50, 0x45, 0x14, 0x41,
	0x40, 0x14, 0x51, 0x05, 0x04, 0x51, 0x44, 0x14, 0x11, 0x45, 0x10, 0x50, 0x45, 0x14, 0x41,
	0x40, 0x14, 0x51, 0x05, 0x04, 0x51, 0x44, 0x14, 0x11, 0x45, 0x10, 0x50, 0x45, 0x14, 0x41,
	0x40, 0x14, 0x51, 0x05, 0x04, 0x51, 0x44, 0x14, 0x11, 0x45, 0x10, 0x50, 0x45, 0x14, 0x41,
	0x40, 0x14, 0x51, 0x05, 0x04, 0x51, 0x44, 0x14, 0x11, 0x45, 0x10, 0x50, 0x45, 0x14, 0x41,
	0x40, 0x14, 0x51, 0x05, 0x04, 0x51, 0x44, 0x14, 0x11, 0x45, 0x10, 0x50, 0x45, 0x14, 0x41,
	0x40, 0x14, 0x51, 0x05, 0x04, 0x51, 0x44, 0x14, 0x11, 0x45, 0x10, 0x50, 0x45, 0x14, 0x41,
	0x40, 0x14, 0x51, 0x05, 0x04, 0x51, 0x44, 0x14, 0x11, 0x45, 0x10, 0x50, 0x45, 0x14, 0x41,
	0x40, 0x14, 0x51, 0x05, 0x04, 0x51, 0x44, 0x14, 0x11, 0x45, 0x10, 0x50, 0x45, 0x14, 0x41,
	0x40, 0x14, 0x51, 0x05, 0x04, 0x51, 0x44, 0x14, 0x11, 0x45, 0x10, 0x50, 0x45, 0x14, 0x41,
	0x40, 0x14, 0x51, 0x05, 0x04, 0x51, 0x44, 0x14, 0x11, 0x45, 0x10, 0x50, 0x45, 0x14, 0x41,
	0x40, 0x14, 0x51, 0x05, 0x04, 0x51, 0x44, 0x14, 0x11, 0x45, 0x10, 0x50, 0x45, 0x14, 0x41,
	0x40, 0x14, 0x51, 0x05, 0x04, 0x51, 0x44, 0x14, 0x11, 0x45, 0x10, 0x50, 0x45, 0x14, 0x41,
	0x40, 0x14, 0x51, 0x05, 0x04, 0x51, 0x44, 0x14, 0x11, 0x45, 0x10, 0x50, 0x45, 0x14, 0x41,
	0x40, 0x14, 0x51, 0x05, 0x04, 0x51, 0x44, 0x14, 0x11, 0x45, 0x10, 0x50, 0x45, 0x14, 0x41,
	0x40, 0x14, 0x51, 0x05, 0x04, 0x51, 0x44, 0x14, 0x11, 0x45, 0x10, 0x50, 0x45, 0x14, 0x41,
	0x40, 0x14, 0x51, 0x05, 0x04, 0x51, 0x44, 0x14, 0x11, 0x45, 0x10, 0x50, 0x45, 0x14, 0x41,
	0x40, 0x14, 0x51, 0x05, 0x04, 0x51, 0x44, 0x14, 0x11, 0x45, 0x10, 0x50, 0x45, 0x14, 0x41,
	0x40, 0x14, 0x51, 0x05, 0x04, 0x51, 0x44, 0x14, 0x11, 0x45, 0x10, 0x50, 0x45, 0x14, 0x41,
	0x40, 0x14, 0x51, 0x05, 0x04, 0x51, 0x44, 0x14, 0x11, 0x45, 0x10, 0x50, 0x45, 0x14, 0x41,
	0x40, 0x14, 0x51, 0x05, 0x04, 0x51, 0x44, 0x14, 0x11, 0x45, 0x10, 0x50, 0x45, 0x14, 0x41
};

int
main(void)
{
	FILE           *f;
	uint64_t        i, j, maxroot;
	time_t		tm, tn;
	byte           *map;

	try((int)(map = malloc((MAX>>3) + 960)), "malloc");

	tm = time(NULL);
	printf("Initializing map..");
	memcpy(map, seed, 15);
	for (i = 15; i < (MAX >> 3); i += 960) {
		memcpy(map+i, fa, 960);
	}
	tn = time(NULL) - tm;
	printf(".done  %s", ctime(&tn));

	maxroot = ceil(sqrt(MAX));
	tm = time(NULL);
	printf("Sieving primes..");
	for (i = 7; i < maxroot; i++) {
		if (!(map[i >> 3] & (0x80 >> (i & 7))))
			continue;
		for (j = i * i; j < MAX; j += i)
			map[j >> 3] &= ~(0x80 >> (j&7));
	}
	tn = time(NULL) - tm;
	printf("..done  %s", ctime(&tn));

	tm = time(NULL);
	printf("Outputting primes to primes.txt ..");
	f = fopen("/scratch/krooger/primes.txt", "w");
	for (i = 0; i < MAX;) {
		for (j = 0; j < 8; j++, i++)
			if ((0x80 >> j) & map[i >> 3])
				fprintf(f, "%d\n", i);
	}
	fclose(f);
	tn = time(NULL) - tm;
	printf(".done  %s", ctime(&tn));

	return 0;
}

byte           *
byte2str(byte b, byte * s)
{
	int             i = 8;
	while (i--) {
		s[i] = b & 1 ? '1' : '0';
		b >>= 1;
	}
	return s;
}

void
try(int expr, byte * msg)
{
	if ((expr == -1) || (errno != 0)) {
		if (msg == NULL) {
			perror("try");
		} else {
			perror(msg);
		}
		exit(1);
	}
}

-----BEGIN PGP SIGNATURE-----
Version: 2.6.3ia
Charset: noconv

iQCVAwUBOeLsbsK9HT/YfGeBAQFc2AP/U5ySR/iUISQrATSlQasEGOXY0yprNOEo
Ot8sbzB1F6N6I9fO1k58as6DOrLWsT7R2aiHllLd6P33rurV48aSywtYSNwIu7FG
byWarhqScWxUOCX9xudGKpm4zG5fotWZgp8s/u427gr0kgSbSLhZpXBsfEjEChz7
r/3Ant9bg14=
=j8H0
-----END PGP SIGNATURE-----



Reply to: