
ia64 2.4.17-mckinley-smp vm crash



A non-root process that accesses large amounts of memory can crash the
kernel.

I am hoping someone has already solved this problem that we have just
run across here.  Here are the details.

The machine is an HP ia64 zx2000 with 4 GB of RAM, 8 GB of VM, and one
processor, running the Debian 2.4.17-mckinley-smp default installation
kernel.  All packages are from woody stable, with all available
updates applied.  (I tried updating to the testing kernel, but a
package problem with the dependency 'dash' makes that uninstallable.
I could try sid if needed.)

I crafted a test program which illustrates the problem in as small a
case as I could reduce it to.  The program tries to thrash all
available memory.  It does not allocate the memory in one chunk but
rather in one-megabyte chunks.  Then it walks through the memory,
writing random data to random places.  On my host it gets through the
first iteration completely and starts on the second, at which point
the kernel panics.
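
For a sense of scale, the sizing rule the program uses can be sketched
on its own.  This is only an illustration and is not part of
crashme-vm.c: the helper name target_chunks_mb is hypothetical, and
the 4194304 kB figure in the note below is an assumed round number,
not a value read from the zx2000.

```c
#include <stddef.h>

/* Hypothetical helper (not in crashme-vm.c): given the MemTotal value
   from /proc/meminfo in kB, return how many one-megabyte chunks the
   test would allocate, i.e. physical memory in MB plus 10%. */
size_t target_chunks_mb(unsigned long memtotal_kb)
{
  size_t mb = memtotal_kb / 1024;   /* kB -> MB, truncating */
  return mb + mb / 10;              /* add 10% over physical memory */
}
```

Assuming MemTotal were exactly 4194304 kB, this gives 4096 + 409 =
4505 chunks, so at least about 400 MB of the touched memory cannot fit
in RAM.  The real MemTotal is typically somewhat below the installed
RAM, so the actual count would be a little lower.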

I see nothing in the syslog for the time at which the machine
crashes.  When it boots back up there is what looks like a dump of
information, as if it had detected that it crashed previously.  I can
furnish any of that, but the data is large and I can't tell which
parts, if any, are pertinent.

Meanwhile, back at the ranch, the Red Hat kernel 2.4.18-e.12smp
installed on the same hardware does not crash under this same task.
So a fix for this problem presumably already exists somewhere.  I am
hoping to get this fixed so that I don't hear too much finger pointing
and people saying we should use brand X instead of brand Y because of
this.  It is too late in the day today, but tomorrow I will try to
find the RH kernel source and see what patches are included there.

Anyone have any suggestions?

Thanks
Bob

WARNING!  This program will (likely) crash your host.  YOU HAVE BEEN
WARNED!  Don't run this unless you are okay with the possible crash.

  cc -o crashme-vm crashme-vm.c
  ./crashme-vm

File crashme-vm.c:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int get_mem_info(size_t *memsize)
{
  char buf[1024];
  FILE *fp;
  if ((fp = fopen("/proc/meminfo","r")) == 0)
    {
      fprintf(stderr,"Error: Could not open /proc/meminfo\n");
      return -1;
    }
  while (fgets(buf,sizeof buf,fp))
    {
      if (strncmp(buf,"MemTotal:",sizeof "MemTotal:" - 1) != 0)
	continue;
      /* MemTotal is reported in kB; convert to MB */
      *memsize = atol(&buf[sizeof "MemTotal:" - 1]) / 1024;
      fclose(fp);
      return 0;
    }
  fclose(fp);
  fprintf(stderr,"Error: Unable to parse contents of /proc/meminfo\n");
  return -1;
}

int allocate_memory(char ***buf,size_t memsize)
{
  size_t i;

  memsize += memsize / 10;      /* add 10% over physical memory */
  printf("allocating %lu\n",(unsigned long)memsize);

  if ((*buf = (char**)malloc(memsize * sizeof (char*))) == 0)
    {
      fprintf(stderr,"Error: Insufficient virtual memory: Need %lu MB\n",
              (unsigned long)memsize);
      return -1;
    }

  for (i = 0; i < memsize; ++i)
    {
      if (((*buf)[i] = malloc(1024*1024)) == 0)
        {
          fprintf(stderr,"Error: Insufficient virtual memory: Need %lu MB\n",
                  (unsigned long)memsize);
          exit(1);
        }
      printf("allocation %lu\n",(unsigned long)i);
    }
  return 0;
}

int thrash_memory(char **buf,size_t memsize)
{
  int i, j;

  srandom(42);
  for (j = 0; j < 1000; ++j)
    {
      for (i = 0; i < 100000; ++i)
	{
	  int indx1 = random() % memsize;
	  int indx2 = random() % (1024*1024);
	  buf[indx1][indx2] = random() % 256;
	}
      printf("iteration %d\n",j);
    }
  return 0;
}

int main()
{
  size_t memsize;
  char **buf;
  if (get_mem_info(&memsize) < 0)
    exit(1);
  
  printf("memsize = %lu\n",(unsigned long)memsize);
  if (allocate_memory(&buf,memsize) < 0)
    exit(1);
  
  printf("Beginning to thrash memory...\n");
  if (thrash_memory(buf,memsize) < 0)
    exit(1);

  return 0;
}


