
Re: Follow-up to Error Message...

Seth Kurtzberg wrote:
Have you run a hardware test on the system memory? I've seen something very similar on a machine that had an intermittent memory problem. I realize that the trace doesn't precisely suggest that; nevertheless, I've seen it happen (of course the specifics, such as the LBA, were different). Since this seems to occur intermittently and at different points (that is, it doesn't always happen at the same % done), it is definitely worth testing the memory hardware.

The indication that once any memory error occurs, _every_ subsequent operation involving memory allocation fails is suggestive. While it is possible that they are all caused by the same thing (that is, there is just one error propagating the error message up the stack), it doesn't look that way to me. Flushing the cache, I _believe_, is a separate operation.

Of course, it doesn't have to be hardware; it could be an o/s bug related to memory handling. That also sounds improbable, but it is quite likely that this application stresses the memory handling code more than any other. I would put my bet on hardware.

Apologies if this has already been suggested and I missed it.


Jeff Ross wrote:

Not a very descriptive subject line, and I apologize for that.

When running the backup script I posted in the previous message this morning from the command line, I got the following error message:

 89.98% done, estimate finish Wed Jan 19 09:26:09 2005
 90.36% done, estimate finish Wed Jan 19 09:26:09 2005
 90.74% done, estimate finish Wed Jan 19 09:26:09 2005
 91.12% done, estimate finish Wed Jan 19 09:26:09 2005
:-( unable to WRITE@LBA=1248c0h: Cannot allocate memory
builtin_dd: 1198272*2KB out @ average 2.2x1385KBps
:-( write failed: Cannot allocate memory
/dev/rcd0c: flushing cache
:-( unable to FLUSH CACHE: Cannot allocate memory
:-( unable to SYNCHRONOUS FLUSH CACHE: Cannot allocate memory

I actually got this 2 times in a row--the first time it failed at 67% done.
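
Since the failure point differs between runs, it may help to record where each run stopped. The following is a minimal Python sketch, assuming the log format shown above; the function name `summarize_failure` and the regular expressions are illustrative assumptions, not part of any burning tool.

```python
import re

def summarize_failure(log_text):
    """Extract the last reported progress and the failing LBA from a burn log.

    Assumes lines shaped like the output quoted above:
      ' 91.12% done, estimate finish ...'
      ':-( unable to WRITE@LBA=1248c0h: Cannot allocate memory'
    """
    # Progress lines: keep every percentage, then use the last one seen.
    percents = re.findall(r"^\s*(\d+\.\d+)% done", log_text, flags=re.M)
    # The LBA is printed in hexadecimal with a trailing 'h'.
    lba = re.search(r"unable to WRITE@LBA=([0-9a-fA-F]+)h", log_text)
    return {
        "last_percent": float(percents[-1]) if percents else None,
        "failed_lba": int(lba.group(1), 16) if lba else None,
    }

sample = """ 90.74% done, estimate finish Wed Jan 19 09:26:09 2005
 91.12% done, estimate finish Wed Jan 19 09:26:09 2005
:-( unable to WRITE@LBA=1248c0h: Cannot allocate memory
builtin_dd: 1198272*2KB out @ average 2.2x1385KBps
"""

result = summarize_failure(sample)
print(result)
```

For the log above, the hexadecimal LBA 1248c0h is 1198272 in decimal, which matches the block count on the builtin_dd line.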

Curiouser and curiouser...

Jeff Ross

Hi Seth,

Thanks for the suggestion.  Given the overall stability of this system:

jross@samba:/home/jross $ uptime
 3:10PM  up 208 days, 21:54, 1 user, load averages: 0.10, 0.10, 0.08
jross@samba:/home/jross $

I'm going to have a hard time believing this is a memory problem. I just downloaded and am running memtest now, so we'll see.

