
Re: HPPA and Squeeze



On Tue, 07 Jul 2009, Carlos O'Donell wrote:

> On Tue, Jul 7, 2009 at 12:21 PM, John David
> Anglin<dave@hiauly1.hia.nrc.ca> wrote:
> >> So if I characterise the problem you think you're seeing: on mmap of a
> >> file at a memory location to be determined by the kernel, a sequential
> >> set of reads of the mapped location eventually turns up a zero where
> >> there should be data?  Yes, it does sound like a caching issue.
> >
> > Yes.  The loop is terminated by a null tag:
> >
> >  while (dyn->d_tag != DT_NULL)
> >      {
> >         ...
> >      }
> >
> > However, the core dump doesn't show a null tag before the STRTAB tag
> > that caused the segmentation fault.
> 
> Do you mean "after" the STRTAB tag? I assume the library on-disk has a
> DT_NULL, otherwise it would always fail.

I'm sure that there is a null tag after the STRTAB.  The segmentation
fault occurred because the get operation failed after processing
the first NEEDED tag and before the STRTAB tag.  The loop goes
sequentially through the array of DT objects in the recently mmap'd
data and inserts pointers to these objects into the dynamic loader's
link map for the file (in the l_info field).  There were no null tags
between the NEEDED entry and the STRTAB entry in the mmap'd data in
the core dump.  The DT objects are near the end of the mmap'd data.
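
To make the failure mode concrete, here is a minimal sketch of such a
loop.  It is not the glibc code: the dyn_entry type, the fill_info
function and the fixed-size info array are hypothetical stand-ins, and
only the tag values (DT_NULL = 0, DT_NEEDED = 1, DT_STRTAB = 5) are
taken from the ELF specification.  It assumes nothing about how the
bad value arises; it only shows that a single zero tag read in the
middle of the array ends the walk and leaves the STRTAB slot empty.

```c
#include <stddef.h>

/* Hypothetical stand-in for an ELF dynamic entry (not the real ElfW(Dyn)). */
typedef struct {
    long          d_tag;   /* entry type, e.g. DT_NEEDED, DT_STRTAB */
    unsigned long d_val;   /* value or address associated with the tag */
} dyn_entry;

/* Tag values as defined by the ELF specification. */
#ifndef DT_NULL
#define DT_NULL   0
#define DT_NEEDED 1
#define DT_STRTAB 5
#endif

/*
 * Walk the dynamic array until a DT_NULL tag is seen and record a
 * pointer to each entry in info[], indexed by tag.  Returns the number
 * of entries visited before the terminating tag.
 */
static size_t fill_info(const dyn_entry *dyn,
                        const dyn_entry *info[], size_t ninfo)
{
    size_t visited = 0;

    while (dyn->d_tag != DT_NULL) {
        /* Ignore tags that do not fit in this sketch's small table. */
        if (dyn->d_tag > 0 && (size_t)dyn->d_tag < ninfo)
            info[dyn->d_tag] = dyn;
        ++dyn;
        ++visited;
    }
    return visited;
}
```

If the array holds NEEDED, STRTAB, NULL, the call visits two entries
and fills both slots.  If a zero tag is read where the second entry
should be, the call visits one entry and info[DT_STRTAB] stays NULL,
so a later dereference of that slot would fault.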

I would guess that the loop terminated early, because the l_info array
is all zeros except for the first NEEDED entry, and that entry appears
correct.  The loop might have terminated early because of a cache issue,
or possibly the value loaded from memory somehow got corrupted.  Another
possibility is that the mmap operation wasn't complete when the dynamic
loader examined the memory.  By the time the core dump was taken, the
operation was complete.

I think it's less likely that a cache issue affected the memory used by
the dynamic loader (l_info field) as the data before and after in the
map seemed reasonable.

The fact that PA8700 processors are also experiencing similar problems
would seem to suggest that this isn't a PA8800 L2 issue, unless we have
multiple problems.

I think we need to try running a recent kernel on gsyprf11 for a while
to see if we can capture a similar event.

Dave
-- 
J. David Anglin                                  dave.anglin@nrc-cnrc.gc.ca
National Research Council of Canada              (613) 990-0752 (FAX: 952-6602)

