
Re: open issues with the hppa port



> On Thu, Jul 30, 2009 at 10:50 AM, Andreas Barth<aba@not.so.argh.org> wrote:
> > You know your porters mailing list best, but I want to highlight some of
> > the issues:
> > http://lists.debian.org/debian-hppa/2009/07/msg00002.html
> 
> I can't comment on this issue. I hope Dave can?

Over the past few weeks, I have been testing 2.6.30.y on three different
platforms (c3750, rp3440 and A500-7X).  I have run identical 32 and 64-bit
kernels on the c3750.

To the base system, I have applied a collected set of patches.  Except
for the typo change recently posted to the parisc linux list, all the
changes are now in 2.6.31.

With the exception of nscd, I have had no segfault problems with 2.6.30.y
on the c3750.

However, the same is not true for the rp3440 and A500-7X.  The rp3440
is worse than the A500-7X, but on both machines application segfaults
occurred very quickly when building GCC under SMP kernels (usually in
our old friend the dynamic loader).

The A500-7X (gsyprf11) is now back running a modified SMP version of
2.6.19.22.  Last change was the U bit fix.  It has now run eight days
without any obvious segfaults.

2.6.19.22 with the above changes is not segfault free on the rp3440.
However, it is better than any other SMP build on this processor.

I am currently running a UP build of 2.6.30.3 on the rp3440.  It is
not segfault free, but I can usually get through a GCC build without
a fault.  So, even with a UP kernel, we still get cache corruption
on this machine.  I wonder if it is possible to turn L2 off.

I had hoped that the U bit fix would help.  However, its effect is
not dramatic.  When rebooting the rp3440, it would sometimes report
memory errors in the system hardware log.  Similarly, the display
attached to the VisEG on the c3750 would sometimes get noisy.
Resetting the display mode at boot would cure this.  Another effect
was for CPUs to mysteriously get disabled.  I suspect that
the kernel was sometimes accidentally writing to the control memory
for these devices.  These problems may be fixed or reduced with
the U bit fix.

In summary, the segfault problem is still there and a major issue,
particularly with SMP kernels.  Without a testcase that consistently
triggers the problem, it's almost impossible to debug what's going
wrong.
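
For anyone trying to narrow this down, one way to put a number on how
often a given command faults is to run it repeatedly and count the runs
killed by SIGSEGV.  The following is only a minimal sketch under that
assumption, not something used for the results above; the function name
count_segfaults and its arguments are hypothetical.

```python
import signal
import subprocess


def count_segfaults(cmd, runs):
    """Run cmd `runs` times; return how many runs were killed by SIGSEGV.

    cmd is an argument list such as ["make", "-j4"] (an example only).
    """
    faults = 0
    for _ in range(runs):
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        # On POSIX, a return code of -N means the child died from signal N.
        if result.returncode == -signal.SIGSEGV:
            faults += 1
    return faults
```

This only sees a fault in the process it starts directly; a segfault in
a grandchild (for example a compiler started by make) shows up as an
ordinary non-zero exit, so the command should be the faulting program
itself where possible.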

glob2 built for me, so the build failure was probably caused by cache
corruption.

Dave
-- 
J. David Anglin                                  dave.anglin@nrc-cnrc.gc.ca
National Research Council of Canada              (613) 990-0752 (FAX: 952-6602)

