
Re: open issues with the hppa port



On 07/30/2009 07:44 PM, John David Anglin wrote:
On Thu, Jul 30, 2009 at 10:50 AM, Andreas Barth <aba@not.so.argh.org> wrote:
You know your porters mailing list best, but I want to highlight some of
the issues:
http://lists.debian.org/debian-hppa/2009/07/msg00002.html
I can't comment on this issue. I hope Dave can?

Over the past few weeks, I have been testing 2.6.30.y on three different
platforms (c3750, rp3440 and A500-7X).  I have run identical 32 and 64-bit
kernels on the c3750.

To the base system, I have applied a collected set of patches.  Except
for the typo change recently posted to the parisc linux list, all the
changes are now in 2.6.31.

With the exception of nscd, I have had no segfault problems with 2.6.30.y
on the c3750.

However, the same is not true for the rp3440 and A500-7X.  The rp3440
is worse than the A500-7X, but application segfaults occurred very quickly
when building GCC under SMP kernels (usually in our old friend, the
dynamic loader).

The A500-7X (gsyprf11) is now back running a modified SMP version of
2.6.19.22.  Last change was the U bit fix.  It has now run eight days
without any obvious segfaults.

2.6.19.22 with the above changes is not segfault free on the rp3440.
However, it is better than any other SMP build on this processor.

I am currently running a UP build of 2.6.30.3 on the rp3440.  It is
not segfault free, but I can usually get through a GCC build without
a fault.  So, even with a UP kernel, we still get cache corruption
on this machine.  I wonder if it is possible to turn L2 off.

I had hoped that the U bit fix would help.  However, its effect is
not dramatic.  When rebooting the rp3440, it would sometimes report
memory errors in the system hardware log.  Similarly, the display
attached to the VisEG on the c3750 would sometimes get noisy.
Resetting the display mode at boot would cure this.  Another effect
was for CPUs to mysteriously get disabled.  I suspect that
the kernel was sometimes accidentally writing to the control memory
for these devices.  These problems may be fixed or reduced with
the U bit fix.

In summary, the segfault problem is still there and a major issue,
particularly with SMP kernels.  Without a testcase that consistently
triggers the problem, it's almost impossible to debug what's going
wrong.

glob2 built for me, so the build failure was probably caused by cache
corruption.

Dave

Dave, thanks for this very good summary!

I just want to mention my thoughts on this issue.

I see the point that Debian needs a stable and reliable build server.
But saying that the whole parisc port is unstable because of a few
problematic servers is, IMHO, wrong.
My 4 machines (715/64, B160L, Tadpole parisc laptop and a C3000) are absolutely
stable. The Debian kernel is stable on those machines and even survives
big compilation tests.
So, in my opinion and judging by my machines, parisc IS a stable platform.

So, if we have stability problems on the most important machines,
which are the Debian build servers, then maybe some thought should be
given to replacing those machines with slower but at least stable ones,
e.g. a C3000?

That way, Debian can continue to be built and we can concentrate on fixing
the remaining issues.

Helge

PS: Sadly, I have neither an SMP machine nor one of the problematic boxes.
So I'm currently not much help in fixing those issues (unless someone
sends me such a problematic box :-))

