
Re: Stability issues with Wheezy/Linux 3.2.0 on Loongson2f



Hi.

On 5 November 2013 02:32, David Kuehling <dvdkhlng@posteo.de> wrote:
>>>>> "Graham" == Graham Whaley <graham.whaley@gmail.com> writes:

> On 4 November 2013 07:18, David Kuehling <dvdkhlng@posteo.de> wrote:
>>>>>> "Andreas" == Andreas Barth <aba@ayous.org> writes:

>> * David Kuehling (dvdkhlng@posteo.de) [131103 16:00]:
>>> Since then I've encountered system deadlocks every one-two days (the
>>> system in question is running continuously 24/7).  Deadlock meaning,
>>> that the system seems completely dead, even num-lock LED cannot
>>> be toggled any more (but fan is still spinning etc.).
>>>
>>> I never had stability problems on kernel 2.6.39.  I did have a
>>> single deadlock when testing the debian-backports kernel package for
>>> kernel 3.2.0 on debian squeeze (but I ran that kernel only for about
>>> 2 days before upgrading to Wheezy).

>> Can you check whether it also happens with the old kernel and new
>> userland?
[..]
> As to the instabilities - I wonder whether this may be connected to
> the Loongson 2f 'issues', as documented at [1]. I believe (and btw,
> would love if somebody could confirm and point me at any archive
> links) that Debian-mips moved from MIPSI to MIPSII ISA when it went
> from Squeeze to Wheezy. I'm wondering if maybe that change in code
> layout may have brought one of these issues to the surface? Or maybe
> that your kernel or RFS needs to be built with the options listed in
> the link, and you've been "lucky" so far?  As far as I can find out,
> there is no easy way (apart from maybe looking at the top of the chip
> :-( ) to tell if you have a 2F01, 2F02 or 2F03 version of the 2F SoC,
> and only the 2F03 is 'fixed' :-(  Anybody know for sure? I'm sure this
> has probably been discussed before in the past.

> Please feel free to educate me on whether any of these 2F fixes are
> turned on by default for upstream Debian. I doubt they are? And sorry if I've
> missed some subtlety here?

Hi Graham,

I had the same thought - the lockups certainly look similar to what I
experienced when running a Linux kernel compiled from source without the
Loongson2f instruction fixes enabled in the kernel config (that would
indicate I have one of the older SoCs).

Looking at the output of objdump -D libc.so, it looks to me like the
correct "fixed" NOP sequence is used (shown by the disassembler as "move
at,at", which is synonymous with "or at,at,zero").  So Debian userspace
looks like it's loongson compatible.
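
For anyone who wants to repeat this check without reading through the full
disassembly, here is a rough sketch of my own (not something from the
Debian tooling).  It counts how often the plain MIPS NOP (all-zero word)
and the Loongson2f replacement NOP "or at,at,zero" (encoded as 0x00200825)
appear in a chunk of raw machine code.  The function name and the
assumption of little-endian (mipsel) code are mine, and since any zero
word in data or padding also counts as a plain NOP, it should only be fed
a dump of the .text section (e.g. one produced with
objcopy -O binary -j .text).

```python
import struct

PLAIN_NOP = 0x00000000       # "nop", i.e. sll zero,zero,0
LOONGSON2F_NOP = 0x00200825  # "or at,at,zero", shown by objdump as "move at,at"


def count_nops(code, little_endian=True):
    """Count plain and Loongson2f-style NOP words in raw MIPS code.

    `code` is a bytes object holding 32-bit instructions only; trailing
    bytes that do not fill a whole word are ignored.
    """
    fmt = ("<" if little_endian else ">") + "I"
    usable = len(code) - len(code) % 4
    counts = {"plain": 0, "fixed": 0}
    for (word,) in struct.iter_unpack(fmt, code[:usable]):
        if word == PLAIN_NOP:
            counts["plain"] += 1
        elif word == LOONGSON2F_NOP:
            counts["fixed"] += 1
    return counts
```

A library built with the NOP workaround should show a large "fixed" count
and few or no "plain" hits; this is only a heuristic and no substitute for
looking at the disassembly.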

Running objdump -D on the 3.2 kernel image (that's a gzip compressed
image, so I guess the code I see is only a small bootstrap sequence for
ungzipping the rest) I can see the right NOP sequence plus the extra
code in front of indirect jumps (e.g. function return statements).  This
looks like it was compiled with -mfix-loongson2f-jump plus
-mfix-loongson2f-nop.
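
To look at more than the bootstrap stub, the compressed part of the image
would have to be unpacked first.  A minimal sketch of how that could be
done, assuming the payload is an ordinary gzip stream embedded somewhere
in the file (I have not verified this for the Debian 3.2 image, and the
function name is just for illustration): search for the gzip magic bytes
and try to decompress from each match until one succeeds.

```python
import zlib

GZIP_MAGIC = b"\x1f\x8b\x08"  # gzip signature followed by the "deflate" method byte


def extract_gzip_payload(image):
    """Return the first complete gzip stream embedded in `image`, or None.

    `image` is the raw content of the kernel image as bytes.  False
    matches of the magic bytes are skipped when decompression fails.
    """
    pos = image.find(GZIP_MAGIC)
    while pos != -1:
        try:
            # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
            decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
            payload = decomp.decompress(image[pos:])
            if decomp.eof:  # stream ended cleanly; trailing bytes are ignored
                return payload
        except zlib.error:
            pass  # not a real gzip stream at this offset
        pos = image.find(GZIP_MAGIC, pos + 1)
    return None
```

The returned bytes could then be written to a file and inspected with
objdump in the same way as the stub.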

So far this looks good.  Hopefully we're not hitting new, undocumented
CPU bugs here.

After finally getting update-initrd to build a working image for my
2.6.39.4 kernel, I'm now back to running the same kernel I used for a
long time with squeeze.  If no lockups happen by the end of the week,
we'll have another data point.

Any luck with this, David? Did it lock up, or is it still running?

 Graham
 

With 2.6.39.4 BTW I'm not using the loongson2f optimized libc from
package libc6-loongson2f (only newer kernels seem to supply the right
hwcap info for ld.so to choose the loongson2f optimized versions of
libraries).  I'll have to run another test to see whether the loongson2f
libc has anything to do WRT lockups.

I noticed that going from linux 2.6.39 to 3.2, the process scheduling
improved dramatically.  On 2.6.39 'nice' values seem to be ignored, and
output of 'top' often looks wrong, like multiple processes using exactly
the same CPU amount, without any variation.  Maybe newer loongson2f
kernels changed to using a more accurate clock source for process CPU
usage accounting.  These changes could also be a source of the deadlocks.
Hopefully I'll not have to bisect all linux versions before 3.2 to
finally solve the issue.

cheers,

David

--
GnuPG public key: http://dvdkhlng.users.sourceforge.net/dk2.gpg
Fingerprint: B63B 6AF2 4EEB F033 46F7  7F1D 935E 6F08 E457 205F

