
Debian Woody running SMP on a SPARCserver 1000E (sun4d)



I'm afraid this is rather old news, but I plead that I've been waiting for Sarge
for the last year rather than reporting the situation with Woody, which was
obsolescent. Also, for various reasons my ML gateway has been inoperative, and
this work is fairly low on my list of priorities.

Around a year ago I acquired a number of SPARCserver 1000E machines.
Unfortunately they were very much a "mixed bag", and on inspection quite a few of
the boards had damaged SIMM latches and (I suspect) other problems. I was able
to assemble one system with a fairly full memory space and 8x CPUs; this booted
and ran RH6.2 reliably on a single CPU.

When I tried to install Woody on this machine I got the dreaded "Data Access
Exception", and on researching this determined that it was because the sun4d
kernel on the Woody CD was SMP. I worked around this by compiling a UP sun4d
kernel on an SS20 and booting over the LAN, at which point I was able to
bootstrap Woody on the SS1000E to the point where it could compile its own
kernels.

Looking at what other people had reported about this problem, and using the LEDs
and PROM debugging messages, I eventually determined that the DAE (Data Access
Exception) can be avoided by making a one-line kernel change, at which point the
SS1000E runs 8x CPU SMP reliably. Specifically, I have had more than one machine
with up to 8x SuperSparc-50s running with firmware 2.23. However, I've had
problems with another firmware version, which gave a watchdog timeout /before/
the "Booting Linux" message; I think this is probably a different issue.

In arch/sparc/kernel/sun4d_smp.c there is a call to calibrate_delay(): this
should be commented out. As far as I can tell (and I stress that I am neither a
Sun guru nor a kernel hacker) the call is only used for the secondary CPUs, which
default to the same speed as the primary one, and who in their right mind would
try to run dissimilar CPUs SMP?
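
To make the intent of that change concrete, here is a rough user-space model in
C. It is not the kernel source: struct cpu_info, boot_cpu_init(),
secondary_cpu_callin() and MAX_CPUS are all hypothetical names invented for the
illustration. It only shows the idea that the boot CPU calibrates once and each
secondary CPU inherits that value rather than recalibrating.

```c
#include <assert.h>

#define MAX_CPUS 8   /* hypothetical limit for this model */

/* Hypothetical per-CPU record; stands in for the kernel's per-CPU data. */
struct cpu_info {
    unsigned long delay_loops;  /* delay-loop count this CPU will use */
    int online;
};

struct cpu_info cpus[MAX_CPUS];
unsigned long boot_delay_loops;  /* calibrated once, on the boot CPU */

/* Boot CPU: the only place a calibrated value enters this model. */
void boot_cpu_init(unsigned long calibrated)
{
    assert(calibrated > 0);
    boot_delay_loops = calibrated;
    cpus[0].delay_loops = calibrated;
    cpus[0].online = 1;
}

/* Secondary CPU: no recalibration, just inherit the boot CPU's value. */
void secondary_cpu_callin(int cpu)
{
    assert(cpu > 0 && cpu < MAX_CPUS);
    /* calibrate_delay();   <- the call the post suggests commenting out */
    cpus[cpu].delay_loops = boot_delay_loops;
    cpus[cpu].online = 1;
}
```

Calling boot_cpu_init() once and then secondary_cpu_callin() for CPUs 1 to 7
leaves every CPU with the same delay value, which is only sensible on the
assumption stated above that all the CPUs run at the same speed.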

Furthermore, looking at the calibrate_delay() code, I suspect that the way that
the global loops_per_jiffy variable is being used as a scratchpad is unsafe.
Specifically, if on a particular SMP architecture (here sun4d) interrupts are
not fully disabled while calibrate_delay() is running, then anything which
inadvertently uses the value of loops_per_jiffy during calibration could pick
up a half-finished value and get into trouble.
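
One way to avoid that hazard can be sketched as follows: do the search in a
local variable and write the shared global exactly once at the end. This is a
user-space illustration only, not the kernel's calibrate_delay(). The names
calibrate_local(), calibrate_and_publish(), loops_per_jiffy_model and
fake_fits() are hypothetical, and fake_fits() stands in for "run a delay loop
of this length and see whether it finished within one timer tick".

```c
#include <assert.h>
#include <limits.h>

/* Stand-in for the shared global; written once, after calibration. */
volatile unsigned long loops_per_jiffy_model;

/* Hypothetical timing probe: pretend 300000 loops fit in one tick. */
unsigned long fake_capacity = 300000UL;
int fake_fits(unsigned long loops)
{
    return loops <= fake_capacity;
}

/*
 * Search for the largest loop count that still fits in one tick,
 * keeping all intermediate guesses in a local variable.
 */
unsigned long calibrate_local(int (*fits)(unsigned long), int precision_bits)
{
    unsigned long lpj = 1UL << 12;
    unsigned long bit;

    assert(fits != 0);

    /* Coarse pass: double until the delay no longer fits in a tick. */
    while (fits(lpj)) {
        if (lpj > (ULONG_MAX >> 1))
            return lpj;          /* cannot double any further */
        lpj <<= 1;
    }
    lpj >>= 1;

    /* Fine pass: try each lower bit, keep it only if it still fits. */
    bit = lpj;
    while (precision_bits-- > 0 && (bit >>= 1)) {
        lpj |= bit;
        if (!fits(lpj))
            lpj &= ~bit;
    }
    return lpj;
}

/* Readers of the global never see a partially refined guess. */
void calibrate_and_publish(int (*fits)(unsigned long), int precision_bits)
{
    loops_per_jiffy_model = calibrate_local(fits, precision_bits);
}
```

With the fake probe above and 8 bits of refinement, calibrate_local(fake_fits, 8)
settles on 299008. The point is that the overshooting guesses made along the way
live only in the local lpj, so nothing else can read them.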

I have not made similar progress with a scratch-compiled 2.4 kernel. However, I
admit freely that I do not have a good understanding of the Sun and SPARC
architectures, I do not have documentation for the standard chips, and I lack the
magic document that describes the sun4d architecture. Without wanting to commit
myself to putting significant time into these machines, if anybody could point
me to these documents I'd very much appreciate it.

I've attempted to graft Sarge's initrd onto my hacked 2.2 kernel, and while I've
got it to boot, it is not able to find its CDROM. I believe (although as with all
things in this posting I'm open to correction) that the problem is that 2.2
doesn't include devfs, whilst Sarge assumes it.

Finally, as a newcomer to this list, here's a brief bio. I'm an electronics
engineer who's done a fair amount of embedded system work. Whilst I've got
reasonable familiarity with PC hardware and software, I've also got a fair amount
of other experience, including commercial DP kit and having written a microkernel
for '386 protected mode in Modula-2. The Sun machines are a new area to me, but I
want to make sure that if the x86 platform becomes totally compromised by flash
and microcode viruses I'm familiar with at least one other architecture. I
am not married, have no children, and do not live in East Sussex :-)

-- 
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk

[Opinions above are the author's, not those of his employers or colleagues]


