
mission critical Debian




  I recently had the interesting experience of bringing up a rack of 20
dual-processor Itanium 2 machines (HP zx6000s), all now running Debian, over
a 48-hour period. The interesting part was that all the equipment showed up on
Monday, and we had the classic "very big demo" set up for Thursday morning. The
decision was made to see how close we could get to actually using
the new rack for the demo, so off we went... Tuesday was spent plugging in
cables, setting things up, and then we launched into installing Debian
from CD. (This was on a physically isolated network.) I've installed
Debian (and Red Hat) on a number of machines, but the entire stack was
a bit daunting... (also warm; you could feel the heat when you walked down the
hallway).

  Once the two sysadmins working with me got the hang of it, they started
installing like gangbusters, while I still had to port a critical part of
the software to Linux. As I had never seen a live data feed for this
application before, there were several bugs in the networking code (it turned
out what I ported didn't run on any platform anymore...). Prior to this I
had added byte swapping to the networking code, so I knew that once I got the
networking issues fixed, I could "probably" read the data... Surprisingly,
that only took a few hours. Then I took this code and started modifying the
networking code to read a 32-bit big-endian data feed on a 64-bit little-endian
system. Some types like long, time_t, and bool change size between the two
platforms, so I had to hack on the data types and do some nasty munging
as I read in the data, but things again came up in a few hours. About that
time, we had the first 8 machines in the rack fully installed and booting
Debian. By installing a FAT16 partition on the other drive, we managed to
make them all dual boot, although I don't think we'll be using HP-UX much...
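
  For anyone curious, the munging was roughly of this shape (a minimal
sketch with made-up record and field names, not the actual TFAS wire
format):

    #include <stdint.h>
    #include <arpa/inet.h>   /* ntohl(): 32-bit big-endian -> host order */
    #include <time.h>

    /* Hypothetical 32-bit big-endian record as it arrives off the wire.
     * Fixed-width types, since long/time_t/bool aren't the same size on
     * the 64-bit host as they were on the 32-bit sender. */
    struct wire_record {
        uint32_t seconds;     /* was a 32-bit time_t on the sender */
        int32_t  altitude;    /* was a 32-bit long on the sender   */
        uint8_t  is_active;   /* was a bool; keep it one byte      */
    } __attribute__((packed));

    /* Native form used in memory on the ia64 box. */
    struct host_record {
        time_t seconds;       /* 64 bits here */
        long   altitude;      /* 64 bits here */
        int    is_active;
    };

    static void unpack(const struct wire_record *in, struct host_record *out)
    {
        out->seconds   = (time_t)ntohl(in->seconds);
        out->altitude  = (long)(int32_t)ntohl((uint32_t)in->altitude);
        out->is_active = in->is_active;
    }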

  At this point we called it a day. On Wednesday I started bringing up the
freshly ported software on the machines that were already working, while they
kept cranking to get the rest installed. I did this mostly by NFS mounting the
executables and doing the other system setup tasks that needed to be done
to run this large application. Eventually the data feed went down, and despite
a series of "early morning" phone calls to the east coast, nobody could fix it
till daybreak. I brought up the other machines off some prerecorded data files
I had made earlier in the day, and eventually had our software running on the
entire rack of 20. I used an HP wx6000 machine, also running Linux, to remotely
display the GUIs from each of the machines in the rack, one machine per
workspace. Surprisingly, a single wx6000 managed to remotely display 20 machines
running reasonably high-end graphics. The live data feed had come back up
by morning, and we had all 20 machines up and running about 30 minutes
before all the suits came into the room for the demo. :-)

  General impressions: what a great platform! Debian installed like a
champ, and each machine came up pretty efficiently. The only big problem I
had is that GCC 2.96 sucks, and 3.0.4 isn't much better (we were running stable
woody). I finally had to go with a recent build I had done of GCC "3.4" from
a few-weeks-old CVS tree. The other versions kept having weird problems with
C++ and system headers, which just "went away" with a more recent GCC. I had
to give up on our one XView-based application (thanks to this list, I now
know why...), but managed to get the new Java version up instead using
Sun's Java for Red Hat 7.3.

  A few other thoughts. I'd really love to see a port of Valgrind to the
Itanium; it's my favorite memory checker. I'd also really like to see
the NPTL work supported. I'm tired of always having to rewrite POSIX
semaphore code to use SVR4 semaphores.
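
  (The rewrite is mostly mechanical; roughly this kind of mapping, as a
rough sketch rather than our actual code, with error checking omitted:)

    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/sem.h>

    /* On Linux the caller has to define this union for semctl(). */
    union semun { int val; struct semid_ds *buf; unsigned short *array; };

    int main(void)
    {
        /* sem_init(&s, pshared, 1) roughly becomes: */
        int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
        union semun arg;
        arg.val = 1;
        semctl(semid, 0, SETVAL, arg);

        /* sem_wait(&s) roughly becomes: */
        struct sembuf p = { 0, -1, 0 };
        semop(semid, &p, 1);

        /* sem_post(&s) roughly becomes: */
        struct sembuf v = { 0, +1, 0 };
        semop(semid, &v, 1);

        /* sem_destroy(&s) roughly becomes: */
        semctl(semid, 0, IPC_RMID);
        return 0;
    }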

  The software was NASA's Traffic Flow Automation System (TFAS), an
experimental project for strategic Air Traffic Management based on
NASA's Center-TRACON Automation System (CTAS), which has been
operationally deployed by the FAA in a number of air traffic control
centers (http://ctas.arc.nasa.gov).  Each machine runs CTAS software
that predicts the movements of aircraft within an Air Route Traffic
Control Center (ARTCC). There are 20 ARTCCs in the continental US,
and so 20 machines cover all the traffic in the country.  If TFAS
is accepted for use by the FAA (this is still under consideration),
it will definitely be considered a 'mission critical' application.
We've tested TFAS with UltraSPARC/Solaris, HP's PA-RISC/HP-UX,
Pentium/Linux, and PowerPC/Mac OS X, and so far the Itanium/Linux combo
is our platform of choice. So one of these days you'll be flying on
ia64-Linux based systems. :-)

	- rob -


