
Re: Broken bootable SPARC CD#1, and why this happened



Anthony Towns <aj@azure.humbug.org.au> writes:

> Well, one thing that'd help would be having a cdimage.debian.org that
> doesn't crash all the time. That's the main reason we didn't have any
> time at all to check things, or for Phil to double check things with you
> as to how things should be done when the first sparc images didn't work.

I'm working on it -- open's getting a full body transplant on Tuesday
(or thereabouts).

I know it's been a pain in the arse, but I think I actually made the
right decision in leaving it as it was for the duration.  Admittedly,
open died the moment I started building CDs, but once rebooted
(unfortunately 8 hours later, waiting for someone to fsck /), it's
actually stood up to the load reasonably well, all things considered
(two 30-minute outages), whereas we could have done a panic replacement
with untested hardware, and found ourselves without anything.

Anyway, once it's plugged into its 100Mbit LAN, and is an Athlon 650,
rather than a P166, these problems should be behind us, with a bit of
luck.

> Another thing that would help is getting this stuff more automated and
> common. While boot-floppies and kernels and cd images are all being
> made by one or two people who know how to tweak the settings correctly,
> we're going to keep having problems like this. Much better, IMO, to set up
> cdimage.debian.org (or similar) to build a new set of CDs once a week,
> automatically, ideally straight from debian-cd.deb.

Nice idea, but it's taken until very recently to get the scripts into
this state, with constant feedback -- if we were unable to tweak the
scripts to make them work, they'd never work as well as they do.

And then we find that they still don't work ;-)

> More directly though, we should be able to very easily set up some automated
> tests to make sure this doesn't happen again. After building the CDs, mount
> them over loopback and check that device files have correct ownerships and
> permissions, or check that various packages in base are all on CD#1, or
> similar.

Now this is a very good idea.
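
As a rough sketch of what such checks might look like, assuming the
image has already been loop-mounted somewhere (e.g. with
`mount -o loop`) and that packages follow the usual
name_version_arch.deb file naming. The function names, the default
permission mask and any package list passed in are made up for
illustration; none of this is part of debian-cd:

```python
import os
import stat


def bad_device_nodes(root, max_mode=0o666):
    """Return (path, reason) pairs for device nodes under root that are
    not owned by uid 0 or that carry permission bits outside max_mode.
    max_mode is an assumed default, not an official policy."""
    problems = []
    for dirpath, _dirs, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: do not follow symlinks
            if not (stat.S_ISCHR(st.st_mode) or stat.S_ISBLK(st.st_mode)):
                continue  # only character and block devices are checked
            if st.st_uid != 0:
                problems.append((path, "owner uid %d" % st.st_uid))
            mode = stat.S_IMODE(st.st_mode)
            if mode & ~max_mode:
                problems.append((path, "mode %o" % mode))
    return problems


def missing_packages(root, required):
    """Return the sorted names from required for which no .deb file
    exists anywhere under root."""
    found = set()
    for _dirpath, _dirs, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".deb"):
                # The package name is everything before the first underscore.
                found.add(name.split("_", 1)[0])
    return sorted(set(required) - found)
```

Run against the mount point of CD#1 with the list of base packages,
both functions should return empty lists; anything else is a reason
not to ship the image.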

> The more checking and testing we can offload from volunteers onto machines,
> the better. We can always get more machines, getting more people with the
> requisite clues and free time is much harder.

It's almost impossible to remember all the little things that might go
wrong as well, so encapsulating that knowledge in a regression test
suite is definitely the way to go.

The fact that the CDs always need to be built in the early hours
doesn't help.

Cheers, Phil.


