
Re: RFC: implementation of package pools



Jason Gunthorpe wrote:
> 
> Notice that a telco for instance runs their business expecting their
> software and hardware to fail. It is a calculated risk that is managed to
> not seriously affect their earnings. Banks lose money, telcos misdirect
> calls, but it is infrequent.
> 

Well, Debian is still not a bank. :) I'm glad of that.

I think the automation needed for package pools or dpkg isn't
that difficult.

> Only if you have a limited view of what can go wrong. Sure you can rebuild
> most of the SQL database, but how are you going to deal with massive
> filesystem corruption, or an accident in one of the scripts?

What can a human do about massive filesystem corruption?
Do you go and review every i-node manually? I bet you just use some
incompetent ext2fs tools; that's what I usually have to do in such
cases. In case of disaster, you can have checkpointing and an automatic
rollback to a consistent state pulled from a safer backup; that's the best
a human or a program can do. As for an accident in one of the scripts:
which scripts are you referring to? The pool manager shouldn't have such
mistakes, right? :) If you're referring to package contents, then I
presume a package could be tested before being automatically put into the
archive.
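
Just to make the checkpointing idea concrete, here is a rough sketch in
Python. All the paths and helper names are made up for illustration; a
real implementation would use hard links or rsync rather than full copies,
but the shape is the same: record a known-good state before the scripts
run, and restore it if the archive no longer matches.

# Sketch of checkpoint-and-rollback for an archive tree. Paths and
# function names are hypothetical, not any existing Debian tool.
import hashlib
import os
import shutil

POOL = "/org/ftp.debian.org/pool"           # hypothetical live pool
CHECKPOINT = "/org/backup/pool-checkpoint"  # hypothetical known-good copy

def tree_digest(root):
    """Hash every file name and its contents into one digest."""
    digest = hashlib.md5()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            digest.update(os.path.relpath(path, root).encode())
            with open(path, "rb") as f:
                digest.update(f.read())
    return digest.hexdigest()

def checkpoint():
    """Record a consistent state: copy the pool and remember its digest."""
    shutil.rmtree(CHECKPOINT, ignore_errors=True)
    shutil.copytree(POOL, CHECKPOINT)
    with open(CHECKPOINT + ".md5", "w") as f:
        f.write(tree_digest(CHECKPOINT))

def rollback_if_corrupt():
    """If the live pool no longer matches the checkpoint, restore it."""
    with open(CHECKPOINT + ".md5") as f:
        good = f.read().strip()
    if tree_digest(POOL) != good:
        shutil.rmtree(POOL)
        shutil.copytree(CHECKPOINT, POOL)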

> > The idea is to automate things. Automatons can be more reliable than
> > humans.
> 
> Sure, but an automation that is in fact able to handle every possible
> failure case (like a well trained human) is generally uneconomical to
> build.
> 

You can do that when your specification is complete. The stuff we're
talking about doesn't seem to require a theorem prover.

> Try this: design a B-Tree structure that stores a set of words. The
> underlying hardware you are running on destroys 0.001% of all your
> 'pointers' in an unpredictable way. Design an algorithm to operate 100%
> correctly and a separate one to operate 99% correctly, but allow a human
> to fix the tree structure if necessary. Which is longer? Which is slower?
> If you could use another data structure besides a B-Tree could you get 100%
> reliability? (perhaps one that did not use 'pointers')

Use redundant data and error-correcting codes; that will do it. A
B-Tree implementation may well have such concerns. I'm not very interested in
file organization, though.
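
For what it's worth, the simplest form of what I mean is plain triple
redundancy with a majority vote. A toy sketch in Python (not a real
B-Tree, just the idea of surviving a corrupted pointer):

# Protect a "pointer" (here just an integer offset) by storing three
# copies and voting. Illustration only.
def store_pointer(value):
    return [value, value, value]

def load_pointer(copies):
    a, b, c = copies
    if a == b or a == c:
        return a          # at most one copy was corrupted
    if b == c:
        return b
    raise ValueError("all three copies disagree; needs human repair")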

> Actually, I've done benchmarking on this case; I can get speed gains of
> several orders of magnitude with binary databases - however, the time spent
> loading the database is small enough that this can't justify the size of
> code required to support a fault-proof implementation.

Make it better than NT's registry and that will do it. Check this: my
/var partition crashed, I mean the filesystem was gone, with no way to
recover it. (This actually happened.) Even in such disasters there are
a couple of things a program can do. I'm sometimes surprised that there
isn't such fault-tolerant code in Debian. (I know, you'll tell
me to go ahead and write it.)
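
The kind of fault tolerance I mean is not complicated. A sketch, in
Python, of a binary cache that is trusted only if its checksum matches
and is otherwise rebuilt from the plain-text file (the file names and
the trivial parse step are made up for illustration):

# Load a binary package database, falling back to a rebuild from the
# authoritative text file whenever the cache is missing or corrupt.
import hashlib
import pickle

CACHE = "packages.cache"   # hypothetical binary cache
SOURCE = "Packages"        # authoritative plain-text file

def rebuild_from_source():
    """Parse the text file into a dict; slow, but always recoverable."""
    db = {}
    with open(SOURCE) as f:
        for line in f:
            if line.startswith("Package:"):
                db[line.split(":", 1)[1].strip()] = {}
    return db

def load_db():
    try:
        with open(CACHE, "rb") as f:
            blob = f.read()
        stored_sum, payload = blob[:32], blob[32:]
        if hashlib.md5(payload).hexdigest().encode() == stored_sum:
            return pickle.loads(payload)
    except (OSError, pickle.UnpicklingError):
        pass
    db = rebuild_from_source()            # cache was missing or corrupt
    payload = pickle.dumps(db)
    with open(CACHE, "wb") as f:
        f.write(hashlib.md5(payload).hexdigest().encode() + payload)
    return db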

> Further, remember, any monkey can fire off a hundred thousand line
> program, complete with the most advanced algorithms and best
> speed-enhancing techniques. That isn't really hard at all.
> 
> The real trick is producing software that will meet a need, last, adapt to
> changes and remain viable years into the future. Not surprisingly a huge
> amount of software out there (free and commercial) does poorly at
> some of these facets :|

I agree 100%. Efficient code does not imply maintainable or well-written
code. The skill is in writing extensible, adaptable, maintainable code.
Even top hackers are not good at that.

> AJ was quite right to bring up examples like ext2 and the linux kernel -
> they are good examples of mixing both requirements, and how fine that line
> is.

ext2 isn't a good filesystem, and Linux isn't a good kernel. The fact
that we're running them doesn't entail that their philosophy is correct.
We're running them because they're the best [in some aspect or another]
in the free world.

> Anyhow, we can revisit the hash function, framed in the above. You contend
> its distribution is imperfect and can be improved. However:
>   1) Changing it will introduce more complexity
>   2) Changing it will not significantly enhance performance
>       [ By your own stats we have about 400 directories/dir. However that
>         is about 3x smaller than the current worst case - which *still*
>         performs acceptably. ]
>   3) Changing it introduces a 'personnel problem' which has a very
>      real cost.
> 
> Given that, do you still think it is at all important?
> 

No, no. It's okay, because it's just a workaround for the fs... I just
don't believe things before I see numbers. When I checked for myself and
posted those stats, I was convinced.
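
For reference, the "hash function" in question, as I understand the
proposed layout, is just the first letter of the source package name,
with lib* packages split one level deeper. A sketch in Python:

def pool_prefix(source):
    """First letter, except lib* packages get a four-letter prefix."""
    if source.startswith("lib") and len(source) > 3:
        return source[:4]     # libapt-pkg -> "liba"
    return source[:1]         # dpkg -> "d"

def pool_dir(component, source):
    return "pool/%s/%s/%s" % (component, pool_prefix(source), source)

# e.g. pool_dir("main", "libapt-pkg") == "pool/main/liba/libapt-pkg"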

The disagreement is about automation. I don't know whether you've dealt
with the design of a large system with secondary and tertiary storage,
but I have been involved in one. The requirements are in fact very
similar: secondary storage here corresponds to, say, the incoming
directory, and tertiary storage to ftp distribution. Makes sense? We
integrated a lot of automation into a system 10 or 20 times more complex
than the whole package pools story, and it worked fairly well. Medical
stuff...
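
To give an idea of the shape of the automation I mean: incoming feeds
the pool, and the pool feeds the mirrors, with nothing promoted unless
it passes a check. A rough sketch (directory names and the verify step
are invented; a real check would verify signatures, checksums and
overrides):

# Staged promotion: incoming ("secondary") -> pool -> mirror queue
# ("tertiary"). Rejects are left in place for a human to look at.
import os
import shutil

INCOMING = "incoming"
POOL = "pool"
MIRROR_QUEUE = "mirror-queue"

def verified(path):
    """Stand-in for the real signature/checksum/override checks."""
    return path.endswith(".deb") and os.path.getsize(path) > 0

def promote():
    for name in sorted(os.listdir(INCOMING)):
        src = os.path.join(INCOMING, name)
        if not verified(src):
            continue
        dst = os.path.join(POOL, name)
        shutil.move(src, dst)
        shutil.copy2(dst, os.path.join(MIRROR_QUEUE, name))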

Of course, never mind if you aren't interested in automating things.

-- 
Eray (exa) Ozkural
Comp. Sci. Dept., Bilkent University, Ankara
e-mail: erayo@cs.bilkent.edu.tr
www: http://www.cs.bilkent.edu.tr/~erayo


