
Re: itp: static bins / resolving static debian issues



On Tue, Aug 24, 1999 at 12:20:03PM +0200, goswin.brederlow@student.uni-tuebingen.de wrote:


> I can run a Debian system completely without static bins, I can run
> Debian without bash. It's all about opinions.

There's nothing wrong with opinions. My complaint is that many of the 
opinions posted are not subsequently substantiated by any kind of 
reasoning. Anyone can say that "dynamic linking is great, static linking
is horrible" but it takes some insight and reasoning to convince someone
who doesn't agree with that statement that it's true. 

So on a list like this, you really ought to post a reason for your 
viewpoint--that was my gripe with Craig.

> People trust that on a normal Linux bash will be available (that's a
> general opinion), so Debian will come with bash installed normally.

Agreed.

> On the other side, dynamic or static linking absolutely depends on the 
> taste of the admin.

I don't agree with this. I think it depends on the type of server, and
the reliability guarantees that it requires. If there is a requirement
for live recovery, and a realistic probability of a static-recoverable
failure, then it isn't really a matter of taste.

Whether the antecedent is true is obviously the crux: what is the
realistic probability of a static-recoverable failure? In my previous
messages I presented numerous arguments that it is reasonable to
expect you'll need it.

Some of those arguments I will mention again:

> I like dynamic linking and it saved me several times so far.
> Again that's an opinion.

Yes. Now you need to tell me how it saved you, and why this is
relevant to static recovery tools (which are the current proposal;
nobody is proposing that dynamic tools be removed!).

> But if Debian eliminates that opinion, that will be a bug.

Presumably you meant that option (static linking): yes, it would be
a bug, and a huge step backwards. Nobody is proposing that we go
back to the 1970s and have only static binaries available. We are
only proposing a small set of tools, adequate for fixing enough of
the system to bring the dynamics back online.

> If you like static binaries, build static 
> packages to go along with the dynamic ones. You might even build a
> static-base.tgz if you wish.

In the message that you are responding to, I pointed out several times
that many people who like static binaries don't know that they like 
them until they need them. I don't see a response to that here.


> As to your arguments, I can use the same for dynamic binaries.
> 
> Proposition #1, 2 and 3:
> 
> Does static linking make downtime cheaper or tolerable?

It makes downtime cheaper and more tolerable in several ways:

  -- It is cheaper and more tolerable if the clients don't notice that
     there was a failure at all--with only dynamic tools, recovery would
     require a reboot, killing the servers the clients are using. Live
     recovery keeps the servers up; for example, clients may never know
     there was a problem with the NFS server, since to them it appeared
     to continue working.

  -- Less intrusive failures are more tolerable. For example, the CGIs
     go down (dynamic) but the static pages stay up--customers can browse
     around the site and find out about products while they wait for
     repairs to be made; they can also see a note explaining that the
     site is experiencing temporary difficulties.

  -- More rapid repairs make downtime cheaper. For example, rather than
     rebooting from a slow rescue disk and taking the server offline for
     ten minutes or so (locking up everyone's machines because of NFS, or
     stopping their work by disconnecting an important database), the
     admin copies the missing libraries from another machine via NFS and
     service is restored in 2-3 minutes (see the sketch below).

These are example cases, but they are actually things that happen, 
not just opinions.
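
To make the last case concrete, here is a rough sketch of the kind of
live repair I mean, done from a static root shell (sash) with a healthy
machine's /lib already reachable over NFS at /mnt/lib. The paths and
the libc version are purely illustrative, and getting that NFS mount
in place is assumed:

    # sash built-ins are prefixed with '-', so they keep working even
    # when no shared libraries can be loaded.
    -cp /mnt/lib/libc-2.1.2.so /lib/libc-2.1.2.so
    -ln -s libc-2.1.2.so /lib/libc.so.6   # restore the symlink the loader expects
    -ls -l /lib                           # sanity check before trying a dynamic binary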

> > Proposition #4-- Failure of the C library can occur under Debian
> 
> Sure, but does static linking prevent that? No it doesn't.
> As soon as a new libc is released, the autobuild demons will compile
> it and all static linked binaries and upload them. You type "apt-get
> install upgrade" and then you have them all installed and your system
> goes down.

Yes, static linking prevents the situation you describe. The new
binaries that get installed on your system, and which fail, will not
appear as a failure to your NFS and HTTP clients (except for CGI).
The existing, old versions of your servers are still running, linked
and loaded. They are unaffected by the catastrophic failure you have
just wreaked on your system. You can calmly pull out your static
recovery tools and fix it.

To be fair, another point I raised earlier that was omitted from the
message you're responding to is that the statics need to bootstrap
themselves. They need to install the new statics in a subdirectory,
run a test on them, and only replace themselves if the test succeeds
(see the sketch below).

That will avoid the failure you describe.
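
In shell terms the idea is something like this--a minimal sketch only,
with made-up names for the archive and the staging directory, not an
existing Debian mechanism:

    # Unpack the candidate statics into a staging directory.
    mkdir -p /sbin/statics.new
    tar -xzf /tmp/statics.tgz -C /sbin/statics.new

    # Smoke-test the new binaries before letting them replace the old set.
    if /sbin/statics.new/tar --version > /dev/null 2>&1 &&
       /sbin/statics.new/cp /sbin/statics.new/tar /tmp/statics-selftest
    then
        mv /sbin/statics.new/* /sbin/
    else
        echo "new statics failed their self-test; keeping the old ones" >&2
    fi
    rm -f /tmp/statics-selftest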

> Also static linking doesn't prevent dpkg from removing some essential
> package, say sash for example. If you can prove otherwise, everyone in
> the world will switch to static linking. And how does static linking
> make a "rm -rf /" any less fatal?

Yes, it makes "rm -rf /" less fatal. It gives you extra redundancy, so 
that when you realize your error and ^C there is a much higher probability
that enough of your system is still intact that you can repair the damage.

In this case statics are an extra level of redundancy that makes your
OS a little bit more fault tolerant. Redundancy is a well-known
benefit when you are looking for reliability, and this case is a
good illustration of why.

Again, this is not opinion. You can analyze it logically, and you can
look at practice--you have to do much more damage to your system to
wipe out the redundancy.


> >...
> >       Proof: A disk error can corrupt some files but not others, such 
> >              as by duplicating inodes. The C library may be affected 
> >              while leaving much of the rest of the system intact. 
> 
> Do you agree that the likelihood of a disk error increases with the
> number of blocks (on a given medium)? So reducing the number of blocks
> used does decrease the risk. So dynamic linking is safer. Sure, the
> disk error might occur in the libc and thus damage the whole system,
> instead of just harming some non-essential program. But by doubling or
> tripling the size of the essential tools you double or triple the
> risk of such an error occurring at all.

Here there are two responses. First, the statics do not depend on one 
another, but only on themselves. Thus you might lose a couple of statics,
but you will not lose all of them. Second, the statics again provide 
you with extra redundancy--your disk error actually has to knock out 
most of the statics AND the dynamics before you are in hot water. 

For example, you might lose the ability to copy files, or unpack
archives, but your mount command might be left intact. Using your
mount command you can mount a disk from another machine and run
a static copy command off the mounted partition. Alternatively, maybe
you have lost mount, but you have a telnet session open on another
machine and you have some packing tools and cat--you paste a binary
from the other machine over (sketched below), and you're live. This
is extreme, but if your server is important enough, possibly worthwhile.
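
For the curious, the paste-it-over-a-terminal trick looks roughly like
this, assuming uuencode/uudecode happen to be among the tools still
alive on the broken box (the filenames are just for illustration):

    # On the healthy machine: turn a small static binary into printable text.
    uuencode /bin/sash sash > sash.uu

    # In the telnet session on the broken machine: capture the pasted text,
    # then decode it back into an executable (uudecode restores the file mode).
    cat > sash.uu      # paste the file here, finish with Ctrl-D
    uudecode sash.uu
    ./sash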

Note these are not a complete coverage of all the things you could 
do, just a few illustrations. The general point here is that 
redundancy is good, and statics, by definition (not sharing with
each other) provide that in spades. 

Finally, I am not certain I agree that the chance of a disk error
increases ONLY with the number of blocks on a given medium. If the
disk error is actually caused by a buggy kernel, or buggy memory,
then it is also likely to increase linearly with the number of inodes.
Since a typical dynamic binary depends on 10-12 inodes, and a typical
static depends on only 2 (itself plus the directory it is in), once
again the probability of a dynamic getting wiped out is much higher.


> Now consider the case of a broken libc and dynamic/static linking:
>
> static linking:
>         You have a broken system with 50+ broken binaries which you
>         must replace from the rescue disk, extract from the base or
>         from an old .deb file. You probably won't find an old .deb
>         file on the server anymore and you might have to recompile a
>         lot of packages after bugfixing the libc.

That sounds like two commands:

    cd /
    tar -xvzf /floppy/statics.tgz

Or equivalent ar command on a .deb, so what's your point? 
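
For completeness, the .deb version of the same repair is roughly
this--a .deb is an ar archive whose data.tar.gz member holds the
files; the package filename here is made up:

    cd /
    ar p /floppy/sash-static.deb data.tar.gz | tar -xvzf -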


> dynamic linking:
>         You have a broken system just as well, but only one broken
>         lib. Copying that from the rescue disk is easy. You can just as
>         well always keep a backup of a working libc on the hard drive
>         and copy that back into the system.

How would you perform that copy? I think you are forgetting that you
are arguing AGAINST a statically linked copy command, a static su,
and a static root shell. So you cannot assume you are allowed to
copy files; your copy command and your ability to become root do not exist.
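
This is what that attempted copy looks like in practice once libc is
gone (the exact wording varies between ld.so versions, but the effect
is the same):

    $ cp /backup/libc.so.6 /lib/
    cp: error while loading shared libraries: libc.so.6: cannot open shared object file: No such file or directory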

Note that this is not an opinion, it is a fact about dynamic binaries
that they don't work when their library is unavailable.

You are NOT certain to have a root shell in the case of a hardware 
failure, a hack, or a software bug. You are not even certain to have
a root shell if you caused it by administrator error (many people 
use things like "sudo" as they provide extra security).


> >    Proposition #5-- Most servers will survive a C library failure 
> >              if the server is already running 
> 
> Most servers will survive the installation of a broken static binary
> just as well as broken dynamic lib.
> 
> Proof: The server is alread running.

You are not making any point here. Proposition #5 is a point I made 
to establish that there was a possibility of live recovery. You have
just demonstrated, again, that there is a possibility of live recovery.

Thank you for making my point?


> >    Proposition #6-- If a failure of the C library occurs, and the 
> >              servers are still running, then on a system where 
> > 	     downtime is considered unacceptable or difficult, you
> > 	     should fix the problem without a reboot if that is
> > 	     possible
> 
> Same with broken static binaries. What if cp is broken?

Then I use cat. Or if that is broken too, I try dd. Or if that is 
broken as well, I play around with mount and mv, or just mount 
by itself might get me going. Or if those are both broken--now I
start getting creative: does grep work? If so I can make a pattern
that matches everything. If that fails and I have ed, I still might
have a chance.
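
The first two of those stand-ins, concretely (the paths are
illustrative, matching the earlier sketch):

    # Putting a library back without cp:
    cat /mnt/lib/libc-2.1.2.so > /lib/libc-2.1.2.so
    # or, if cat is gone too:
    dd if=/mnt/lib/libc-2.1.2.so of=/lib/libc-2.1.2.so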

Statics are highly redundant, that's what makes them so useful during
a system failure.


> >    Proposition #9-- Statics recovery tools are more fault tolerant
> >                     than dynamics equivalents
> > 
> >       Proof: Dynamics typically depend on a large number of files, and
> >              the failure of any of those files will bring the dynamic
> >              binary down. For example, bash depends on:
> >                     -- bash, libncurses.so.4 (symlink), libncurses.so.4.2, 
> >                        libdl.so.2 (symlink), libdl.so.2.1.2, libc.so.6
> >                        (symlink), libc-2.1.2.so, ld-linux.so.2, and 
> >                        until recently on libreadline as well (2 files)
> >              whereas sash depends on only one file (sash). Thus there
> >              are eight (previously 10) files upon which bash depends,
> >              but only one on which sash depends. 
> 
> But both the dynamic and static binary depend on exactly the same
> source and if that's broken the bin is broken, no matter if it's static
> or dynamically linked. You gain or lose nothing. The chance that
> dynamic linking fails but static not is 0.

Of course I am talking about at runtime, because compile time is 
really totally irrelevant to the discussion. I assume you wrote this 
in a brief moment of distraction, because the rest of what you wrote
was fairly intelligent. 

However, just to drive the point home, try this:

    rm /lib/*

now check whether your statics still work, and whether anything 
else does. 

"The chance that dynamic linking fails but static not is 0" is an 
absolutely ridiculous statmeent--obviously written in a moment of 
distraction. It's not even close to the truth. 

> The only benefit you get from static bins is that all bins are rebuilt
> with each lib update and unresolved symbols will be detected
> automatically.

As if my C compiler would be working during a system failure! All I 
am talking about is enough functionality to copy a few already 
compiled libraries onto my limping system.

You expect me to have a functioning GCC!?

I think this is a continuation of the sleep you fell into, during which
you wrote this point and the previous one. Again, the stuff you said
before this bit was a lot more informed--it must be late at night :-)

> 
> >    Proposition #12-- The presence of static recovery tools will 
> >              not pose any difficulties to ordinary use of the system
> 
> You can't boot any longer on lowmem systems. I think that's a
> difficulty.

I used to run a Linux that had all static binaries. That was a very long
time ago; dynamic linking wasn't working very well in Linux back then.
I don't recall being unable to boot, and I only had 4M of memory in
my 386sx25.

Secondly, the static recovery tools have nothing to do with the boot
process and would have absolutely no effect on it. The first potential
opportunity for a static to run would be when you logged in as root,
and since the static shell (sash) uses less memory than bash does
(even dynamically linked, bash uses more per process), this is
a complete non-issue.

> >    ------------------------------------
> >    Conclusion: Debian should provide static binaries 
> 
> Yes, as an option. Go ahead and patch the packages to build static
> versions as well. I won't use them for the one in a million chance
> where it could help, but you may if you want.

Once again, I have argued repeatedly that this is not a good idea, since
many people who require the static binaries do not know that they require
them until their system fails. This is owing either to:

   1) Not enough experience. This shouldn't be held against them.

   2) Too much experience. Everyone knows all Unixes have static recovery.

I got burned by #2: it never occurred to me that Debian WOULDN'T
install the statics until I needed them, and then I was horrified to
discover that it didn't.

Now *I* know better, but there are many who don't, and who are going 
to wind up in the same boat as me (whether due to too much or due to 
too little experience).

Note that someone with too little experience still benefits from the 
statics. The most natural thing to do when the system fails for an 
easily diagnosed reason (the error message you get when dynamics fail
is pretty explicit) is to log in as root and try and fix the problem. 

With statics, their attempt to log in as root will work. Then they'll
have a chance to see whether they've learned enough Unix to get a  
new library in place--maybe they will, or maybe they'll call for 
help (and find someone who does, and who will use the statics to 
do the job). 

Your statement above has been repeated many times by many people, but
no one has yet explained to me how someone is supposed to know, in
advance, that they are going to need the statics and that they aren't
there.


> > I think I've more or less dealt with this above; though if you are 
> > going to state your experience, I may as well state mine: statics 
> > have been required to fix Debian systems a few times (and a couple
> > times required a reboot since I didn't have them installed--I just
> > expected they would be there already!). These were desktop machines,
> > but no matter--the experience of needing statics was the same. 
> 
> Did you have a set of rescue bins and libs installed in a separate
> directory or did you use a chroot environment to test first? No? Then
> it's your own carelessness.

The first time I didn't, because I assumed that Debian installed statics
by default, the way every other decent Unix system I've ever used does. 
(Yes, there are some indecent ones, I assumed Debian was not among them.)

After that I've always had the static shell in place as the root 
shell, and the recovery tools provided by it have been adequate 
(and useful).

> Dynamic bins are necessary on lowmem systems and are far better on
> normal systems. Whether static bins help on high availability systems
> is the admin's decision.

In fact, static bins can do perfectly well on low memory systems (for
example, 'ash' and 'sash' both consume far less memory than 'bash',
even when ash/sash are static and bash is dynamic).

And secondly, the static bins won't normally run and will therefore 
have zero impact on the system. That was mentioned in my propositions.

> >     #1 -- People who are running important servers often don't know
> >           that statics are important (we live in such a world; even
> >           when that level of experience is present it's frequently
> >           junior people who do installations)
> 
> Then write adequate documentation. Change dinstall to query what type
> the user wants or make those packages the default for server configs.

Writing adequate documentation is in no way a solution. There is too
much Unix documentation already, and someone who doesn't know that 
statics are important will already have a stack of material to read
that is much bigger than their computer. 

It's also a total cop-out. There is a solution available that has a
negligible cost, and which will provide substantial benefit. Installing
the static rescue tools is the SAFER, and MORE RELIABLE course of 
action. Not installing them is the UNSAFE, and LESS RELIABLE course
of action. 

The default should be safe and reliable. 

Your second suggestion is a truly useful one, and that may just be
the solution: some informative question during the installation could
ask the user whether they want statics or not, accompanied by a
paragraph explaining why they would, and why they wouldn't.

In that case, there would be no users who didn't know they needed 
it, because they would read that question and know that they did.

I would accept that as a solution, providing the default answer to
the question was "YES, I want static recovery tools", since as I 
said, the default should be the safer and more reliable system.


> >     #2 -- The debian policy states that anything an experienced Unix
> > 	  person expects should be provided by default, and it has
> > 	  been said on this list by others several times that it's
> > 	  obvious many experienced Unix admins expect to be able
> > 	  to recover from a failure of the C library
> 
> Any bugs in any library will also show up in the static bins, so you
> gain nothing by static linking. Any update to a broken lib will also
> install broken static bins.

Which is why I keep harping on bootstrapping so much. Your static
tools should test themselves out before replacing themselves, and so
should your package manager. If these things are all static, this is
much easier to do (you don't have to re-test just because you
installed a C library, and there are fewer dependencies to watch).

I feel you are confusing runtime and compile-time dependencies again
here as well, though, since you say "gain nothing" when the compile
issue is clearly one of the LEAST likely ways your library will
fail. Every Debian user in the world would catch that immediately,
so the probability is extremely remote that it would happen to you.
More likely you will get burned by something subtle, like the
bash/readline situation that existing potato users didn't notice
(it only affected slink->potato upgrades). Yes, that was in the
unstable version, please don't mention that: the point here is
that any likely software failure will be far more subtle than
"library doesn't compile at all".

> > First, and especially since the cost is so low, it is wise to
> > include the tools that an admin will need to recover from failure
> > by default, because many such admins are effectively learning as
> > they go, and when the server fails, it is too late for them to
> > learn that statics were important.
> > 
> > Second, your position is in violation of debian policy.
> 
> A rescue disk is provided, more is not needed. Maybe more is possible,
> but not required.

You see, this is what bugs me. You just read a very long message
which presented a long and detailed argument against the statement
you just wrote. Nevertheless, you feel you can state this opinion
without backing it up.

A rescue disk, as I have exhaustively demonstrated, is an absolutely
unacceptable rescue solution on two very large classes of servers:
those where a reboot is a logistical problem; and those where a 
reboot is flat out unacceptable because of critical services.

If you are going to persist in stating unsubstantiated opinions
like that... well it's very frustrating for me.

Please justify this statement. I enjoy technical debates; I am not
frustrated when someone presents me with facts that disagree with
my views: I find that challenging, and it gives me something to
think about.

But I really get upset when people ignore pages and pages of detailed
analysis and post the contrary without even so much as a "because".


> Why should that ever happen? Is it more likely for dpkg to remove libc
> than sash? Is /bin safe from errors more than /lib is? Remember, when
> you use static binaries you will have /lib in /bin several
> times.

1) Yes. The dynamics have far more complex and numerous runtime
   dependencies, and that makes the probability of error much greater.

2) If it removes sash but not the dynamics, you can fix it! But if
   it removes the dynamics, and you have no sash, you have to reboot.
   
> Downtime of an NFS server is hardly any problem. If you get it up
> within 15 minutes it just starts working again. No problem there even
> with root-NFS. I tested that several times so far.

Maybe for YOU downtime of an NFS server is not a problem!!!

Yes, it will "just start working again" in 15 minutes. After you've 
stopped everyone in your company from doing any work at all for 
the entire duration of your reboot and downtime, then within 
fifteen minutes you will get somewhere. 

Let's say you are using Debian as the file server for a small 
company of only 100 people or so, and you are so speedy that 
you reboot, load your rescue disk, fix up your libraries, reboot
again, and restart your server all in 15 minutes. And then your
computers are all really fast so it only takes them 2 minutes
to get going after that.

Now let's say that at that time, 50% of the people in your company
were sleeping, talking to co-workers, using the phone, playing 
games, having sex in their offices, and otherwise being unproductive 
(or at least not using their computers). 

But the other half were using their computers, doing work, trying
to make important deadlines, or whatever--and they got stopped
completely for 17 minutes each while you tinkered with your system.
That little goof just cost you roughly two person-days of work
(50 people times 17 minutes is about 14 person-hours): if you had
statics installed, you could have fixed the problem in a few minutes
without anyone noticing.

Let's evaluate that: with statics installed, you could have fixed
the problem, and then gone on vacation for two days and it would
have been cheaper and less disruptive!!!

So, I repeat, maybe for YOU downtime of an NFS server is not a problem!

> > Maybe the problem is that I didn't make this clear enough--I think
> > that a static root shell is extremely important, and will not run 
> > Debian as a production server until it's available by default (or 
> > someone convinces me why I don't need it). 
> 
> It will probably never be available by default. It's normally not needed
> and it's not usual. But the option is there, use it at your will.

It isn't normally needed, that's true. But when it's needed, it's too
late to decide to install it. Furthermore, it is usual--every other 
half-decent OS has some kind of static recovery tools available. 

It's so usual that I didn't even think to look whether Debian installed
them when I first set it up--obviously it would, right? After all, 
Debian is an OS that values its reputation for quality. (Sorry for 
the hyperbole, but hey it's 6am, I'm entitled to it.)

Justin

