
Re: Uptime (Was: Re: [OT, FLAME] Linux Sucks)



Jamie Lawrence said:

> I hate to join in the flamefest, but if that's the case, you need
> to find a different admin for those servers.
>
> If a professional admin can't ensure the server they run is going to boot,
> they need some remedial help, some process control, or a good beating.
> Sure, mistakes happen, but the methodology of change control for *nix
> machines isn't that hard.


(pouring gas on the fire)

I guess you haven't run any machines with extremely long uptimes?

I have read countless stories of machines with ultra-long uptimes
failing to come back up after they are finally restarted.

If a machine runs and runs and runs, it usually doesn't get restarted
on a regular basis (for me that means fewer than two reboots per year).
During that time a LOT of things can change on a system, and when the
machine is restarted some things may not work, or some hardware may
fail in the process. I powered down an Ultra 10 for about 20 minutes
to move it (along with a bunch of other stuff) to another UPS, turned
it on, and it never booted up: the disk was frozen solid, the heads
were completely fried, and the machine was only 6 months old. Just
this past Monday someone from my former employer called me up with
some questions; nobody had set up an init script to load stunnel on
boot for the MySQL database on the system. It's one of those things
where you say "I'll get to it later" but later never happens. So I
told him the command line to get it up; I bet he still hasn't set up
an init script for it.
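For what it's worth, the missing piece would be something like the
following -- a minimal SysV-style init script sketch, NOT the actual
script from that box; the stunnel path, config file, and pidfile
location are all my assumptions:

```
#!/bin/sh
# Sketch of an init script to start stunnel in front of MySQL.
# Paths below are assumptions -- adjust for your install.
STUNNEL=/usr/sbin/stunnel
CONF=/etc/stunnel/mysql.conf
PIDFILE=/var/run/stunnel-mysql.pid

case "$1" in
  start)
    echo "Starting stunnel for mysql"
    "$STUNNEL" "$CONF"
    ;;
  stop)
    echo "Stopping stunnel for mysql"
    [ -f "$PIDFILE" ] && kill "$(cat "$PIDFILE")"
    ;;
  *)
    echo "Usage: $0 {start|stop}"
    exit 1
    ;;
esac
```

Drop it in /etc/init.d and symlink it from the appropriate runlevel
directory (e.g. an S-prefixed link in /etc/rc3.d on a SysV layout) so
it actually fires at boot instead of living in your memory.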

The problem can get worse on the more reliable systems, the ones that
stay up for several years at a stretch.

The usual problem is just that some software doesn't load on boot; in
my experience it's much rarer to have hardware fail in the course of
restarting a server.

Which is usually why, in the unix/linux world, rebooting is typically
an absolute last resort. I cannot remember ever seeing a problem
corrected by rebooting a linux/unix server, with the exception of a
severe hardware error (e.g. the video framebuffer freezes up because
something in X went whacky), and even in those cases the system was
still completely usable over the network. Whenever I reboot my
systems, with the exception of the couple that get rebooted often (my
laptop and my sister's desktop), I cringe and cross my fingers that
they will come back up OK (~98% of the time they do). To some extent
I felt the same way about the routers I ran; I always made sure to
write the config to flash before rebooting them, just in case.
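On the router side (assuming Cisco-style IOS, which is what I'm used
to -- other vendors differ), that save-before-reboot step is just:

```
router# copy running-config startup-config
router# write memory     <- the older shorthand for the same thing
```

Forget that step and the router comes back up with whatever was last
saved to flash, not what it was actually running.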

Which is one reason I got upset at FreeBSD this past week: I upgraded
from 4.7 to 4.8 and was shocked that I had to upgrade the kernel AND
reboot the box before ps would work again. By contrast, most of my
production linux servers are running Linux 2.2.19 (released March 25,
2001) and have gone through countless upgrades of system software
without rebooting (and no, my systems are not vulnerable to the
recent ptrace bug).

A few months back (ok, maybe 8-9) I started a thread about linux
wrapping the uptime counter at 497 days, and someone mentioned that
if the system were ever shut down the disks might not spin up again.
So I waited. The machine continued to run until Dec 27 at 12:20 PM,
when the power went out; the battery backup lasted about 45 minutes
and I had to shut the machine down, after 634 days of uptime. The
power was out for nearly 10 hours. I powered it up and *gasp* it came
up! Woohoo! So I immediately upgraded the machine to show my
gratitude (quadrupled the memory and changed from IDE to SCSI). The
two disks in the system during those 634 days were refurbished Maxtor
drives; I was shocked they lasted so long (web/ftp/imap/pop3/smtp/
ssh/X11 running xawtv 24/7) without a problem.
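For reference, the 497-day wrap isn't magic: the 2.2/2.4-era kernels
kept uptime in a 32-bit jiffies counter ticking at HZ=100 on x86, so
it rolls over after 2^32 ticks:

```shell
# 32-bit jiffies counter at HZ=100 (the x86 default on 2.2/2.4):
# 2^32 ticks / 100 ticks-per-second / 86400 seconds-per-day
echo $(( 4294967296 / 100 / 86400 ))   # ~497 days
```

So any 2.2/2.4 box at HZ=100 that stays up long enough will see its
reported uptime wrap back toward zero around day 497.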

And I'm sure I'm not alone. I know tons of system/network admins, and
every one of them who ran reliable servers has, at least once,
forgotten to write an init script for something they wanted loaded on
boot before they restarted a machine.
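A cheap sanity check before any planned reboot is to ask, for each
daemon you'd hate to lose, whether anything is registered to start
it. This is a naive sketch -- the init-script directory and the
name-matches-filename assumption are mine, and rc layouts vary by
distro:

```shell
#!/bin/sh
# Naive check: does a given daemon have a file in the init directory?
# INITDIR is overridable so the check can be pointed anywhere.
INITDIR=${INITDIR:-/etc/init.d}

check_boot() {
  if [ -e "$INITDIR/$1" ]; then
    echo "$1: init script present"
  else
    echo "$1: no init script -- will not come back after a reboot"
  fi
}
```

Run `check_boot stunnel` (or sshd, mysqld, whatever) for each service
that matters; anything landing in the second category is exactly the
"I'll get to it later" trap above.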

nate
