
Re: Worst Admin Mistake? was --> Re: /usr broken, will the machine reboot ?



On Tue, Sep 13, 2011 at 03:15:13PM -0700, Bryan Irvine wrote:
> Which brings me to another fun question.  What's your worst
> administration mistake and how did you recover?

My worst administration mistake was rebooting a rack in our production data
center. I thought I had typed the IP address for a specific rack, but I
fat-fingered one of the numbers in the IP, and it sent me to our production
rack instead.

My job was to set up the hard drives with software RAID and put LVM on
them. The system gave me plenty of signs that should have warned me I was
on the wrong rack, but I continued anyway.

Getting frustrated that I was seeing more devices than expected, I issued a
reboot on most of the servers in that rack. Because those servers were part
of a clustered filesystem, and running many virtual machines, a lot of our
infrastructure went down, and we were down for about 3 hours.

Needless to say, it was a valuable lesson, one I'll never forget. In fact,
it prompted me to use LocalCommand in my ~/.ssh/config to echo a colored
banner, depending on whether I'm on a production (blinking bold red),
staging (bold yellow), or development (bold green) server.
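
A minimal sketch of what such a setup could look like, not the exact
configuration described above. The host patterns (prod-*, stage-*, dev-*),
the script path ~/bin/env-banner, and the function name env_banner are all
hypothetical. ssh only runs LocalCommand when PermitLocalCommand is set to
yes, and it runs the command on the local machine after connecting.

```shell
#!/bin/sh
# Hypothetical helper script, e.g. saved as ~/bin/env-banner.
#
# Assumed ~/.ssh/config entries (host patterns are illustrative only):
#
#   Host prod-*
#       PermitLocalCommand yes
#       LocalCommand ~/bin/env-banner production
#   Host stage-*
#       PermitLocalCommand yes
#       LocalCommand ~/bin/env-banner staging
#   Host dev-*
#       PermitLocalCommand yes
#       LocalCommand ~/bin/env-banner development

# Print the environment name wrapped in ANSI SGR color codes.
env_banner() {
    case "$1" in
        production)  code='1;5;31' ;;  # bold, blinking, red
        staging)     code='1;33'   ;;  # bold yellow
        development) code='1;32'   ;;  # bold green
        *)           code='0'      ;;  # unknown name: no styling
    esac
    # \033[<code>m switches the style on, \033[0m resets it.
    printf '\033[%sm*** %s ***\033[0m\n' "$code" "$1"
}

# Only print a banner when the script is given an environment name.
if [ "$#" -gt 0 ]; then
    env_banner "$1"
fi
```

With entries like these, connecting to a host matching prod-* would print
the red blinking banner locally before the remote session starts. Whether
the blink attribute is actually rendered depends on the terminal emulator.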

--
. o .   o . o   . . o   o . .   . o .
. . o   . o o   o . o   . o o   . . o
o o o   . o .   . o o   o o .   o o o
