
Re: Worst Admin Mistake? was --> Re: /usr broken, will the machine reboot ?



On Wed, Sep 14, 2011 at 7:02 AM, Aaron Toponce <aaron.toponce@gmail.com> wrote:
> On Tue, Sep 13, 2011 at 03:15:13PM -0700, Bryan Irvine wrote:
>> Which brings me to another fun question.  What's your worst
>> administration mistake and how did you recover?
>
> My worst administration mistake was rebooting a rack in our production data
> center. I thought I had typed a specific IP address to get to a specific
> rack, but fat-fingered one of the numbers in the IP, and it sent me to our
> production rack.
>
> My job was to set up the hard drives with software RAID, and put LVM on
> them. There were plenty of opportunities the system was giving me that
> should have warned me that I was on the wrong rack, but I continued anyway.
>
> Getting frustrated that I was seeing more devices than expected, I issued a
> reboot on most of the servers in that rack. Because those servers were part
> of a clustered filesystem, and running many virtual machines, a lot of our
> infrastructure went down, and we were down for about 3 hours.
>
> Needless to say, it was a valuable lesson, one I'll never forget. In fact,
> it prompted me to use LocalCommand in my ~/.ssh/config, and echo colored
> prompts, depending on whether or not I'm on a production (blinking bold red),
> staging (bold yellow) or development (bold green) server.

Now THAT is genius!  I'm going to have to do that. :-)
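
For anyone else who wants to try it, here is a rough sketch of how the
colored warning could be wired up. It is a guess at the approach, not
Aaron's actual setup: the script name env_banner.py, the banner() helper,
and the Host patterns in the comments are all made up for illustration.
The only parts taken from ssh itself are the LocalCommand option, which
runs on the local machine after a successful connection, and
PermitLocalCommand, which has to be set to yes for it to run.

```python
#!/usr/bin/env python3
# env_banner.py (hypothetical name): print a colored environment warning.
#
# Assumed ~/.ssh/config entries (host patterns are illustrative only):
#
#   Host prod-*
#       PermitLocalCommand yes
#       LocalCommand python3 ~/bin/env_banner.py production
#
#   Host stage-*
#       PermitLocalCommand yes
#       LocalCommand python3 ~/bin/env_banner.py staging
import sys

# ANSI SGR codes: 1 = bold, 5 = blink, 31 = red, 33 = yellow, 32 = green.
STYLES = {
    "production": "1;5;31",   # blinking bold red
    "staging": "1;33",        # bold yellow
    "development": "1;32",    # bold green
}


def banner(env):
    """Return an ANSI-colored warning line for the given environment name."""
    style = STYLES.get(env, "0")  # unknown environments get no styling
    return "\033[{}m*** {} ***\033[0m".format(style, env.upper())


if __name__ == "__main__":
    env = sys.argv[1] if len(sys.argv) > 1 else "unknown"
    print(banner(env))
```

Blink support depends on the terminal emulator, so the production banner
may show up as plain bold red on some setups.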

