On Tue, Sep 13, 2011 at 03:15:13PM -0700, Bryan Irvine wrote:
> Which brings me to another fun question. What's your worst
> administration mistake and how did you recover?

My worst administration mistake was rebooting a rack in our production data center. I thought I had typed the IP address of a specific rack, but I fat-fingered one of the numbers, and it sent me to our production rack. My job was to set up the hard drives with software RAID and put LVM on them. There were plenty of signs from the system that should have warned me I was on the wrong rack, but I continued anyway. Frustrated that I was seeing more devices than expected, I issued a reboot on most of the servers in that rack.

Because those servers were part of a clustered filesystem and were running many virtual machines, a lot of our infrastructure went down, and we were down for about 3 hours.

Needless to say, it was a valuable lesson, one I'll never forget. In fact, it prompted me to use LocalCommand in my ~/.ssh/config and echo colored prompts, depending on whether I'm on a production (blinking bold red), staging (bold yellow), or development (bold green) server.
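
For anyone who wants to try something similar, here is a minimal sketch of what such a setup might look like. It is not the exact configuration described above: the host patterns (prod, staging and dev under example.com) and the banner text are hypothetical, so adjust them to your own naming scheme. LocalCommand only runs when PermitLocalCommand is enabled, and it runs on the local machine after the connection succeeds, so this prints a colored banner in your terminal rather than changing the remote shell prompt.

```
# ~/.ssh/config (sketch; host patterns below are hypothetical)

# Production: blinking bold red (SGR 1 = bold, 5 = blink, 31 = red)
Host *.prod.example.com
    PermitLocalCommand yes
    LocalCommand printf '\033[1;5;31m*** PRODUCTION ***\033[0m\n'

# Staging: bold yellow (SGR 1;33)
Host *.staging.example.com
    PermitLocalCommand yes
    LocalCommand printf '\033[1;33m*** STAGING ***\033[0m\n'

# Development: bold green (SGR 1;32)
Host *.dev.example.com
    PermitLocalCommand yes
    LocalCommand printf '\033[1;32m*** DEVELOPMENT ***\033[0m\n'
```

The banner text deliberately contains no percent signs, since ssh expands % tokens in LocalCommand before handing the command to the shell. Whether the blink attribute is actually rendered depends on the terminal emulator.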