
Re: Worst Admin Mistake? was --> Re: /usr broken, will the machine reboot ?



I had a case where it had snowed, and instead of driving 50 miles in snow and ice with dodgy DC drivers, I was working from home on my laptop. Well, they scheduled a meeting for that afternoon (at about lunchtime), so I got ready to head in to the office. I typed halt in a window on my machine and went to get my stuff together. When I came back a few minutes later, the laptop was still up. I had inadvertently (I blame focus-follows-mouse) shut down a remote box: our production webserver...

As for non-Linux, I found out that the non-GNU version of killall doesn't take a process name. It does exactly what its name says: it kills everything, short of an actual halt. I was on an AIX box and needed to kill several licensing servers, so I typed "killall <processname>". After about 5 minutes I lost contact with the box, because it had killed all processes. Since then, I've always preferred pkill...
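A minimal sketch of the pkill habit, assuming a Debian-style system with the procps `pgrep`/`pkill` tools. `kill_by_name` is a hypothetical helper, not a standard command: it previews the exact-name matches first and only then sends the default signal (SIGTERM).

```shell
#!/bin/sh
# kill_by_name NAME: hypothetical helper (not a standard command).
# Signals only processes whose name is exactly NAME.
kill_by_name() {
    name="$1"
    # Preview the matches first, so a typo shows up before anything dies.
    # pgrep exits non-zero when nothing matches.
    if ! pgrep -l -x "$name"; then
        echo "no process named '$name'; nothing killed" >&2
        return 1
    fi
    # -x requires an exact match on the process name; no signal option
    # is given, so pkill sends SIGTERM.
    pkill -x "$name"
}
```

Usage: `kill_by_name lmgrd`, where `lmgrd` is only a stand-in for whatever licensing daemon you actually run. The `-x` is the point: without it, pkill treats the name as a pattern and will also hit any process whose name merely contains that string.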

--b

On Wed, Sep 14, 2011 at 12:24 PM, Bryan Irvine <sparctacus@gmail.com> wrote:
On Wed, Sep 14, 2011 at 7:02 AM, Aaron Toponce <aaron.toponce@gmail.com> wrote:
> On Tue, Sep 13, 2011 at 03:15:13PM -0700, Bryan Irvine wrote:
>> Which brings me to another fun question.  What's your worst
>> administration mistake and how did you recover?
>
> My worst administration mistake was rebooting a rack in our production data
> center. I thought I had typed a specific IP address to get to a specific
> rack, but fat-fingered one of the numbers in the IP, and it sent me to our
> production rack.
>
> My job was to set up the hard drives with software RAID and put LVM on
> them. There were plenty of signs from the system that should have warned
> me I was on the wrong rack, but I continued anyway.
>
> Getting frustrated that I was seeing more devices than expected, I issued a
> reboot on most of the servers in that rack. Because those servers were part
> of a clustered filesystem, and running many virtual machines, a lot of our
> infrastructure went down, and we were down for about 3 hours.
>
> Needless to say, it was a valuable lesson, one I'll never forget. In fact,
> it prompted me to use LocalCommand in my ~/.ssh/config, and echo colored
> prompts, depending on whether or not I'm on a production (blinking bold red),
> staging (bold yellow) or development (bold green) server.

Now THAT is genius!  I'm going to have to do that. :-)
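
Aaron's actual config isn't in the thread, so here is a minimal sketch of the idea, assuming (hypothetically) that hosts are named with prod-, staging- and dev- prefixes. LocalCommand runs on your own machine after the connection is made, and only when PermitLocalCommand is enabled; %h expands to the remote host name. This prints a colored banner before the remote shell starts rather than changing the remote prompt itself, and whether "blink" actually blinks depends on your terminal.

```sshconfig
# ~/.ssh/config sketch; the Host patterns below are hypothetical.

Host prod-*
    PermitLocalCommand yes
    # bold (1), blink (5), red (31)
    LocalCommand printf '\033[1;5;31m*** PRODUCTION: %h ***\033[0m\n'

Host staging-*
    PermitLocalCommand yes
    # bold (1), yellow (33)
    LocalCommand printf '\033[1;33m*** staging: %h ***\033[0m\n'

Host dev-*
    PermitLocalCommand yes
    # bold (1), green (32)
    LocalCommand printf '\033[1;32m*** development: %h ***\033[0m\n'
```

Keeping the three patterns disjoint matters, because ssh uses the first value it finds for each option.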


--
To UNSUBSCRIBE, email to debian-user-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
Archive: http://lists.debian.org/CAG367gb8Y_kA0J7ZNtxqkRRJLpHko1u62CO3s9d8Bf+cp_q1g@mail.gmail.com


