Re: to lvm or not to lvm?

To: debian-user@lists.debian.org
Subject: Re: to lvm or not to lvm?
From: Bob <spam@homeurl.co.uk>
Date: Thu, 03 May 2007 14:04:24 +0800
Message-id: <46397B68.4050207@homeurl.co.uk>
In-reply-to: <1178170304.11397.26.camel@princess.gregfolkert.net>
References: <1177083457.3090.15.camel@dogen.gateway.2wire.net> <20070420234221.GB10498@titan> <1177114289.3090.59.camel@dogen.gateway.2wire.net> <4632B04B.80002@homeurl.co.uk> <jwv647ee7se.fsf-monnier+linux.debian.user@gnu.org> <4635EF5A.3090002@homeurl.co.uk> <20070430165819.GO24895@localhost.localdomain> <4636A84E.60507@homeurl.co.uk> <1177993014.31746.13.camel@princess.gregfolkert.net> <46394AAC.7090502@homeurl.co.uk> <1178170304.11397.26.camel@princess.gregfolkert.net>

Greg Folkert wrote:

On Thu, 2007-05-03 at 10:36 +0800, Bob wrote:

Greg Folkert wrote:

8< massive snip

Automatically adding space has been done. Trust me on this one. Don't do it.
I've seen the after-effects. A typo in a script can do more harm in 20
seconds (or less) than any one person could do in 20 years when dealing
with disk space. You would be well advised to set up an existing project's
disk-space monitoring system and have it urgently mail you that disk
space is becoming a premium commodity on the /blah filesystem.
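The monitor-and-alert approach recommended above can be sketched as a simple threshold check. This is a minimal Python sketch, not any particular monitoring project: the 85% threshold is an assumption, and actually sending the mail is left out (it only builds the alert text). `shutil.disk_usage` accounts for root-reserved blocks a little differently from `df`, so the percentage may not match `df` exactly.

```python
import shutil
from typing import Optional


def usage_alert(path: str, used: int, total: int, threshold: float = 0.85) -> Optional[str]:
    """Return an alert message if used/total is above `threshold`, else None."""
    fraction = used / total
    if fraction > threshold:
        free_gib = (total - used) / 2**30
        return f"{path} is {fraction:.0%} full, {free_gib:.1f} GiB left"
    return None


def check_filesystem(path: str, threshold: float = 0.85) -> Optional[str]:
    """Read the real usage of the filesystem holding `path` and build an alert."""
    usage = shutil.disk_usage(path)  # named tuple (total, used, free), in bytes
    return usage_alert(path, usage.used, usage.total, threshold)


if __name__ == "__main__":
    message = check_filesystem("/")
    # A real monitor would mail or page someone here; printing stands in for that.
    print(message or "/ is below the alert threshold")
```

Run from cron, a check like this only ever reports; it never touches the volume layout, which is the safety property being argued for.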
As long as it's not possible for a growing partition to take space away from another partition, I can't see how this would do anything but give you more time to react. Say you run your messed-up script that starts pushing the contents of /dev/zero into a text file and eating hard drive space. If you're sitting next to the machine, you see the HDD lights come on and hear the seeking; if you're remote, the first clue you get is an email or SMS saying /blah has been automatically grown by 5GB because it was more than 85% full. [0] You can kill your script, delete the text file, reduce the partition again, and nothing has crashed. If all it had done was email you a message pointing out /blah was running out of space, and /blah was also required by some other vital process that ran out of space and crashed before you could kill the script, you'd wish you had auto growth.
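The auto-grow rule described above (grow /blah by 5GB once it is more than 85% full, never taking space from another partition) could look like this as a decision function. It is a hypothetical Python sketch: `/dev/vg0/blah` is a made-up volume path, `vg_free` is assumed to be the volume group's unallocated space, and the function only returns an `lvextend -L +5G ...` command string for review rather than running it. After an `lvextend`, the filesystem on top still has to be grown separately (for ext2/ext3, with `resize2fs`).

```python
from typing import Optional

GIB = 2**30


def plan_auto_grow(lv_path: str, used: int, size: int, vg_free: int,
                   threshold: float = 0.85, step: int = 5 * GIB) -> Optional[str]:
    """Return the lvextend command the auto-grow rule would issue, or None.

    Space comes only from the volume group's unallocated space (vg_free),
    so growing this volume can never take space away from another one.
    """
    if used / size <= threshold:
        return None  # not full enough to act
    if vg_free < step:
        return None  # no unallocated space left: fall back to alerting a human
    # Returned for review, not executed; the filesystem also needs growing afterwards.
    return f"lvextend -L +{step // GIB}G {lv_path}"


if __name__ == "__main__":
    print(plan_auto_grow("/dev/vg0/blah", used=90 * GIB, size=100 * GIB, vg_free=50 * GIB))
```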
No, you don't get the fact that saturation of IO is a bad thing. Even on
8 multi-path IO sub-systems. Sometimes things spin and spin and spin.
When they do... it gets icky.
To give you an example:

        I was managing an n-tier (2-tier and 3 to 5 tier) setup. We had
        18 terabytes of spare mirrored disk allocated to this machine.
        We had only 8 TB of mirrored disk already allocated for DATA.
        One day the Oracle DB starts CRANKING and CRANKING on
        transaction logs, but it's not really doing any work. The
        Oracle DBA had disk allocation rights over the spare 18TB of
        disk. So he made a cron job that ran every minute, checking the
        percentage of free space. If the LV and filesystem didn't have
        X% free, it added another X amount of space.

        Unfortunately, at 8TB, 1% of space is ~85GB. The problem was
        that he was adding 32GB chunks and extending the filesystem by
        32GB at a time, once a minute. The LV extending commands were
        never completing, let alone the filesystem extending. You can
        guess how well that went.

        In about 11 hours, all 18TB had been allocated but not used,
        and was not committed, yet wasn't recoverable until the commit
        process completed. And the filesystem was never extended
        properly. We had to switch over to a fail-over machine and pray
        the DB was good on the hot-copy. It was... so not much was
        really bad.

So, you see, it is all about understanding scale. When you start talking
about terabytes or petabytes, you need to have completion checks before
the next step or a repeat, and to do things in large enough "chunks" to
make a difference.
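The scale argument can be made concrete with the figures from the story. This is a back-of-envelope Python sketch, assuming binary units and that every once-a-minute run grabbed a full 32GB chunk: 1% of the 8TB DATA volume comes to about 82GB (in line with the ~85GB quoted), so one chunk is well under half a percent of it, and the 18TB spare pool is used up after 576 runs, about 9.6 hours. That is the same order as the "about 11 hours" reported.

```python
# Figures taken from the story above; treating TB/GB as binary units is an assumption.
DATA_GB = 8 * 1024     # existing mirrored DATA filesystem
SPARE_GB = 18 * 1024   # spare mirrored disk the DBA could allocate
CHUNK_GB = 32          # added by each cron run
RUNS_PER_HOUR = 60     # the job fired once a minute


def chunks_per_percent(fs_gb: float, chunk_gb: float) -> float:
    """How many chunks it takes to add 1% of the filesystem's size."""
    return fs_gb * 0.01 / chunk_gb


def hours_to_exhaust(spare_gb: int, chunk_gb: int, runs_per_hour: int) -> float:
    """Hours until the spare pool is gone if every run grabs a chunk and none completes."""
    runs = -(-spare_gb // chunk_gb)  # ceiling division: a final partial chunk still counts
    return runs / runs_per_hour


if __name__ == "__main__":
    print(f"1% of DATA is {DATA_GB * 0.01:.0f} GB, or {chunks_per_percent(DATA_GB, CHUNK_GB):.2f} chunks")
    print(f"spare pool exhausted after {hours_to_exhaust(SPARE_GB, CHUNK_GB, RUNS_PER_HOUR):.1f} hours")
```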

That sounds like a poor implementation; the critical bit is not to let it iterate without it having worked the first time. I still don't see anything wrong with the concept as long as it's implemented right, although I've never dealt with a system on the scale you're talking about.
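The "don't iterate until the last step worked" guard can be sketched as a small state machine. This is a hypothetical Python sketch: the class and its names are made up for illustration, sizes are in arbitrary units, and the pending target is kept in memory, whereas a real cron job runs as a fresh process each time and would have to persist it (for example in a state file). A real script should also alert a human if a step stays unconfirmed for several runs, instead of just waiting.

```python
from typing import Optional


class GuardedGrower:
    """Hypothetical auto-grow controller that will not take another step until
    the previous extension is confirmed to have taken effect."""

    def __init__(self, step: int):
        self.step = step
        self.pending_target: Optional[int] = None  # size requested by the last step

    def next_action(self, current_size: int, needs_space: bool) -> Optional[int]:
        """Return a new target size to request this run, or None to do nothing."""
        if self.pending_target is not None:
            if current_size < self.pending_target:
                # Last extension has not finished (or failed): don't pile on another.
                return None
            self.pending_target = None  # previous step confirmed, clear the guard
        if not needs_space:
            return None
        self.pending_target = current_size + self.step
        return self.pending_target


if __name__ == "__main__":
    grower = GuardedGrower(step=32)
    print(grower.next_action(100, True))  # first request: 132
    print(grower.next_action(100, True))  # size still 100, so it refuses: None
    print(grower.next_action(132, True))  # step confirmed, next request: 164
```

Compared with the once-a-minute job in the story, the difference is that a stalled extension stops the loop instead of queueing another chunk behind it.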

I think the best way to implement it would be to set limits on how much a partition can grow. Once /blah has eaten 90% of the remaining space, it's not allowed to grow any more, which still leaves a good chunk of space for other vital partitions to grow into, should they need to. The processes responsible for the expansion, along with any others that require space on /blah, crash, hopefully without taking the whole system with them. That way, when you get off your train, plane or hobby horse, hopefully you can still ssh in and fix things.
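The growth limit could be enforced by giving the volume a fixed budget out of the space that was unallocated when the policy was set up. A minimal Python sketch with hypothetical function names, using the 90% figure from the paragraph above and assuming the caller tracks how much auto-growth has already given the volume. The cap here is per volume, so several capped volumes would still need budgets that add up to less than the free space.

```python
def growth_allowance(initial_free: int, grown_so_far: int, cap_percent: int = 90) -> int:
    """How much more the volume may still grow under a cap on its share of spare space.

    initial_free: unallocated space in the volume group when the policy was set up.
    grown_so_far: total space auto-growth has already given this volume.
    """
    budget = initial_free * cap_percent // 100
    return max(0, budget - grown_so_far)


def next_step(step: int, allowance: int) -> int:
    """Size of the next extension: a normal step, trimmed to what the cap still allows."""
    return min(step, allowance)


if __name__ == "__main__":
    GIB = 2**30
    free, grown, step = 100 * GIB, 0, 5 * GIB
    while True:
        size = next_step(step, growth_allowance(free, grown))
        if size == 0:
            break  # cap reached: no more automatic growth
        grown += size  # pretend each extension succeeds
    print(f"/blah stops growing after {grown // GIB} GiB, leaving {(free - grown) // GIB} GiB spare")
```

With 100 GiB unallocated and 5 GiB steps, /blah stops after 90 GiB of growth and the last 10 GiB stays free for other volumes.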
The machine I mentioned was SO BUSY that ssh took over 10 minutes to get
a login, and then the key-auth expired before the login process was
finished. Even the console was that slow, but at least it didn't time
out.

I've never seen anything that busy. I had a MythTV backend that would sit with the load average up at 3 or 5, and I thought that was "getting my money's worth".


--
Garrr, do your bit for global warming, become a pirate, you can "borrow" my copy of Windows 95 if you want.

Follow-Ups:
- Re: to lvm or not to lvm?
  - From: Douglas Allan Tutty <dtutty@porchlight.ca>
- Re: to lvm or not to lvm?
  - From: Stefan Monnier <monnier@iro.umontreal.ca>

References:
- Re: to lvm or not to lvm?
  - From: Bob <spam@homeurl.co.uk>
- Re: to lvm or not to lvm?
  - From: Greg Folkert <greg@gregfolkert.net>
- Re: to lvm or not to lvm?
  - From: Bob <spam@homeurl.co.uk>
- Re: to lvm or not to lvm?
  - From: Greg Folkert <greg@gregfolkert.net>
