
Re: to lvm or not to lvm?



On Thu, 2007-05-03 at 10:36 +0800, Bob wrote:
> Greg Folkert wrote:
> > On Tue, 2007-05-01 at 10:39 +0800, Bob wrote:
> >> Andrew Sackville-West wrote:
> >> 8< snip lots of automatically growing partitions using LVM stuff
> >>> Why are you after this complexity of automatically growing partitions?
> >>> disk space is cheap. recovering from problematic fs resizes is NOT. I
> >>> understand the idea of tuning your partition sizes so that you can
> >>> have the optimal size and this is very doable with LVM, but if you've
> >>> got portions of your directory tree that might grow really big, then
> >>> just give them the space and be done with it. If you later determine
> >>> that you don't need it all, you can adjust then with LVM with relative
> >>> ease. 
> >> I'm trying to think of a way an admin coming from the Windows world, or
> >> from a home server world, where they've had the convenience of not 
> >> having to think about these things and you've either got harddrive space 
> >> or you don't, can have that convenience while retaining the extra
> >> security of having lots of partitions with different functions that 
> >> can't steal space from each other.  I don't know how common it is for a 
> >> live fs resize to go south, if it's a statistically significant 
> >> percentage then obviously it wouldn't be worth the risk, if not I still 
> >> think it's a good idea.
> >
> > Think about this then. LPARs on an IBM machine. LPAR allows you to
> > "redistribute" processors and memory to logical machines. The hypervisor
> > gives you the ability to "add processing power" to a logical machine or
> > "add memory" or to remove them.
> >
> > This gives you flexibility to control your "domains" and flow the
> > resources depending on your needs. Then think about doing LVM with those
> > guys. Sure does throw a monkey wrench in thinking about that.
> <fx_Otto> Wow man that's trippy </fx Otto>
> 
> > As far as resizing, AIX has been doing it for years, both resizing
> > larger and resizing smaller. Though the "growing" is much more tested,
> > the shrinking does work well.
> >   
> 
> As long as it's reliable
> 8< snip

AIX is far more reliable than most even begin to understand. I've had a
couple of machines, double firewalled (from the users, and them from the
internet), up for 3 years, while adding another processor cabinet, memory
cabinet, storage cabinet(s) with drawers full of disk, multi-path SAN
connections and tape subsystems during that time.

> >>> that said, you may of course do whatever you like to your system. And
> >>> the idea sounds cool on the face of it. I just think you're asking for
> >>> trouble and unneeded complexity. 
> >>>       
> >> I agree simplicity=good complexity=bad, but sometimes it's worth adding 
> >> two measures of complexity to an automated system in order to remove one 
> >> from user, or in this case admin, space, particularly if you can do it 
> >> in a relatively easy to understand way, such as with well documented 
> >> shell scripts, getting called by a disk space monitoring daemon.
> >>     
> >
> > Auto adding of space has been done. Trust me on this one. Don't do it.
> > I've seen the after effects. A typo in a script can do more harm in 20
> > seconds (or less) than any one person could do in 20 years when dealing
> > with disk space. You would be well advised to set up an existing project's
> > disk space monitoring system and have it urgently mail you that disk
> > space is becoming a premium commodity on /blah filesystem.
> >   
> 
> As long as it's not possible for a growing partition to take space away 
> from another partition, I can't see how this would do anything but give 
> you more time to react, you run your messed up script that starts 
> pushing the contents of /dev/zero into a text file and eating harddrive 
> space, if you're sitting next to the machine you see the HDD lights come
> on and hear the seeking, if you're remote the first clue you get is an
> email or SMS saying /blah has been automatically grown by 5GB because it 
> was more than 85% full, [0] you can kill your script, delete the text 
> file, reduce the partition again and nothing crashed, if all it had done 
> was email you a message pointing out /blah was running out of space, and 
> /blah was also required by some other vital process that ran out of 
> space and crashed before you could kill the script, you'd wish you had 
> auto growth.

No, you don't get the fact that saturation of IO is a bad thing. Even on
8 multi-path IO sub-systems. Sometimes things spin and spin and spin.
When they do... it gets icky. 

To give you an example:

        I was managing an n-tier (2-tier and 3 to 5 tier) setup. We had
        18 Terabytes of spare mirrored disk allocated to this machine.
        We had only 8 TB of mirrored disk already for DATA. One day the
        Oracle DB starts CRANKING and CRANKING on transaction logs. But
        it's not really doing any work. The Oracle DBA had disk
        allocation rights from the spare 18TB of disk. So, he made a
        once-a-minute cronjob, checking for percentage of space free. If
        the LV and filesystem didn't have X% free, add another X amount
        of space.
        
        Unfortunately, at 8TB, 1% of space is ~85GB. The problem was that
        he was adding 32GB chunks and extending the filesystem by 32GB
        at a time. Once a minute. The LV extending commands were never
        completing, let alone the filesystem extending. You can guess
        how well that went.
        
        In about 11 hours, all 18TB was allocated but not used. It was
        not committed, and it wasn't recoverable until the commit
        process completed. And the filesystem was never extended properly.
        We had to switch over to a fail-over machine and pray the DB was
        good on the hot-copy. It was... so not much was really bad.
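
To put rough numbers on that story, here is a small Python sketch. The
decimal units (1 TB = 1000 GB) and the function names are assumptions
made for illustration; none of this was part of the original setup.

```python
# Back-of-the-envelope numbers for the incident described above.
# Assumption: decimal units (1 TB = 1000 GB); the post does not say which.

GB_PER_TB = 1000


def minutes_to_exhaust(spare_tb: float, chunk_gb: float,
                       runs_per_minute: float = 1.0) -> float:
    """Minutes until a fixed-size grow job has claimed the whole spare pool."""
    return (spare_tb * GB_PER_TB) / (chunk_gb * runs_per_minute)


def percent_of_filesystem(chunk_gb: float, fs_tb: float) -> float:
    """How much one chunk adds, as a percentage of the existing filesystem."""
    return 100.0 * chunk_gb / (fs_tb * GB_PER_TB)


if __name__ == "__main__":
    minutes = minutes_to_exhaust(spare_tb=18, chunk_gb=32)
    print(f"spare pool claimed after {minutes} runs ({minutes / 60:.1f} hours)")
    print(f"one 32 GB chunk is {percent_of_filesystem(32, 8):.1f}% of 8 TB")
```

Under these assumptions the spare pool is claimed in 562.5 runs, a bit
over 9 hours, which is the same order as the 11 hours above. Each 32GB
chunk is only 0.4% of the 8TB filesystem, so a single step could never
bring the free space back over a 1% threshold.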

So, you see, it is all about understanding scale. When you start talking
about terabytes or petabytes, you need to have completion checks before
the next step or a repeat. And do things in large enough "chunks" to
make a difference.
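
A completion check of that kind could look something like the Python
sketch below. It is only a sketch: run_exclusively, chunk_size_gb and
the lock path are hypothetical names, and step stands in for whatever
actually extends the LV and the filesystem and returns only when both
have finished. It uses an advisory flock lock, so it assumes a
Unix-like system.

```python
import fcntl
import os


def chunk_size_gb(fs_size_gb: float, target_free_pct: float,
                  min_chunk_gb: float = 1.0) -> float:
    """Size one grow step as a percentage of the filesystem, not a fixed amount."""
    return max(min_chunk_gb, fs_size_gb * target_free_pct / 100.0)


def run_exclusively(lock_path: str, step) -> bool:
    """Run step() only if no earlier run still holds the lock.

    Returns True if step() ran to completion, False if a previous run
    was still in progress and this one was skipped.
    """
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        try:
            # Non-blocking exclusive lock: fails at once if already held.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return False  # previous extend has not completed; do nothing
        step()  # hypothetical: extend the LV, then the filesystem
        return True
    finally:
        os.close(fd)  # closing the descriptor also releases the lock
```

Called from cron, a run that finds the lock held simply exits instead
of stacking another extend on top of one that has not finished. Sizing
the chunk from the filesystem (80GB for 1% of 8000GB here) is what
makes each step large enough to matter.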

> I think the best way to implement it would be to set limits on how much 
> a partition can grow, so once /blah has eaten 90% of the remaining 
> space, it's not allowed to grow any more and the processes responsible 
> for the expansion, along with any others that require space on /blah, 
> crash hopefully not taking the whole system with it but still leaving a 
> good chunk of space for other vital partitions to grow into, should they 
> need to, so when you get off your train, plane or hobby horse, hopefully
> you can still ssh in and fix things.

The machine I mentioned was SO BUSY, ssh took over 10 minutes to get a
login and then the key-auth expired before the login process was
finished. Even the console was that slow, but at least it didn't time
out.
-- 
greg, greg@gregfolkert.net

Novell's Directory Services is a competitive product to Microsoft's
Active Directory in much the same way that the Saturn V is a competitive
product to those dinky little model rockets that kids light off down at
the playfield. -- Thane Walkup
