
Re: I/O performance issues on 2.4.23 SMP system



> >>I was the poster who initiated the previous thread on this subject.  The
> >>problem disappeared here after we went down to 2 GB of memory (although
> >>we physically removed it from the server rather than passing the arg to
> >>the kernel... shouldn't make a difference though, I'd imagine).  We went
> >>straight from 4 GB to 2 GB, so I can't comment on the results of using 3
> >>GB.

The above comment sounds a lot like a bounce buffer issue. This is not an IO issue.

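Bounce buffer issues look a lot like IO problems on the surface. However, the IO path gets bogged
down because there is too much memory feeding it: on a 32-bit kernel, data going to or from high
memory has to be copied through a bounce buffer in low memory. Bounce buffer issues can occur
any time you use over 2GB of RAM on a 32-bit system.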
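To put the RAM sizes in this thread in perspective, here is a minimal Python sketch of how much memory ends up as "high memory" on a 32-bit x86 kernel. It assumes the default 3GB/1GB user/kernel split, where the kernel directly maps only about 896MB of physical RAM; everything above that line is high memory, and block IO into it is what gets bounced. The function name highmem_split is purely illustrative.

```python
# Minimal sketch: how much of a 32-bit x86 box's RAM is "high memory".
# Assumption: default 3GB/1GB split, so the kernel directly maps at most
# ~896MB of physical RAM (low memory). Block IO into pages above that line
# is copied through a low-memory bounce buffer unless the kernel and the
# driver can DMA into high memory directly.

LOWMEM_LIMIT_MB = 896  # assumed default i386 direct-mapping limit


def highmem_split(total_ram_mb: int) -> tuple[int, int, float]:
    """Return (lowmem_mb, highmem_mb, highmem_fraction) for a given RAM size."""
    lowmem = min(total_ram_mb, LOWMEM_LIMIT_MB)
    highmem = total_ram_mb - lowmem
    return lowmem, highmem, highmem / total_ram_mb


for ram_gb in (1, 2, 4, 10):
    low, high, frac = highmem_split(ram_gb * 1024)
    print(f"{ram_gb:>2} GB RAM: {low} MB low, {high} MB high ({frac:.0%} high)")
```

The point of the arithmetic: low memory stays fixed while total RAM grows, so on a 10GB box nearly all of the page cache sits in high memory and competes for the same small pool of bounce buffers.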

I have a dual-processor Xeon 700 (32-bit) SMP system with 10GB of RAM. 
It is under a 10-20% CPU load daily.

Originally, I had a bounce buffer problem that occurred during backups and heavy IO loads.
The output from sar (System Activity Report) showed that the process-switch rate (pswch/s)
did not drop back down after backups. IO load would 'snowball' after each backup.

Generally, the whole system seemed to get overwhelmed and unstable after a heavy 
IO event, like a backup. I found this strange.

I fixed the problem with the following:

This bounce buffer problem was resolved with the 00_block-highmem-all-18b-3 patch.
        http://www.kernel.org/pub/linux/kernel/people/andrea

Since the patch was applied, the server has been running stably for over 43 days.

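For example, the following sar output shows a normal recovery after a heavy IO event. The
process-switch rate (pswch/s) climbs from about 130 to over 6000 during the backup and falls
back below 100 once it finishes: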

time      cpu %usr %sys %nice %idle pswch/s runq nrproc lavg1 lavg5 avg15 _cpu_
22:30:01  all    8    3     0    89     130    2    172  0.35  0.33  0.35
            0    7    4     0    90
            1    8    3     0    88

-> backup started #rsync 100GB RAID 5 Array

23:40:01  all    3   14     1    82    6173    1    166  1.44  1.46  1.52
00:10:01  all    3   14     1    82    5679    3    166  1.62  1.56  1.53
            0    3   13     1    83
            1    3   15     1    81
00:20:01  all    4   14     1    82    6068    3    156  1.45  1.46  1.46
            0    3   14     1    82
            1    4   14     1    81
00:30:01  all    2   13     1    83    5585    5    161  1.10  1.16  1.29
            0    3   13     1    84
            1    2   14     1    83
00:40:01  all    3    8     1    88    3191    2    146  0.12  0.63  1.01
            0    3    8     1    88
            1    3    8     1    88
00:50:01  all    3    3     0    95      86    3    139  0.15  0.23  0.60
-> sync finished

If your sar output does not look like this after a backup (pswch/s falling back to its
pre-backup level), and you have more than 2GB of RAM, a bounce buffer problem is the likely
cause. You can fix it in two ways: upgrade to a 64-bit machine, or patch your kernel with the
block-highmem patch written by Andrea Arcangeli.
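The "does it look like this after a backup?" check can be automated. Below is a minimal Python sketch, assuming sar lines in the column layout shown above (time, cpu, %usr, %sys, %nice, %idle, pswch/s, ...). It reads only the "all" rows and calls the system "recovered" if the last pswch/s sample is no more than a chosen factor above the first (pre-backup) sample. The function names and the factor of 2.0 are hypothetical choices for illustration, not anything sar itself provides; the sample rows are copied from the output above.

```python
# Minimal sketch: did the process-switch rate (pswch/s) drop back after a backup?
# Assumptions: input lines use the column layout shown above
# (time cpu %usr %sys %nice %idle pswch/s runq nrproc lavg1 lavg5 avg15),
# only the "all" rows are used, and "recovered" means the last sample is
# within a hypothetical factor of the first (pre-backup) sample.


def pswch_series(lines):
    """Yield (time, pswch_per_sec) from the 'all' rows of sar-style output."""
    for line in lines:
        fields = line.split()
        # Header rows have "cpu" in column 2 and per-CPU rows have a digit,
        # so matching on "all" skips both.
        if len(fields) >= 7 and fields[1] == "all":
            yield fields[0], int(fields[6])


def recovered(lines, factor=2.0):
    """True if the final pswch/s is at most `factor` times the first sample."""
    series = list(pswch_series(lines))
    if len(series) < 2:
        raise ValueError("need at least two 'all' samples")
    baseline = series[0][1]
    final = series[-1][1]
    return final <= baseline * factor


SAMPLE = """\
22:30:01  all    8    3     0    89     130    2    172  0.35  0.33  0.35
23:40:01  all    3   14     1    82    6173    1    166  1.44  1.46  1.52
00:10:01  all    3   14     1    82    5679    3    166  1.62  1.56  1.53
00:50:01  all    3    3     0    95      86    3    139  0.15  0.23  0.60
"""

print(recovered(SAMPLE.splitlines()))  # prints True (86 <= 2.0 * 130)
```

To check a saved report, pass the open file instead of the sample, e.g. recovered(open("sar-report.txt")), where the file name is only an example. A run that never returns near its baseline is the 'snowball' pattern described above.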

My Kernel: 2.4.18

image=/boot/vmlinuz-2.4.18
        #Compiled using GCC-2.95 on new IMAP server
        #Debian 2.4.18 Kernel package
        #Debian 2.4.18 xfs kernel patch
        #block-highmem patch from http://www.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/
        #00_block-highmem-all-18b-3
        #HIGHMEM kernel support up to 64GB
        #HIGHMEM I/O support added
        label=LinuxHIMEM
        read-only

My Hardware:
00:00.0 Host bridge: ServerWorks CNB20HE Host Bridge (rev 21)
00:00.1 Host bridge: ServerWorks CNB20HE Host Bridge (rev 01)
00:00.2 Host bridge: ServerWorks: Unknown device 0006
00:00.3 Host bridge: ServerWorks: Unknown device 0006
00:01.0 SCSI storage controller: Adaptec 7896
00:01.1 SCSI storage controller: Adaptec 7896
00:05.0 Ethernet controller: Advanced Micro Devices [AMD] 79c970 [PCnet LANCE] (rev 44)
00:06.0 VGA compatible controller: S3 Inc. Trio 64 3D (rev 01)
00:0f.0 ISA bridge: ServerWorks OSB4 South Bridge (rev 4f)
00:0f.1 IDE interface: ServerWorks OSB4 IDE Controller
00:0f.2 USB Controller: ServerWorks OSB4/CSB5 OHCI USB Controller (rev 04)
01:01.0 RAID bus controller: IBM Netfinity ServeRAID controller
01:02.0 RAID bus controller: IBM Netfinity ServeRAID controller
02:06.0 Ethernet controller: Intel Corp. 82557 [Ethernet Pro 100] (rev 0c)


On 03/02/04 13:25 -0600, Benjamin Sherman wrote:
> Thanks to all who sent comments on this. I did some more testing and 
> went straight to the source for input.
> 
> <snip>
> if you want to try the 4G patch then i'd suggest Andrew Morton's -mm 
> tree, which has it included:
> 
> http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.2-rc2/2.6.2-rc2-mm2/
> 
> i've got a 2.4 backport too, included in RHEL3. (the SRPM is
> downloadable.) But extracting the patch from this srpm will likely not
> apply to a vanilla 2.4 tree - there are lots of other patches as well 
> and interdependencies. So i'd suggest the RHEL3 kernel as-is, or the -mm 
> tree in 2.6.
> 
> Ingo
> </snip>
> 
> Of course, as newer kernels are released, Andrew releases newer -mm 
> patches. This patch set solved the I/O problem and let me use 4GB RAM.
> 
> 
> 
> Mark Ferlatte wrote:
> 
> >Daniel Erat said on Thu, Jan 29, 2004 at 08:08:49AM -0800:
> >
> >>I was the poster who initiated the previous thread on this subject.  The
> >>problem disappeared here after we went down to 2 GB of memory (although
> >>we physically removed it from the server rather than passing the arg to
> >>the kernel... shouldn't make a difference though, I'd imagine).  We went
> >>straight from 4 GB to 2 GB, so I can't comment on the results of using 3
> >>GB.
> >>
> >>Our problem didn't seem to directly correspond with the 1 GB threshold
> >>-- it wouldn't manifest itself until the server had allocated all 4 GB
> >>of RAM.  After a reboot, it would be nice and speedy again for a day or
> >>two until all the memory was being used for buffering again.
> >
> >
> >This was the behavior I saw as well.  I did a bunch of research and source
> >reading before actually figuring out what was going on; it wasn't a well
> >documented bug for some reason... I guess there aren't that many people
> >running large boxes using 2.4.
> >
> >This makes me think that the problems I saw with 2GB were not related to
> >the IO subsystem, but were something else.  Time to go play around a bit;
> >getting those boxes up to 2GB without having to do a kernel patch/upgrade
> >cycle would be nice.
> >
> >M
> 
> -- 
> Benjamin Sherman
> Software Developer
> Iowa Interactive, Inc
> 515-323-3468 x14
> benjamin@iowai.org
> 

-- 
------------------------------------------
Ted Knab
Chester, MD 21619
------------------------------------------
35570707f6274702478656021626f6c6964796f6e602f66602478656
02e6164796f6e60237471647560216e6460276c6f62616c60257e696
4797e2a0


