
Re: LVM write performance



On 8/13/2011 6:53 AM, Dion Kant wrote:

> Stan,
> 
> You are right, with bs=4096 the write performance improves
> significantly.  From the man page of dd I concluded that not
> specifying bs selects ibs=512 and obs=512.  Specifying bs=512 indeed
> gives performance similar to not specifying bs at all.
> 
> When observing the system with vmstat I see the same (strange) behaviour
> for no bs specified, or bs=512:
> 
> root@dom0-2:~# vmstat 2
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
>  1  0      0 6314620 125988  91612    0    0     0     3    5    5  0  0 100  0
>  1  1      0 6265404 173744  91444    0    0 23868    13 18020 12290  0  0 86 14
>  2  1      0 6214576 223076  91704    0    0 24666     1 18596 12417  0  0 90 10
>  0  1      0 6163004 273172  91448    0    0 25046     0 18867 12614  0  0 89 11
>  1  0      0 6111308 323252  91592    0    0 25042     0 18861 12608  0  0 92  8
>  0  1      0 6059860 373220  91648    0    0 24984     0 18821 12578  0  0 85 14
>  0  1      0 6008164 423304  91508    0    0 25040     0 18863 12611  0  0 95  5
>  2  1      0 5956344 473468  91604    0    0 25084     0 18953 12630  0  0 95  5
>  0  1      0 5904896 523548  91532    0    0 25038     0 18867 12607  0  0 87 13
>  0  1      0 5896068 528680  91520    0    0  2558 99597 2431 1373  0  0 92  8
>  0  2      0 5896088 528688  91520    0    0     0 73736  535  100  0  0 86 13
>  0  1      0 5896128 528688  91520    0    0     0 73729  545   99  0  0 88 12
>  1  0      0 6413920  28712  91612    0    0    54  2996  634  372  0  0 95  4
>  0  0      0 6413940  28712  91520    0    0     0     0   78   80  0  0 100  0
>  0  0      0 6413940  28712  91520    0    0     0     0   94   97  0  0 100  0
> 
> Remarkable behaviour, in the sense that there is a lot of bi at the
> beginning, and only at the end do I see bo at about 75 MB/s.

That might be due to massive merges, but I'm not really a kernel hacker
so I can't say for sure.
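
If you want to confirm whether merging is actually happening, iostat
from the sysstat package reports merged requests per device.  Something
along these lines, watching the rrqm/s and wrqm/s (read/write requests
merged per second) columns for the disk backing your LV:

  # extended per-device statistics, refreshed every 2 seconds
  iostat -x 2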

> With obs=4096 it looks like
> 
> root@dom0-2:~# vmstat 2
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
>  1  0      0 6413600  28744  91540    0    0     0     3    5    5  0  0 100  0
>  1  0      0 6413724  28744  91540    0    0     0     0  103   96  0  0 100  0
>  1  0      0 6121616 312880  91208    0    0     0    18  457  133  1  2 97  0
>  0  1      0 5895588 528756  91540    0    0     0 83216  587   88  1  3 90  6
>  0  1      0 5895456 528756  91540    0    0     0 73728  539   98  0  0 92  8
>  0  3      0 5895400 528760  91536    0    0     0 73735  535   93  0  0 86 14
>  1  0      0 6413520  28788  91436    0    0    54 19359  783  376  0  0 93  6
>  0  0      0 6413544  28788  91540    0    0     0     2  100   84  0  0 100  0
>  0  0      0 6413544  28788  91540    0    0     0     0   86   87  0  0 100  0
>  0  0      0 6413552  28796  91532    0    0     0    10  110  113  0  0 100  0
> 
> As soon as I select a bs which is not a multiple of 4096, I get a lot
> of block input and poor performance when writing data to disk.

> I'll try to Google the thread(s) you mentioned.  I still don't feel
> very satisfied with your explanation, though.

My explanation to you wasn't fully correct.  I confused specifying no
block size with specifying an insanely large block size.  The other post
I was referring to dealt with people using a 1 GB (or larger) block size
because it made the math easier when they wanted to write a large test
file.

Instead of dividing their total file size by 4096 and using the result
for "bs=4096 count=X" (the proper method I described to you), they were
simply specifying, for example, "bs=2G count=1" to write a 2 GB test
file.  Doing this causes the massive buffering I described and,
consequently, horrible performance, typically a factor of 10 or more
slower, depending on the specific system.
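
To make that concrete, here is roughly what the two invocations look
like for a 2 GB test file.  The target /dev/vg0/test is just a
placeholder for whichever LV you are testing against:

  # proper method: 4 KiB blocks, count = 2 GiB / 4 KiB = 524288
  dd if=/dev/zero of=/dev/vg0/test bs=4096 count=524288

  # the shortcut that triggers the massive buffering: one 2 GiB block
  dd if=/dev/zero of=/dev/vg0/test bs=2G count=1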

The horrible performance with bs=512 is likely due to the LVM block
size being 4096, which forces block writes that are 1/8th the normal
size and causes lots of merging.  If you divide 120MB/s by 8 you get
15MB/s, which is roughly the write performance you were seeing, IIRC
about 19MB/s in your original post.
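
One quick sanity check, assuming the util-linux blockdev tool is
available, is to ask the kernel what block and physical sector sizes it
reports for the LV and the underlying disk (the device names below are
just placeholders):

  # soft block size and physical sector size of the LV and the disk
  blockdev --getbsz --getpbsz /dev/vg0/test
  blockdev --getbsz --getpbsz /dev/sda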

If my explanation doesn't seem thorough enough, that's because I'm not
a kernel expert.  I just have a somewhat better than average knowledge
and understanding of some aspects of the kernel.

If you want a really good explanation of the reasons behind this dd
block size behaviour when writing to a raw LVM device, try posting to
lkml proper or one of the sub-lists dealing with LVM and the block
layer.  Also, I'm sure some of the expert developers on the XFS list
could answer this as well, though it would be a little OT there, unless
of course the filesystem test that yielded the 120MB/s was using XFS. ;)

-- 
Stan

