Re: LVM write performance
On 08/09/2011 07:13 PM, Stan Hoeppner wrote:
> On 8/9/2011 9:12 AM, Dion Kant wrote:
>
>> Thanks for your remarks. The disk info is given below. Writing to the
>> disk is OK when mounted, so I think it is not a hardware/alignment
>> issue. However, your remarks made me do some additional investigations:
>>
>> 1. dd of=/dev/sdb4 if=/dev/zero gives similar results, so it has nothing
>> to do with LVM;
>> 2. My statement about writing like this on an openSUSE kernel is wrong.
>> Also with openSUSE and the same hardware I get similar (slow) results
>> when writing to the disk using dd via the device file.
>>
>> So now the issue has shifted to the asymmetric behaviour when
>> writing/reading with dd directly through the (block) device file.
>>
>> Reading with dd if=/dev/sdb4 of=/dev/null gives disk-limited
>> performance. Writing with dd of=/dev/sdb4 if=/dev/zero gives about a
>> factor of 10 less performance.
> Run:
> $ dd of=/dev/sdb4 if=/dev/zero bs=4096 count=500000
>
> Then run again with bs=512 count=2000000
>
> That will write 2GB in 4KB blocks and will prevent dd from trying to
> buffer everything before writing it. You don't break out of this--it
> finishes on its own due to 'count'. The second run will use a block
> size of 512B, which is the native sector size of the Seagate disk.
> Either of these should improve your actual dd performance dramatically.
>
> When you don't specify a block size with dd, dd attempts to "buffer" the
> entire input stream, or huge portions of it, into memory before writing
> it out. If you look at RAM, swap usage, and disk IO while running your
> 'raw' dd test, you'll likely see that both memory and IO to the swap
> device are saturated, with little actual data being written to the
> target disk partition.
>
> I attempted to nudge you into finding this information on your own, but
> you apparently did not. I explained all of this not long ago, either
> here or on the linux-raid list. It should be in Google somewhere.
> Never use dd without specifying the proper block size of the target
> device--never. For a Linux filesystem this will be 4096 and for a raw
> hard disk device it will be 512, optimally anyway. Other values may
> give better performance, depending on the system, the disk controller,
> and device driver, etc.
>
> That Seagate isn't an AF model so sector alignment isn't the issue here,
> just improper use of dd.
>
Stan,
You are right: with bs=4096 the write performance improves
significantly. From the dd man page I concluded that not specifying bs
selects ibs=512 and obs=512, and indeed bs=512 gives performance
similar to not specifying bs at all.
When observing the system with vmstat I see the same (strange)
behaviour with no bs specified or with bs=512:
root@dom0-2:~# vmstat 2
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd    free   buff  cache   si   so    bi    bo    in    cs us sy id wa
 1  0      0 6314620 125988  91612    0    0     0     3     5     5  0  0 100  0
 1  1      0 6265404 173744  91444    0    0 23868    13 18020 12290  0  0  86 14
 2  1      0 6214576 223076  91704    0    0 24666     1 18596 12417  0  0  90 10
 0  1      0 6163004 273172  91448    0    0 25046     0 18867 12614  0  0  89 11
 1  0      0 6111308 323252  91592    0    0 25042     0 18861 12608  0  0  92  8
 0  1      0 6059860 373220  91648    0    0 24984     0 18821 12578  0  0  85 14
 0  1      0 6008164 423304  91508    0    0 25040     0 18863 12611  0  0  95  5
 2  1      0 5956344 473468  91604    0    0 25084     0 18953 12630  0  0  95  5
 0  1      0 5904896 523548  91532    0    0 25038     0 18867 12607  0  0  87 13
 0  1      0 5896068 528680  91520    0    0  2558 99597  2431  1373  0  0  92  8
 0  2      0 5896088 528688  91520    0    0     0 73736   535   100  0  0  86 13
 0  1      0 5896128 528688  91520    0    0     0 73729   545    99  0  0  88 12
 1  0      0 6413920  28712  91612    0    0    54  2996   634   372  0  0  95  4
 0  0      0 6413940  28712  91520    0    0     0     0    78    80  0  0 100  0
 0  0      0 6413940  28712  91520    0    0     0     0    94    97  0  0 100  0
Remarkable behaviour, in the sense that there is a lot of block input
(bi) at the beginning, and only at the end do I see block output (bo)
at about 75 MB/s.
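The comparison can be reproduced safely with a scratch file standing in for /dev/sdb4 (the file name and the 64 MiB size below are arbitrary choices, not from the thread); a minimal sketch timing a sub-page against a page-sized block size:

```shell
#!/bin/sh
# Minimal sketch comparing dd write throughput for a sub-page (512 B)
# and a page-sized (4096 B) block size. A scratch file stands in for
# /dev/sdb4 so this is safe to run anywhere; the absolute numbers will
# differ from a raw device, but the relative gap is what matters.
TARGET=/tmp/dd_bs_sketch.img

# 64 MiB in 512-byte blocks; conv=fsync flushes before dd reports its rate
dd if=/dev/zero of="$TARGET" bs=512 count=131072 conv=fsync 2>&1 | tail -n 1

# The same 64 MiB in 4096-byte blocks
dd if=/dev/zero of="$TARGET" bs=4096 count=16384 conv=fsync 2>&1 | tail -n 1

rm -f "$TARGET"
```

dd prints its summary line on stderr, hence the `2>&1 | tail -n 1` to keep only the throughput figure from each run.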
With obs=4096 it looks like this:
root@dom0-2:~# vmstat 2
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd    free   buff  cache   si   so    bi    bo    in    cs us sy id wa
 1  0      0 6413600  28744  91540    0    0     0     3     5     5  0  0 100  0
 1  0      0 6413724  28744  91540    0    0     0     0   103    96  0  0 100  0
 1  0      0 6121616 312880  91208    0    0     0    18   457   133  1  2  97  0
 0  1      0 5895588 528756  91540    0    0     0 83216   587    88  1  3  90  6
 0  1      0 5895456 528756  91540    0    0     0 73728   539    98  0  0  92  8
 0  3      0 5895400 528760  91536    0    0     0 73735   535    93  0  0  86 14
 1  0      0 6413520  28788  91436    0    0    54 19359   783   376  0  0  93  6
 0  0      0 6413544  28788  91540    0    0     0     2   100    84  0  0 100  0
 0  0      0 6413544  28788  91540    0    0     0     0    86    87  0  0 100  0
 0  0      0 6413552  28796  91532    0    0     0    10   110   113  0  0 100  0
As soon as I select a bs which is not a whole multiple of 4096, I see a
lot of block input and poor performance when writing data to the disk.
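The 4096 boundary likely reflects the page cache: buffered writes to a block device go through it in page-sized units, so a write smaller than a page can force the kernel to read the rest of the page before modifying it (read-modify-write), which would account for the block input. A quick way to check the sizes involved (the /dev/sda device name is only an example, and blockdev needs permission to open the device):

```shell
#!/bin/sh
# Query the sizes relevant to choosing bs. A bs that is a whole
# multiple of the page size avoids read-modify-write on buffered
# writes through a block device's page cache.
getconf PAGESIZE                      # typically 4096 on x86

# Logical sector size of a disk (example device; skipped if absent
# or not readable by the current user)
[ -b /dev/sda ] && blockdev --getss /dev/sda || true
```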
I'll try to Google the thread(s) you mentioned. I am still not entirely
satisfied with the explanation, though.
Thanks so far,
Dion