pixman: Changes to 'debian-unstable'



 .gitignore                            |   46 
 ChangeLog                             | 1955 +++++++++++++++++++++++
 configure.ac                          |    5 
 debian/changelog                      |   17 
 debian/control                        |    9 
 debian/patches/ppc64el.diff           |   14 
 debian/patches/series                 |    1 
 debian/rules                          |    5 
 pixman/Makefile.am                    |    2 
 pixman/pixman-arm-asm.h               |   37 
 pixman/pixman-arm-common.h            |   11 
 pixman/pixman-arm-neon-asm-bilinear.S |   12 
 pixman/pixman-arm-neon-asm.S          |   12 
 pixman/pixman-arm-neon-asm.h          |   20 
 pixman/pixman-arm-neon.c              |   24 
 pixman/pixman-arm-simd-asm-scaled.S   |   11 
 pixman/pixman-arm-simd-asm.S          |  525 ++++++
 pixman/pixman-arm-simd-asm.h          |  116 +
 pixman/pixman-arm-simd.c              |   44 
 pixman/pixman-combine-float.c         |  338 ++--
 pixman/pixman-combine32.c             | 1686 +-------------------
 pixman/pixman-fast-path.c             |    2 
 pixman/pixman-general.c               |   27 
 pixman/pixman-gradient-walker.c       |    2 
 pixman/pixman-inlines.h               |    3 
 pixman/pixman-mips-dspr2-asm.S        |    2 
 pixman/pixman-mips-dspr2-asm.h        |    4 
 pixman/pixman-mips-dspr2.c            |   10 
 pixman/pixman-mips-dspr2.h            |    8 
 pixman/pixman-mmx.c                   |  109 +
 pixman/pixman-private.h               |    6 
 pixman/pixman-sse2.c                  |   24 
 pixman/pixman-vmx.c                   | 1315 +++++++++++++++-
 pixman/pixman.c                       |   18 
 test/Makefile.sources                 |   60 
 test/affine-bench.c                   |  436 +++++
 test/blitters-test.c                  |   20 
 test/check-formats.c                  |  176 --
 test/composite.c                      |   11 
 test/lowlevel-blt-bench.c             |  507 +++++-
 test/pixel-test.c                     | 2780 +++++++++++++++++++++++++++++++++-
 test/radial-invalid.c                 |   54 
 test/solid-test.c                     |  353 ++++
 test/thread-test.c                    |   29 
 test/tolerance-test.c                 |  360 ++++
 test/utils.c                          |  653 ++++++-
 test/utils.h                          |   13 
 47 files changed, 9417 insertions(+), 2455 deletions(-)

New commits:
commit 42fab57651e2ebdde5d260ae76809a2500086839
Author: Andreas Boll <andreas.boll.dev@gmail.com>
Date:   Fri Sep 4 13:40:42 2015 +0200

    Bump standards version to 3.9.6.

diff --git a/debian/changelog b/debian/changelog
index 245fb5c..e73a52d 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -6,6 +6,7 @@ pixman (0.33.2-1) UNRELEASED; urgency=medium
   * Update Vcs-* fields.
   * Add upstream url.
   * Drop XC- prefix from Package-Type field.
+  * Bump standards version to 3.9.6.
 
   [ intrigeri ]
   * Simplify hardening build flags handling (closes: #760100).
diff --git a/debian/control b/debian/control
index c78d8b6..6188e41 100644
--- a/debian/control
+++ b/debian/control
@@ -7,7 +7,7 @@ Build-Depends:
  dh-autoreconf,
  pkg-config,
  quilt,
-Standards-Version: 3.9.2
+Standards-Version: 3.9.6
 Vcs-Git: https://anonscm.debian.org/git/pkg-xorg/lib/pixman.git
 Vcs-Browser: https://anonscm.debian.org/cgit/pkg-xorg/lib/pixman.git
 Homepage: http://pixman.org/

commit 56432ef5e5a38ddd77e23d10e1e8f724afcbedd8
Author: Andreas Boll <andreas.boll.dev@gmail.com>
Date:   Fri Sep 4 13:38:49 2015 +0200

    Drop XC- prefix from Package-Type field.

diff --git a/debian/changelog b/debian/changelog
index e6627d6..245fb5c 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -5,6 +5,7 @@ pixman (0.33.2-1) UNRELEASED; urgency=medium
   * Enable vmx on ppc64el (closes: #786345).
   * Update Vcs-* fields.
   * Add upstream url.
+  * Drop XC- prefix from Package-Type field.
 
   [ intrigeri ]
   * Simplify hardening build flags handling (closes: #760100).
diff --git a/debian/control b/debian/control
index 03277a6..c78d8b6 100644
--- a/debian/control
+++ b/debian/control
@@ -28,7 +28,7 @@ Description: pixel-manipulation library for X and cairo
 
 Package: libpixman-1-0-udeb
 Section: debian-installer
-XC-Package-Type: udeb
+Package-Type: udeb
 Architecture: any
 Depends:
  ${shlibs:Depends},

commit c0f98e1cf4fa897eb67a3ef737b24deacda5ae7e
Author: Andreas Boll <andreas.boll.dev@gmail.com>
Date:   Fri Sep 4 11:47:45 2015 +0200

    Add upstream url.

diff --git a/debian/changelog b/debian/changelog
index 05d7550..e6627d6 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -4,6 +4,7 @@ pixman (0.33.2-1) UNRELEASED; urgency=medium
   * New upstream release candidate.
   * Enable vmx on ppc64el (closes: #786345).
   * Update Vcs-* fields.
+  * Add upstream url.
 
   [ intrigeri ]
   * Simplify hardening build flags handling (closes: #760100).
diff --git a/debian/control b/debian/control
index a56b239..03277a6 100644
--- a/debian/control
+++ b/debian/control
@@ -10,6 +10,7 @@ Build-Depends:
 Standards-Version: 3.9.2
 Vcs-Git: https://anonscm.debian.org/git/pkg-xorg/lib/pixman.git
 Vcs-Browser: https://anonscm.debian.org/cgit/pkg-xorg/lib/pixman.git
+Homepage: http://pixman.org/
 
 Package: libpixman-1-0
 Section: libs

commit 03e2d2138b1248c79658e5edeaf66b283a278ff2
Author: Andreas Boll <andreas.boll.dev@gmail.com>
Date:   Fri Sep 4 11:46:39 2015 +0200

    Update Vcs-* fields.

diff --git a/debian/changelog b/debian/changelog
index 4cdb1aa..05d7550 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -3,6 +3,7 @@ pixman (0.33.2-1) UNRELEASED; urgency=medium
   [ Andreas Boll ]
   * New upstream release candidate.
   * Enable vmx on ppc64el (closes: #786345).
+  * Update Vcs-* fields.
 
   [ intrigeri ]
   * Simplify hardening build flags handling (closes: #760100).
diff --git a/debian/control b/debian/control
index 18a1b7f..a56b239 100644
--- a/debian/control
+++ b/debian/control
@@ -8,8 +8,8 @@ Build-Depends:
  pkg-config,
  quilt,
 Standards-Version: 3.9.2
-Vcs-Git: git://git.debian.org/git/pkg-xorg/lib/pixman
-Vcs-Browser: http://git.debian.org/?p=pkg-xorg/lib/pixman.git
+Vcs-Git: https://anonscm.debian.org/git/pkg-xorg/lib/pixman.git
+Vcs-Browser: https://anonscm.debian.org/cgit/pkg-xorg/lib/pixman.git
 
 Package: libpixman-1-0
 Section: libs

commit e6fce5e4e47a7a1597defa0c8f89eba0222b8953
Author: intrigeri <intrigeri@debian.org>
Date:   Sun Aug 31 16:56:42 2014 +0000

    Update changelog.
    
    Signed-off-by: Andreas Boll <andreas.boll.dev@gmail.com>

diff --git a/debian/changelog b/debian/changelog
index 37ddf53..4cdb1aa 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,8 +1,14 @@
 pixman (0.33.2-1) UNRELEASED; urgency=medium
 
+  [ Andreas Boll ]
   * New upstream release candidate.
   * Enable vmx on ppc64el (closes: #786345).
 
+  [ intrigeri ]
+  * Simplify hardening build flags handling (closes: #760100).
+    Thanks to Simon Ruderich <simon@ruderich.org> for the patch.
+  * Enable all hardening build flags. Thanks to Simon Ruderich too.
+
  -- Andreas Boll <andreas.boll.dev@gmail.com>  Fri, 04 Sep 2015 11:29:52 +0200
 
 pixman (0.32.6-3) sid; urgency=medium

commit 7bc925aa5056ea114822bd9d06d94852946ba3d4
Author: intrigeri <intrigeri@debian.org>
Date:   Sun Aug 31 16:54:54 2014 +0000

    Enable all hardening build flags. Thanks to Simon Ruderich <simon@ruderich.org> for the patch.
    
    Quoting Simon again: "It currently has the same effect as hardening=+bindnow,
    but will automatically enable future hardening options and in case the package
    will ever build binaries those are immediately protected with PIE as well."
    
    Signed-off-by: Andreas Boll <andreas.boll.dev@gmail.com>

diff --git a/debian/rules b/debian/rules
index 99d67fc..a0e0b9e 100755
--- a/debian/rules
+++ b/debian/rules
@@ -3,7 +3,7 @@
 PACKAGE = libpixman-1-0
 SHLIBS  = 0.25.2
 
-export DEB_BUILD_MAINT_OPTIONS = hardening=+bindnow
+export DEB_BUILD_MAINT_OPTIONS = hardening=+all
 
 # Disable Gtk+ autodetection:
 override_dh_auto_configure:

commit 2fb4da778cc2ce30df4e1e692dc82d00c6593137
Author: intrigeri <intrigeri@debian.org>
Date:   Sun Aug 31 16:53:25 2014 +0000

    Simplify hardening build flags handling. Thanks to Simon Ruderich <simon@ruderich.org> for the patch.
    
    Quoting Simon Ruderich <simon@ruderich.org>:
    "There's no need to use dpkg-buildflags manually in debian/rules.
    Debhelper with compat=9 automatically enables the hardening flags when
    dh_auto_configure is used. So just by calling dh_auto_configure [...]
    the hardening flags get automatically passed to the build system.
    DEB_BUILD_MAINT_OPTIONS is also respected."
    
    Signed-off-by: Andreas Boll <andreas.boll.dev@gmail.com>

diff --git a/debian/rules b/debian/rules
index a8100d2..99d67fc 100755
--- a/debian/rules
+++ b/debian/rules
@@ -11,8 +11,7 @@ override_dh_auto_configure:
 	# changelog entry:
 	LS_CFLAGS=" " dh_auto_configure -- --disable-gtk \
 	  --disable-silent-rules \
-	  --disable-arm-iwmmxt \
-	  $(shell dpkg-buildflags --export=configure)
+	  --disable-arm-iwmmxt
 
 # Install in debian/tmp to retain control through dh_install:
 override_dh_auto_install:

commit e47fb32ae3180d847a4f0e8f88f71174004b90b3
Author: Andreas Boll <andreas.boll.dev@gmail.com>
Date:   Fri Sep 4 11:34:44 2015 +0200

    Enable vmx on ppc64el (closes: #786345).

diff --git a/debian/changelog b/debian/changelog
index 7db916f..37ddf53 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,6 +1,7 @@
 pixman (0.33.2-1) UNRELEASED; urgency=medium
 
   * New upstream release candidate.
+  * Enable vmx on ppc64el (closes: #786345).
 
  -- Andreas Boll <andreas.boll.dev@gmail.com>  Fri, 04 Sep 2015 11:29:52 +0200
 
diff --git a/debian/patches/ppc64el.diff b/debian/patches/ppc64el.diff
deleted file mode 100644
index 34a4aa0..0000000
--- a/debian/patches/ppc64el.diff
+++ /dev/null
@@ -1,14 +0,0 @@
-diff --git a/configure.ac b/configure.ac
-index dce76b3..172de8b 100644
---- a/configure.ac
-+++ b/configure.ac
-@@ -540,6 +540,9 @@ AC_COMPILE_IFELSE([AC_LANG_SOURCE([[
- #if defined(__GNUC__) && (__GNUC__ < 3 || (__GNUC__ == 3 && __GNUC_MINOR__ < 4))
- #error "Need GCC >= 3.4 for sane altivec support"
- #endif
-+#if defined(__PPC64__) && (__BYTE_ORDER__==__ORDER_LITTLE_ENDIAN__)
-+#error VMX utilization is still not ready on ppc64el
-+#endif
- #include <altivec.h>
- int main () {
-     vector unsigned int v = vec_splat_u32 (1);
diff --git a/debian/patches/series b/debian/patches/series
index eebecc8..708b774 100644
--- a/debian/patches/series
+++ b/debian/patches/series
@@ -1,2 +1 @@
-ppc64el.diff
 test-increase-timeout.diff
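
Note on the guard dropped with ppc64el.diff: it keyed on two compiler-defined macros, __PPC64__ and __BYTE_ORDER__. As a rough illustration only (not part of the package or of pixman), the same condition can be wrapped in a small C helper. The function name is_ppc64el_target is hypothetical, and the extra defined(__BYTE_ORDER__) test is an added assumption so that a compiler lacking that macro falls through to 0 instead of comparing two undefined names.

```c
#include <stdio.h>

/*
 * Hypothetical helper, for illustration only.
 * Returns 1 when the translation unit is compiled for little-endian
 * 64-bit PowerPC (ppc64el), 0 otherwise. The condition mirrors the one
 * in the removed ppc64el.diff, plus a defined(__BYTE_ORDER__) check
 * (an assumption added here, not present in the original patch).
 */
static int is_ppc64el_target(void)
{
#if defined(__PPC64__) && defined(__BYTE_ORDER__) && \
    (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
    return 1;
#else
    return 0;
#endif
}

/* Print the result so it can be inspected from a build log. */
static void report_target(void)
{
    printf("ppc64el target: %s\n", is_ppc64el_target() ? "yes" : "no");
}
```

With the patch gone, configure no longer aborts the AltiVec probe when this condition holds, so whether VMX is enabled on ppc64el depends on the rest of the probe succeeding.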

commit 18e4bdcadf77910f2e22ce66b01b5bd98006c9fa
Author: Andreas Boll <andreas.boll.dev@gmail.com>
Date:   Fri Sep 4 11:30:12 2015 +0200

    Bump changelogs.

diff --git a/ChangeLog b/ChangeLog
index 2f951b8..96b8c28 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,10 +1,1548 @@
-commit 87eea99e443b389c978cf37efc52788bf03a0ee0
+commit ee790044b08e3b668e6aa5d9229f46ed7295ebf0
+Author: Oded Gabbay <oded.gabbay@gmail.com>
+Date:   Sat Aug 1 22:34:53 2015 +0300
+
+    Pre-release version bump to 0.33.2
+    
+    Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
+
+commit 8d9be3619a906855a3e3a1e052317833cb24cabe
+Author: Oded Gabbay <oded.gabbay@gmail.com>
+Date:   Wed Jul 1 14:34:07 2015 +0300
+
+    vmx: implement fast path iterator vmx_fetch_a8
+    
+    no changes were observed when running cairo trimmed benchmarks.
+    
+    Running "lowlevel-blt-bench src_8_8888" on POWER8, 8 cores,
+    3.4GHz, RHEL 7.1 ppc64le gave the following results:
+    
+    reference memcpy speed = 25197.2MB/s (6299.3MP/s for 32bpp fills)
+    
+                    Before          After           Change
+                  --------------------------------------------
+    L1              965.34          3936           +307.73%
+    L2              942.99          3436.29        +264.40%
+    M               902.24          2757.77        +205.66%
+    HT              448.46          784.99         +75.04%
+    VT              430.05          819.78         +90.62%
+    R               412.9           717.04         +73.66%
+    RT              168.93          220.63         +30.60%
+    Kops/s          1025            1303           +27.12%
+    
+    It was benchmarked against commid id e2d211a from pixman/master
+    
+    Siarhei Siamashka reported that on playstation3, it shows the following
+    results:
+    
+    == before ==
+    
+                  src_8_8888 =  L1: 194.37  L2: 198.46  M:155.90 (148.35%)
+                  HT: 59.18  VT: 36.71  R: 38.93  RT: 12.79 ( 106Kops/s)
+    
+    == after ==
+    
+                  src_8_8888 =  L1: 373.96  L2: 391.10  M:245.81 (233.88%)
+                  HT: 80.81  VT: 44.33  R: 48.10  RT: 14.79 ( 122Kops/s)
+    
+    Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
+    Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
+
+commit 47f74ca94637d79ee66c37a81eea0200e453fcc1
+Author: Oded Gabbay <oded.gabbay@gmail.com>
+Date:   Mon Jun 29 15:31:02 2015 +0300
+
+    vmx: implement fast path iterator vmx_fetch_x8r8g8b8
+    
+    It was benchmarked against commid id 2be523b from pixman/master
+    
+    POWER8, 8 cores, 3.4GHz, RHEL 7.1 ppc64le.
+    
+    cairo trimmed benchmarks :
+    
+    Speedups
+    ========
+    t-firefox-asteroids  533.92  -> 489.94 :  1.09x
+    
+    Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
+    Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
+
+commit fcbb97d4458d717b9c15858aedcbee2d33c8ac5a
+Author: Oded Gabbay <oded.gabbay@gmail.com>
+Date:   Sun Jun 28 23:25:24 2015 +0300
+
+    vmx: implement fast path scaled nearest vmx_8888_8888_OVER
+    
+    It was benchmarked against commid id 2be523b from pixman/master
+    
+    POWER8, 8 cores, 3.4GHz, RHEL 7.1 ppc64le.
+    reference memcpy speed = 24764.8MB/s (6191.2MP/s for 32bpp fills)
+    
+                    Before           After           Change
+                  ---------------------------------------------
+    L1              134.36          181.68          +35.22%
+    L2              135.07          180.67          +33.76%
+    M               134.6           180.51          +34.11%
+    HT              121.77          128.79          +5.76%
+    VT              120.49          145.07          +20.40%
+    R               93.83           102.3           +9.03%
+    RT              50.82           46.93           -7.65%
+    Kops/s          448             422             -5.80%
+    
+    cairo trimmed benchmarks :
+    
+    Speedups
+    ========
+    t-firefox-asteroids  533.92 -> 497.92 :  1.07x
+        t-midori-zoomed  692.98 -> 651.24 :  1.06x
+    
+    Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
+    Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
+
+commit ad612c4205f0ae46fc72a50e0c90ccd05487fcba
+Author: Oded Gabbay <oded.gabbay@gmail.com>
+Date:   Sun Jun 28 22:23:44 2015 +0300
+
+    vmx: implement fast path vmx_composite_src_x888_8888
+    
+    It was benchmarked against commid id 2be523b from pixman/master
+    
+    POWER8, 8 cores, 3.4GHz, RHEL 7.1 ppc64le.
+    reference memcpy speed = 24764.8MB/s (6191.2MP/s for 32bpp fills)
+    
+                    Before           After           Change
+                  ---------------------------------------------
+    L1              1115.4          5006.49         +348.85%
+    L2              1112.26         4338.01         +290.02%
+    M               1110.54         2524.15         +127.29%
+    HT              745.41          1140.03         +52.94%
+    VT              749.03          1287.13         +71.84%
+    R               423.91          547.6           +29.18%
+    RT              205.79          194.98          -5.25%
+    Kops/s          1414            1361            -3.75%
+    
+    cairo trimmed benchmarks :
+    
+    Speedups
+    ========
+    t-gnome-system-monitor  1402.62  -> 1212.75 :  1.16x
+       t-firefox-asteroids   533.92  ->  474.50 :  1.13x
+    
+    Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
+    Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
+
+commit fafc1d403b8405727d3918bcb605cb98044af90a
+Author: Oded Gabbay <oded.gabbay@gmail.com>
+Date:   Sun Jun 28 10:14:20 2015 +0300
+
+    vmx: implement fast path vmx_composite_over_n_8888_8888_ca
+    
+    It was benchmarked against commid id 2be523b from pixman/master
+    
+    POWER8, 8 cores, 3.4GHz, RHEL 7.1 ppc64le.
+    
+    reference memcpy speed = 24764.8MB/s (6191.2MP/s for 32bpp fills)
+    
+                    Before           After           Change
+                  ---------------------------------------------
+    L1              61.92            244.91          +295.53%
+    L2              62.74            243.3           +287.79%
+    M               63.03            241.94          +283.85%
+    HT              59.91            144.22          +140.73%
+    VT              59.4             174.39          +193.59%
+    R               53.6             111.37          +107.78%
+    RT              37.99            46.38           +22.08%
+    Kops/s          436              506             +16.06%
+    
+    cairo trimmed benchmarks :
+    
+    Speedups
+    ========
+    t-xfce4-terminal-a1  1540.37 -> 1226.14 :  1.26x
+    t-firefox-talos-gfx  1488.59 -> 1209.19 :  1.23x
+    
+    Slowdowns
+    =========
+            t-evolution  553.88  -> 581.63  :  1.05x
+              t-poppler  364.99  -> 383.79  :  1.05x
+    t-firefox-scrolling  1223.65 -> 1304.34 :  1.07x
+    
+    The slowdowns can be explained in cases where the images are small and
+    un-aligned to 16-byte boundary. In that case, the function will first
+    work on the un-aligned area, even in operations of 1 byte. In case of
+    small images, the overhead of such operations can be more than the
+    savings we get from using the vmx instructions that are done on the
+    aligned part of the image.
+    
+    In the C fast-path implementation, there is no special treatment for the
+    un-aligned part, as it works in 4 byte quantities on the entire image.
+    
+    Because llbb is a synthetic test, I would assume it has much less
+    alignment issues than "real-world" scenario, such as cairo benchmarks,
+    which are basically recorded traces of real application activity.
+    
+    Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
+    Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
+
+commit a3e914407e354df70b9200e263608f1fc2e686cf
+Author: Oded Gabbay <oded.gabbay@gmail.com>
+Date:   Thu Jun 18 15:05:49 2015 +0300
+
+    vmx: implement fast path composite_add_8888_8888
+    
+    Copied impl. from sse2 file and edited to use vmx functions
+    
+    It was benchmarked against commid id 2be523b from pixman/master
+    
+    POWER8, 16 cores, 3.4GHz, ppc64le :
+    
+    reference memcpy speed = 27036.4MB/s (6759.1MP/s for 32bpp fills)
+    
+                    Before           After           Change
+                  ---------------------------------------------
+    L1              248.76          3284.48         +1220.34%
+    L2              264.09          2826.47         +970.27%
+    M               261.24          2405.06         +820.63%
+    HT              217.27          857.3           +294.58%
+    VT              213.78          980.09          +358.46%
+    R               176.61          442.95          +150.81%
+    RT              107.54          150.08          +39.56%
+    Kops/s          917             1125            +22.68%
+    
+    Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
+    Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
+
+commit d5b5343c7df99082597e0c37aec937dcf5b6602d
+Author: Oded Gabbay <oded.gabbay@gmail.com>
+Date:   Thu Jun 18 14:56:47 2015 +0300
+
+    vmx: implement fast path composite_add_8_8
+    
+    Copied impl. from sse2 file and edited to use vmx functions
+    
+    It was benchmarked against commid id 2be523b from pixman/master
+    
+    POWER8, 16 cores, 3.4GHz, ppc64le :
+    
+    reference memcpy speed = 27036.4MB/s (6759.1MP/s for 32bpp fills)
+    
+                    Before           After           Change
+                  ---------------------------------------------
+    L1              687.63          9140.84         +1229.33%
+    L2              715             7495.78         +948.36%
+    M               717.39          8460.14         +1079.29%
+    HT              569.56          1020.12         +79.11%
+    VT              520.3           1215.56         +133.63%
+    R               514.81          874.35          +69.84%
+    RT              341.28          305.42          -10.51%
+    Kops/s          1621            1579            -2.59%
+    
+    Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
+    Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
+
+commit 339eeaf095f949694d7f79a45171ac03a3b06f90
+Author: Oded Gabbay <oded.gabbay@gmail.com>
+Date:   Thu Jun 18 14:12:05 2015 +0300
+
+    vmx: implement fast path composite_over_8888_8888
+    
+    Copied impl. from sse2 file and edited to use vmx functions
+    
+    It was benchmarked against commid id 2be523b from pixman/master
+    
+    POWER8, 16 cores, 3.4GHz, ppc64le :
+    
+    reference memcpy speed = 27036.4MB/s (6759.1MP/s for 32bpp fills)
+    
+                    Before           After           Change
+                  ---------------------------------------------
+    L1              129.47          1054.62         +714.57%
+    L2              138.31          1011.02         +630.98%
+    M               139.99          1008.65         +620.52%
+    HT              122.11          468.45          +283.63%
+    VT              121.06          532.21          +339.62%
+    R               108.48          240.5           +121.70%
+    RT              77.87           116.7           +49.87%
+    Kops/s          758             981             +29.42%
+    
+    Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
+    Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
+
+commit 0cc8a2e9714efcb7cdd7e2a94c9cba49c3e29e00
+Author: Oded Gabbay <oded.gabbay@gmail.com>
+Date:   Sun Jun 28 09:42:19 2015 +0300
+
+    vmx: implement fast path vmx_fill
+    
+    Based on sse2 impl.
+    
+    It was benchmarked against commid id e2d211a from pixman/master
+    
+    Tested cairo trimmed benchmarks on POWER8, 8 cores, 3.4GHz,
+    RHEL 7.1 ppc64le :
+    
+    speedups
+    ========
+         t-swfdec-giant-steps  1383.09 ->  718.63  :  1.92x speedup
+       t-gnome-system-monitor  1403.53 ->  918.77  :  1.53x speedup
+                  t-evolution  552.34  ->  415.24  :  1.33x speedup
+          t-xfce4-terminal-a1  1573.97 ->  1351.46 :  1.16x speedup
+          t-firefox-paintball  847.87  ->  734.50  :  1.15x speedup
+          t-firefox-asteroids  565.99  ->  492.77  :  1.15x speedup
+    t-firefox-canvas-swscroll  1656.87 ->  1447.48 :  1.14x speedup
+              t-midori-zoomed  724.73  ->  642.16  :  1.13x speedup
+       t-firefox-planet-gnome  975.78  ->  911.92  :  1.07x speedup
+              t-chromium-tabs  292.12  ->  274.74  :  1.06x speedup
+         t-firefox-chalkboard  690.78  ->  653.93  :  1.06x speedup
+          t-firefox-talos-gfx  1375.30 ->  1303.74 :  1.05x speedup
+       t-firefox-canvas-alpha  1016.79 ->  967.24  :  1.05x speedup
+    
+    Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
+    Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
+
+commit c12ee95089e7d281a29a24bf56b81f5c16dec6ee
+Author: Oded Gabbay <oded.gabbay@gmail.com>
+Date:   Sun Jun 28 09:42:08 2015 +0300
+
+    vmx: add helper functions
+    
+    This patch adds the following helper functions for reuse of code,
+    hiding BE/LE differences and maintainability.
+    
+    All of the functions were defined as static force_inline.
+    
+    Names were copied from pixman-sse2.c so conversion of fast-paths between
+    sse2 and vmx would be easier from now on. Therefore, I tried to keep the
+    input/output of the functions to be as close as possible to the sse2
+    definitions.
+    
+    The functions are:
+    
+    - load_128_aligned       : load 128-bit from a 16-byte aligned memory
+                               address into a vector
+    
+    - load_128_unaligned     : load 128-bit from memory into a vector,
+                               without guarantee of alignment for the
+                               source pointer
+    
+    - save_128_aligned       : save 128-bit vector into a 16-byte aligned
+                               memory address
+    
+    - create_mask_16_128     : take a 16-bit value and fill with it
+                               a new vector
+    
+    - create_mask_1x32_128   : take a 32-bit pointer and fill a new
+                               vector with the 32-bit value from that pointer
+    
+    - create_mask_32_128     : take a 32-bit value and fill with it
+                               a new vector
+    
+    - unpack_32_1x128        : unpack 32-bit value into a vector
+    
+    - unpacklo_128_16x8      : unpack the eight low 8-bit values of a vector
+    
+    - unpackhi_128_16x8      : unpack the eight high 8-bit values of a vector
+    
+    - unpacklo_128_8x16      : unpack the four low 16-bit values of a vector
+    
+    - unpackhi_128_8x16      : unpack the four high 16-bit values of a vector
+    
+    - unpack_128_2x128       : unpack the eight low 8-bit values of a vector
+                               into one vector and the eight high 8-bit
+                               values into another vector
+    
+    - unpack_128_2x128_16    : unpack the four low 16-bit values of a vector
+                               into one vector and the four high 16-bit
+                               values into another vector
+    
+    - unpack_565_to_8888     : unpack an RGB_565 vector to 8888 vector
+    
+    - pack_1x128_32          : pack a vector and return the LSB 32-bit of it
+    
+    - pack_2x128_128         : pack two vectors into one and return it
+    
+    - negate_2x128           : xor two vectors with mask_00ff (separately)
+    
+    - is_opaque              : returns whether all the pixels contained in
+                               the vector are opaque
+    
+    - is_zero                : returns whether the vector equals 0
+    
+    - is_transparent         : returns whether all the pixels
+                               contained in the vector are transparent
+    
+    - expand_pixel_8_1x128   : expand an 8-bit pixel into lower 8 bytes of a
+                               vector
+    
+    - expand_alpha_1x128     : expand alpha from vector and return the new
+                               vector
+    
+    - expand_alpha_2x128     : expand alpha from one vector and another alpha
+                               from a second vector
+    
+    - expand_alpha_rev_2x128 : expand a reversed alpha from one vector and
+                               another reversed alpha from a second vector
+    
+    - pix_multiply_2x128     : do pix_multiply for two vectors (separately)
+    
+    - over_2x128             : perform over op. on two vectors
+    
+    - in_over_2x128          : perform in-over op. on two vectors
+    
+    v2: removed expand_pixel_32_1x128 as it was not used by any function and
+    its implementation was erroneous
+    
+    Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
+    Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
+
+commit 034149537be94862b43fb09699b8c2149bfe948d
+Author: Oded Gabbay <oded.gabbay@gmail.com>
+Date:   Thu Jul 2 11:04:20 2015 +0300
+
+    vmx: add LOAD_VECTOR macro
+    
+    This patch adds a macro for loading a single vector.
+    It also make the other LOAD_VECTORx macros use this macro as a base so
+    code would be re-used.
+    
+    In addition, I fixed minor coding style issues.
+    
+    Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
+    Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
+
+commit 744134025609a0a5805c2d3b4d34856eb75cb711
+Author: Nemanja Lukic <nemanja.lukic@rt-rk.com>
+Date:   Fri Jun 27 18:05:39 2014 +0200
+
+    MIPS: update author's e-mail address
+    
+    Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
+
+commit e2d211ac491cd9884aae7ccaf18e5b3042469cf2
+Author: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
+Date:   Wed Jun 10 13:54:01 2015 +0300
+
+    lowlevel-blt-bench: add option to skip memcpy measurement
+    
+    The memcpy speed measurement takes several seconds. When you are running
+    single tests in a harness that iterates dozens or hundreds of times, the
+    repeated measurements are redundant and take a lot of time. It is also
+    an open question whether the measured speed changes over long test runs
+    due to unidentified platform reasons (Raspberry Pi).
+    
+    Add a command line option to set the reference memcpy speed, skipping
+    the measuring.
+    
+    The speed is mainly used to compute how many iterations do run inside
+    the bench_*() functions, so for repeated testing on the same hardware,
+    it makes sense to lock that number to a constant.
+    
+    Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
+    Reviewed-by: Ben Avison <bavison@riscosopen.org>
+
+commit 31cb0d4267f4f358b62f75fd42c4b1ae625be7ee
+Author: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
+Date:   Wed Jun 10 13:20:47 2015 +0300
+
+    lowlevel-blt-bench: add CSV output mode
+    
+    Add a command line option for choosing CSV output mode.
+    
+    In CSV mode, only the results in Mpixels/s are printed in an easily
+    machine-parseable format. All user-friendly printing is suppressed.
+    
+    This is intended for cases where you benchmark one particular operation
+    at a time. Running the "all" set of benchmarks will print just fine, but
+    you may have trouble matching rows to operations as you have to look at
+    the tests_tbl[] to see what row is which.
+    
+    Reviewed-by: Ben Avison <bavison@riscosopen.org>
+    
+    v2: don't add a space after comma in CSV.
+    
+    Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
+
+commit 9a7e0bc6d08c0324f09d6440270cd07201929f3f
+Author: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
+Date:   Wed Jun 10 12:41:57 2015 +0300
+
+    lowlevel-blt-bench: refactor to Mpx_per_sec()
+    
+    Refactor the Mpixels/s computations into a function. Easier to read and
+    better documents what is being computed.
+    
+    Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
+    Reviewed-by: Ben Avison <bavison@riscosopen.org>
+
+commit 6e9c48c579e3325506234fa2ee7635f08f2c5a33
+Author: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
+Date:   Wed Jun 10 12:53:09 2015 +0300
+
+    lowlevel-blt-bench: all bench funcs to return pix_cnt
+    
+    The bench_* functions, that did not already do it, are modified to
+    return the number of pixels processed during the benchmark. This moves
+    the computation to the site that actually determines the number, and
+    simplifies bench_composite() a bit.
+    
+    Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
+    Reviewed-by: Ben Avison <bavison@riscosopen.org>
+
+commit 9e8f2bcaf5fabd3729ee0ecc90009fd6cea9e8e9
+Author: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
+Date:   Wed Jun 10 12:02:17 2015 +0300
+
+    lowlevel-blt-bench: move speed and scaling printing
+    
+    Move the printing of the memory speed and scaling mode into a new
+    function. This will help with implementing a machine-readable output
+    option.
+    
+    Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
+    Reviewed-by: Ben Avison <bavison@riscosopen.org>
+
+commit a33c2e6853fe0a76da42a43ed7ed9095e2dbe6a2
+Author: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
+Date:   Wed Jun 10 11:56:39 2015 +0300
+
+    lowlevel-blt-bench: print single pattern details
+    
+    When given just a single test pattern instead of "all", print the test
+    details. This can be used to verify the pattern parser agrees with the
+    user, just like scaling settings are printed.
+    
+    Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
+    Reviewed-by: Ben Avison <bavison@riscosopen.org>
+
+commit 3ac7ae201758fe99627fdb2adf783be4063a9b1f
+Author: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
+Date:   Wed Jun 10 11:34:45 2015 +0300
+
+    lowlevel-blt-bench: make test_entry::testname const
+    
+    We assign string literals to it, so it better be const.
+    
+    Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
+    Reviewed-by: Ben Avison <bavison@riscosopen.org>
+
+commit 56d8b365f5944bf78a427ac65c5a0d0311e0da5e
+Author: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
+Date:   Wed Jun 10 11:21:14 2015 +0300
+
+    lowlevel-blt-bench: move explanation printing
+    
+    Move explanation printing to a new function. This will help with
+    implementing a machine-readable output option.
+    
+    Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
+    Reviewed-by: Ben Avison <bavison@riscosopen.org>
+
+commit bddff993ed734f4b9030c1960bcb3ebe1caca807
+Author: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
+Date:   Wed Jun 10 11:14:38 2015 +0300
+
+    lowlevel-blt-bench: move usage to a function
+    
+    Move printing of usage into a new function and use argv[0] as the
+    program name. This will help printing usage from multiple places.
+    
+    Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
+    Reviewed-by: Ben Avison <bavison@riscosopen.org>
+
+commit 2be523b20402b7c9f548ac33b8c0f0ed00156c64
+Author: Oded Gabbay <oded.gabbay@gmail.com>
+Date:   Thu Jun 25 15:59:57 2015 +0300
+
+    vmx: fix pix_multiply for ppc64le
+    
+    vec_mergeh/l operate differently on BE and LE because of the order of
+    the vector elements (l->r in BE and r->l in LE).
+    To fix that, we simply need to swap the input parameters when
+    building for LE.
+    
+    v2:
+    
+    - replace _LITTLE_ENDIAN with WORDS_BIGENDIAN for consistency
+    - fixed whitespace and indentation issues
+    
+    Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
+    Reviewed-by: Adam Jackson <ajax@redhat.com>
+    Acked-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
+
+commit 8d379ad88e208bed9697065f6911c9ef83d85276
+Author: Oded Gabbay <oded.gabbay@gmail.com>
+Date:   Thu Jun 25 15:59:56 2015 +0300
+
+    vmx: fix unused var warnings
+    
+    v2: don't put ';' at the end of the macro definition. Instead, move it to
+        each line where the macro is used.
+    
+    Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
+    Reviewed-by: Adam Jackson <ajax@redhat.com>
+    Acked-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
+
+commit ff66a4a3ce95f2adcbf30b354eac60944596d6a2
+Author: Oded Gabbay <oded.gabbay@gmail.com>
+Date:   Thu Jun 25 15:59:55 2015 +0300
+
+    vmx: encapsulate the temporary variables inside the macros
+    
+    v2: fixed whitespace and indentation issues
+    
+    Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
+    Reviewed-by: Adam Jackson <ajax@redhat.com>
+    Acked-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
+
+commit f6a26d09257dde9cd41144120543c8b754de515f
+Author: Fernando Seiti Furusato <ferseiti@linux.vnet.ibm.com>
+Date:   Thu Jun 25 15:59:54 2015 +0300
+
+    vmx: adjust macros when loading vectors on ppc64le
+    
+    Replaced usage of vec_lvsl with a direct unaligned assignment
+    operation (=), because according to the Power ABI Specification
+    the usage of lvsl is deprecated on ppc64le.
+    
+    Changed COMPUTE_SHIFT_{MASK,MASKS,MASKC} macro usage to no-op for powerpc
+    little endian since unaligned access is supported on ppc64le.
+    
+    v2:
+    
+    - replace _LITTLE_ENDIAN with WORDS_BIGENDIAN for consistency
+    - fixed whitespace and indentation issues
+    
+    Signed-off-by: Fernando Seiti Furusato <ferseiti@linux.vnet.ibm.com>
+    Reviewed-by: Adam Jackson <ajax@redhat.com>
+    Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
+    Acked-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
+
+commit b3a61703f41c6b34ba2ec9736030e1df04f53ab4
+Author: Oded Gabbay <oded.gabbay@gmail.com>
+Date:   Thu Jun 25 15:59:53 2015 +0300
+
+    vmx: fix splat_alpha for ppc64le
+    
+    The permutation vector isn't correct for LE, so correct its values
+    in case we are in LE mode.
+    
+    v2:
+    
+    - replace _LITTLE_ENDIAN with WORDS_BIGENDIAN for consistency
+    - change #ifndef to #ifdef for readability
+    
+    Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
+    Reviewed-by: Adam Jackson <ajax@redhat.com>
+    Acked-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
+
+commit eebc1b78200aff075dbcae9c8d00edad1f830d91
+Author: Ben Avison <bavison@riscosopen.org>
+Date:   Tue May 26 23:58:29 2015 +0100
+
+    mmx/sse2: Use SIMPLE_NEAREST_SOLID_MASK_FAST_PATH for NORMAL repeat
+    
+    These two architectures were the only place where
+    SIMPLE_NEAREST_SOLID_MASK_FAST_PATH was used, and in both cases the
+    equivalent SIMPLE_NEAREST_SOLID_MASK_FAST_PATH_NORMAL macro was used
+    immediately afterwards, so including the NORMAL case in the main macro
+    simplifies the fast path table.
+    
+    [Pekka: removed extra comma from the end of
+     SIMPLE_NEAREST_SOLID_MASK_FAST_PATH]
+    
+    Reviewed-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
+    Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
+
+commit 7f6692807902b840b81f860fb2196d2fb242d977
+Author: Ben Avison <bavison@riscosopen.org>
+Date:   Tue May 26 23:58:28 2015 +0100
+
+    mmx/sse2: Use SIMPLE_NEAREST_FAST_PATH macro
+    
+    There is some reordering, but the only ordering constraint that matters
+    for choosing the same routine is that a COVER fast path for a given
+    combination of operator and source/destination pixel formats must
+    precede all the variants of repeated fast paths for the same
+    combination. This patch (and the other mmx/sse2 one) still follows that
+    rule.
+    
+    I believe that in every other case, the sets of operations matched by any
+    pair of fast paths that are reordered in these patches are mutually
+    exclusive. While there will be a very subtle timing difference due to
+    the distance through the table we have to search to find a match
+    (sometimes faster, sometimes slower), there is no evidence that the tables
+    have been carefully ordered by frequency of occurrence - just for ease
+    of copy-and-pasting.
+    
+    Reviewed-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
+    Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
+

