pixman: Changes to 'debian-unstable'
ChangeLog | 667 +++++++++++++++++++++++++++++++++++++++++
configure.ac | 22 -
debian/changelog | 7
pixman/pixman-arm-simd-asm.S | 41 ++
pixman/pixman-arm-simd.c | 6
pixman/pixman-general.c | 18 -
pixman/pixman-implementation.c | 16
pixman/pixman-mmx.c | 64 ---
pixman/pixman-vmx.c | 492 ++++++++++++------------------
pixman/pixman.c | 17 -
test/Makefile.sources | 2
test/affine-bench.c | 24 +
test/cover-test.c | 449 +++++++++++++++++++++++++++
test/fence-image-self-test.c | 239 ++++++++++++++
test/lowlevel-blt-bench.c | 6
test/scaling-test.c | 66 ++--
test/utils.c | 133 +++++++-
test/utils.h | 21 +
18 files changed, 1873 insertions(+), 417 deletions(-)
New commits:
commit 017a59ec26f3d70b577ddf868551f16198806f81
Author: Andreas Boll <andreas.boll.dev@gmail.com>
Date: Wed Nov 4 13:26:38 2015 +0100
Upload to unstable
diff --git a/debian/changelog b/debian/changelog
index be437ce..98410b4 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,8 +1,9 @@
-pixman (0.33.4-1) UNRELEASED; urgency=medium
+pixman (0.33.4-1) unstable; urgency=medium
+ * Team upload.
* New upstream release candidate.
- -- Andreas Boll <andreas.boll.dev@gmail.com> Wed, 04 Nov 2015 10:30:37 +0100
+ -- Andreas Boll <andreas.boll.dev@gmail.com> Wed, 04 Nov 2015 13:26:18 +0100
pixman (0.33.2-2) sid; urgency=medium
commit c19373008340e1ad159ded12e45275b5b06bb513
Author: Andreas Boll <andreas.boll.dev@gmail.com>
Date: Wed Nov 4 10:30:58 2015 +0100
Bump changelogs.
diff --git a/ChangeLog b/ChangeLog
index 96b8c28..9a56a72 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,670 @@
+commit fa71d08a81c9bf3f2366ee45474ff868d9e10b8e
+Author: Oded Gabbay <oded.gabbay@gmail.com>
+Date: Fri Oct 23 17:58:49 2015 +0300
+
+ Pre-release version bump to 0.33.4
+
+ Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
+
+commit 9728241bd098bc4260e6cd83997dfecc64adc356
+Author: Andrea Canciani <ranma42@gmail.com>
+Date: Tue Oct 13 13:35:59 2015 +0200
+
+ test: Fix fence-image-self-test on Mac
+
+ On MacOS X, according to the manpage of mprotect(), "When a program
+ violates the protections of a page, it gets a SIGBUS or SIGSEGV
+ signal.", but fence-image-self-test was only accepting a SIGSEGV as
+ notification of invalid access.
+
+ Fixes fence-image-self-test
+
+ Reviewed-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
+
+commit 7de61d8d14e84623b6fa46506eb74f938287f536
+Author: Matt Turner <mattst88@gmail.com>
+Date: Sun Oct 11 14:44:46 2015 -0700
+
+ mmx: Use MMX2 intrinsics from xmmintrin.h directly.
+
+ We had lots of hacks to handle the inability to include xmmintrin.h
+ without compiling with -msse (lest SSE instructions be used in
+ pixman-mmx.c). Some recent version of gcc relaxed this restriction.
+
+ Change configure.ac to test that xmmintrin.h can be included and that we
+ can use some intrinsics from it, and remove the work-around code from
+ pixman-mmx.c.
+
+ Evidently allows gcc 4.9.3 to optimize better as well:
+
+ text data bss dec hex filename
+ 657078 30848 680 688606 a81de libpixman-1.so.0.33.3 before
+ 656710 30848 680 688238 a806e libpixman-1.so.0.33.3 after
+
+ Reviewed-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
+ Tested-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
+ Signed-off-by: Matt Turner <mattst88@gmail.com>
+
+commit 90e62c086766afffd289a321c7de8ea4b5cac87d
+Author: Siarhei Siamashka <siarhei.siamashka@gmail.com>
+Date: Fri Sep 4 15:39:00 2015 +0300
+
+ vmx: implement fast path vmx_composite_over_n_8888
+
+ Running "lowlevel-blt-bench over_n_8888" on Playstation3 3.2GHz,
+ Gentoo ppc (32-bit userland) gave the following results:
+
+ before: over_n_8888 = L1: 147.47 L2: 205.86 M:121.07
+ after: over_n_8888 = L1: 287.27 L2: 261.09 M:133.48
+
+ Cairo non-trimmed benchmarks on POWER8, 3.4GHz 8 Cores:
+
+ ocitysmap 659.69 -> 611.71 : 1.08x speedup
+ xfce4-terminal-a1 2725.22 -> 2547.47 : 1.07x speedup
+
+ Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
+ Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
+
+commit 2876d8d3dd6a71cb9eb3ac93e5b9c18b71a452da
+Author: Ben Avison <bavison@riscosopen.org>
+Date: Fri Sep 4 03:09:20 2015 +0100
+
+ affine-bench: remove 8e margin from COVER area
+
+ Patch "Remove the 8e extra safety margin in COVER_CLIP analysis" reduced
+ the required image area for setting the COVER flags in
+ pixman.c:analyze_extent(). Do the same reduction in affine-bench.
+
+ Leaving the old calculations in place would be very confusing for anyone
+ reading the code.
+
+ Also add a comment that explains how affine-bench wants to hit the COVER
+ paths. This explains why the intricate extent calculations are copied
+ from pixman.c.
+
+ [Pekka: split patch, change comments, write commit message]
+ Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
+ Reviewed-by: Ben Avison <bavison@riscosopen.org>
+
+commit 0e2e9751282b19280c92be4a80c5ae476bae0ce4
+Author: Ben Avison <bavison@riscosopen.org>
+Date: Fri Sep 4 03:09:20 2015 +0100
+
+ Remove the 8e extra safety margin in COVER_CLIP analysis
+
+ As discussed in
+ http://lists.freedesktop.org/archives/pixman/2015-August/003905.html
+
+ the 8 * pixman_fixed_e (8e) adjustment which was applied to the transformed
+ coordinates is a legacy of rounding errors which used to occur in old
+ versions of Pixman, but which no longer apply. For any affine transform,
+ you are now guaranteed to get the same result by transforming the upper
+ coordinate as though you transform the lower coordinate and add (size-1)
+ steps of the increment in source coordinate space. No projective
+ transform routines use the COVER_CLIP flags, so they cannot be affected.
+
+ Proof by Siarhei Siamashka:
+
+ Let's take a look at the following affine transformation matrix (with 16.16
+ fixed point values) and two vectors:
+
+     | a  b  c       |
+ M = | d  e  f       |
+     | 0  0  0x10000 |
+
+     | x_dst   |
+ P = | y_dst   |
+     | 0x10000 |
+
+         | 0x10000 |
+ ONE_X = | 0       |
+         | 0       |
+
+ The current matrix multiplication code does the following calculations:
+
+         | (a * x_dst + b * y_dst + 0x8000) / 0x10000 + c |
+ M * P = | (d * x_dst + e * y_dst + 0x8000) / 0x10000 + f |
+         | 0x10000                                        |
+
+ These calculations are not perfectly exact and we may get rounding
+ because the integer coordinates are adjusted by 0.5 (or 0x8000 in the
+ 16.16 fixed point format) before doing matrix multiplication. For
+ example, if the 'a' coefficient is an odd number and 'b' is zero,
+ then we are losing some of the least significant bits when dividing by
+ 0x10000.
+
+ So we need to strictly prove that the following expression is always
+ true even though we have to deal with rounding:
+
+                                       | a |
+ M * (P + ONE_X) - M * P = M * ONE_X = | d |
+                                       | 0 |
+
+ or
+
+ ((a * (x_dst + 0x10000) + b * y_dst + 0x8000) / 0x10000 + c)
+ -
+ ((a * x_dst + b * y_dst + 0x8000) / 0x10000 + c)
+ =
+ a
+
+ It's easy to see that this is equivalent to
+
+ a + ((a * x_dst + b * y_dst + 0x8000) / 0x10000 + c)
+ - ((a * x_dst + b * y_dst + 0x8000) / 0x10000 + c)
+ =
+ a
+
+ Which means that stepping exactly by one pixel horizontally in the
+ destination image space (advancing 'x_dst' by 0x10000) is the same as
+ changing the transformed 'x_src' coordinate in the source image space
+ exactly by 'a'. The same applies to the vertical direction too.
+ Repeating these steps, we can reach any pixel in the source image
+ space and get exactly the same fixed point coordinates as doing
+ matrix multiplications per each pixel.
+
+ By the way, the older matrix multiplication implementation, which was
+ relying on less accurate calculations with three intermediate roundings
+ "((a + 0x8000) >> 16) + ((b + 0x8000) >> 16) + ((c + 0x8000) >> 16)",
+ also has the same properties. However reverting
+ http://cgit.freedesktop.org/pixman/commit/?id=ed39992564beefe6b12f81e842caba11aff98a9c
+ and applying this "Remove the 8e extra safety margin in COVER_CLIP
+ analysis" patch makes the cover test fail. The real reason why it fails
+ is that the old pixman code was using "pixman_transform_point_3d()"
+ function
+ http://cgit.freedesktop.org/pixman/tree/pixman/pixman-matrix.c?id=pixman-0.28.2#n49
+ for getting the transformed coordinate of the top left corner pixel
+ in the image scaling code, but at the same time using a different
+ "pixman_transform_point()" function
+ http://cgit.freedesktop.org/pixman/tree/pixman/pixman-matrix.c?id=pixman-0.28.2#n82
+ in the extents calculation code for setting the cover flag. And these
+ functions did the intermediate rounding differently. That's why the 8e
+ safety margin was needed.
+
+ ** proof ends
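The stepping property in the proof can be spot-checked numerically. The C sketch below is not pixman's matrix code. It assumes that `/ 0x10000` means floor division (what an arithmetic `>> 16` on a 64-bit intermediate gives), and the helper names are hypothetical. It compares transforming every destination pixel directly against transforming pixel 0 once and then adding `a` per step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Floor division by 0x10000. Plain '/' in C truncates toward zero, and
 * '>>' on a negative signed value is implementation-defined, so the
 * rounding direction is spelled out explicitly. */
static int64_t
floor_div_65536 (int64_t n)
{
    int64_t q = n / 65536;

    if (n % 65536 != 0 && n < 0)
        q -= 1;
    return q;
}

/* First row of M * P as written in the commit message:
 * (a * x_dst + b * y_dst + 0x8000) / 0x10000 + c */
static int64_t
transform_x (int64_t a, int64_t b, int64_t c, int64_t x_dst, int64_t y_dst)
{
    return floor_div_65536 (a * x_dst + b * y_dst + 0x8000) + c;
}

/* True if transforming each destination pixel of a row directly gives the
 * same source x as transforming pixel 0 once and then adding 'a' per step. */
static bool
stepping_matches (int64_t a, int64_t b, int64_t c, int64_t y_dst, int width)
{
    const int64_t x0 = 0x8000;              /* centre of destination pixel 0 */
    int64_t stepped = transform_x (a, b, c, x0, y_dst);

    assert (width >= 0);
    for (int i = 0; i < width; i++)
    {
        int64_t x_dst = x0 + (int64_t) i * 0x10000;

        if (transform_x (a, b, c, x_dst, y_dst) != stepped)
            return false;
        stepped += a;
    }
    return true;
}
```

The explicit floor matters. With truncating division, a numerator that crosses zero while stepping (for example -1 becoming 65535 with a = 1) would break the identity.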
+
+ However, for COVER_CLIP_NEAREST, the actual margins added were not 8e.
+ Because the half-way cases round down, that is, coordinate 0 hits pixel
+ index -1 while coordinate e hits pixel index 0, the extra safety margins
+ were actually 7e to the left and up, and 9e to the right and down. This
+ patch removes the 7e and 9e margins and restores the -e adjustment
+ required for NEAREST sampling in Pixman. For reference, see
+ pixman/rounding.txt.
+
+ For COVER_CLIP_BILINEAR, the margins were exactly 8e as there are no
+ additional offsets to be restored, so simply removing the 8e additions
+ is enough.
+
+ Proof:
+
+ All implementations must give the same numerical results as
+ bits_image_fetch_pixel_nearest() / bits_image_fetch_pixel_bilinear().
+
+ The former does
+ int x0 = pixman_fixed_to_int (x - pixman_fixed_e);
+ which maps directly to the new test for the nearest flag, when you consider
+ that x0 must fall in the interval [0,width).
+
+ The latter does
+ x1 = x - pixman_fixed_1 / 2;
+ x1 = pixman_fixed_to_int (x1);
+ x2 = x1 + 1;
+ When you write a COVER path, you take advantage of the assumption that
+ both x1 and x2 fall in the interval [0, width).
+
+ As samplers are allowed to fetch the pixel at x2 unconditionally, we
+ require
+ x1 >= 0
+ x2 < width
+ so
+ x - pixman_fixed_1 / 2 >= 0
+ x - pixman_fixed_1 / 2 + pixman_fixed_1 < width * pixman_fixed_1
+ so
+ pixman_fixed_to_int (x - pixman_fixed_1 / 2) >= 0
+ pixman_fixed_to_int (x + pixman_fixed_1 / 2) < width
+ which matches the source code lines for the bilinear case, once you delete
+ the lines that add the 8e margin.
+
+ Signed-off-by: Ben Avison <bavison@riscosopen.org>
+ [Pekka: adjusted commit message, left affine-bench changes for another patch]
+ [Pekka: add commit message parts from Siarhei]
+ Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
+ Reviewed-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
+ Reviewed-by: Ben Avison <bavison@riscosopen.org>
+
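The two COVER conditions derived above can be written as small one-dimensional predicates. This is a simplified sketch, not pixman's analyze_extent(). `fixed_16_16`, `FIXED_E` and `FIXED_1` stand in for pixman_fixed_t, pixman_fixed_e and pixman_fixed_1, and the predicate names are hypothetical. `x_min` and `x_max` are the smallest and largest transformed sample positions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef int32_t fixed_16_16;                 /* stand-in for pixman_fixed_t */
#define FIXED_E ((fixed_16_16) 1)            /* pixman_fixed_e */
#define FIXED_1 ((fixed_16_16) 0x10000)      /* pixman_fixed_1 */

/* Floor conversion to an integer pixel index, i.e. what
 * pixman_fixed_to_int() yields with an arithmetic right shift. */
static int
fixed_to_int (fixed_16_16 f)
{
    int64_t q = (int64_t) f / FIXED_1;

    if ((int64_t) f % FIXED_1 != 0 && f < 0)
        q -= 1;
    return (int) q;
}

/* NEAREST: a sample at x reads pixel fixed_to_int (x - e), so the
 * smallest and largest sample positions must map into [0, width). */
static bool
nearest_covers (fixed_16_16 x_min, fixed_16_16 x_max, int width)
{
    assert (x_min <= x_max);
    return fixed_to_int (x_min - FIXED_E) >= 0 &&
           fixed_to_int (x_max - FIXED_E) < width;
}

/* BILINEAR: a sample at x reads x1 = fixed_to_int (x - 1/2) and
 * x2 = x1 + 1; x2 < width is the same as fixed_to_int (x + 1/2) < width. */
static bool
bilinear_covers (fixed_16_16 x_min, fixed_16_16 x_max, int width)
{
    assert (x_min <= x_max);
    return fixed_to_int (x_min - FIXED_1 / 2) >= 0 &&
           fixed_to_int (x_max + FIXED_1 / 2) < width;
}
```

For example, with width 4, nearest_covers (0, ...) is false because coordinate 0 maps to index -1, while nearest_covers (FIXED_E, ...) can be true. This is the half-way rounding described in the message.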
+commit 23525b4ea5bc2dd67f8f65b90d023b6580ecbc36
+Author: Ben Avison <bavison@riscosopen.org>
+Date: Tue Sep 22 12:43:25 2015 +0100
+
+ pixman-general: Tighten up calculation of temporary buffer sizes
+
+ Each of the aligns can only add a maximum of 15 bytes to the space
+ requirement. This permits some edge cases to use the stack buffer where
+ previously it would have deduced that a heap buffer was required.
+
+ Reviewed-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
+
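The 15-byte bound follows from how rounding up to a 16-byte boundary works. The commit message does not show pixman-general.c itself, so this C sketch only illustrates the arithmetic. `align16`, `carve` and `worst_case_bytes` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round an address up to the next multiple of 16 (hypothetical helper). */
static uintptr_t
align16 (uintptr_t addr)
{
    return (addr + 15) & ~(uintptr_t) 15;
}

/* Carve 'n' 16-byte-aligned sub-buffers of 'size' bytes, one after the
 * other, starting at 'start'; returns the address just past the last one. */
static uintptr_t
carve (uintptr_t start, size_t size, size_t n)
{
    uintptr_t p = start;

    for (size_t i = 0; i < n; i++)
    {
        p = align16 (p);
        assert (p % 16 == 0);
        p += size;
    }
    return p;
}

/* Each align step skips at most 15 bytes, whatever the starting alignment. */
static size_t
worst_case_bytes (size_t size, size_t n)
{
    return n * (size + 15);
}
```

Starting at an address one past a boundary (the worst case), three 100-byte buffers need 339 bytes. That is within the 345-byte bound above.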
+commit 8b49d4b6b460d0c9299bca4ccddd7cd00d8f8441
+Author: Siarhei Siamashka <siarhei.siamashka@gmail.com>
+Date: Tue Sep 22 04:25:40 2015 +0300
+
+ pixman-general: Fix stack related pointer arithmetic overflow
+
+ As https://bugs.freedesktop.org/show_bug.cgi?id=92027#c6 explains,
+ the stack is allocated at the very top of the process address space
+ in some configurations (32-bit x86 systems with ASLR disabled).
+ And the careless computations done with the 'dest_buffer' pointer
+ may overflow, failing the buffer upper limit check.
+
+ The problem can be reproduced using the 'stress-test' program,
+ which segfaults when executed via setarch:
+
+ export CFLAGS="-O2 -m32" && ./autogen.sh
+ ./configure --disable-libpng --disable-gtk && make
+ setarch i686 -R test/stress-test
+
+ This patch introduces the required corrections. The extra check
+ for negative 'width' may be redundant (the invalid 'width' value
+ is not supposed to reach here), but it's better to play safe
+ when dealing with the buffers allocated on stack.
+
+ Reported-by: Ludovic Courtès <ludo@gnu.org>
+ Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
+ Reviewed-by: soren.sandmann@gmail.com
+ Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
+
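The overflow pattern can be shown without the real dest_buffer code. This sketch models 32-bit addresses with uint32_t so that wrap-around is well defined on any host. The function names are hypothetical, and it is not the patch itself. A check that forms `p + needed` can wrap past 0xffffffff near the top of the address space. Comparing sizes avoids forming that address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t addr32;   /* models a 32-bit address */

/* Careless check: the computed end address can wrap around, so a request
 * that is far too large can appear to fit below 'limit'. */
static bool
fits_careless (addr32 p, uint32_t needed, addr32 limit)
{
    return (addr32) (p + needed) <= limit;
}

/* Safe check: compare the request with the space that is left instead of
 * forming an out-of-range end address. */
static bool
fits_safe (addr32 p, uint32_t needed, addr32 limit)
{
    assert (p <= limit);
    return needed <= limit - p;
}
```

With a 2 KiB buffer at 0xfffff000, the careless check accepts an 8 KiB request, because 0xfffff000 + 0x2000 wraps to 0x1000. The safe check rejects it.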
+commit 4297e9058d252cac653723fe0b1bee559fbac3a4
+Author: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
+Date: Thu Sep 17 15:43:27 2015 +0200
+
+ test: add a check for FE_DIVBYZERO
+
+ Some architectures, such as Microblaze and Nios2, currently do not
+ implement FE_DIVBYZERO, even though they have <fenv.h> and
+ feenableexcept(). This commit adds a configure.ac check to verify
+ whether FE_DIVBYZERO is defined or not, and if not, disables the
+ problematic code in test/utils.c.
+
+ Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
+ Signed-off-by: Marek Vasut <marex@denx.de>
+ Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
+ Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
+
+commit 8189fad9610981d5b4dcd8f8980ff169110fb33c
+Author: Oded Gabbay <oded.gabbay@gmail.com>
+Date: Sun Sep 6 11:45:20 2015 +0300
+
+ vmx: Remove unused expensive functions
+
+ Now that we replaced the expensive functions with better performing
+ alternatives, we should remove them so they will not be used again.
+
+ Running Cairo benchmark on trimmed traces gave the following results:
+
+ POWER8, 8 cores, 3.4GHz, RHEL 7.2 ppc64le.
+
+ Speedups
+ ========
+ t-firefox-scrolling 1232.30 -> 1096.55 : 1.12x
+ t-gnome-terminal-vim 613.86 -> 553.10 : 1.11x
+ t-evolution 405.54 -> 371.02 : 1.09x
+ t-firefox-talos-gfx 919.31 -> 862.27 : 1.07x
+ t-gvim 653.02 -> 616.85 : 1.06x
+ t-firefox-canvas-alpha 941.29 -> 890.42 : 1.06x
+
+ Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
+ Acked-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
+ Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
+
+commit 6b1b8b2b90da11bf6101a151786b2a8c9f087338
+Author: Oded Gabbay <oded.gabbay@gmail.com>
+Date: Sun Jun 28 13:17:41 2015 +0300
+
+ vmx: implement fast path vmx_composite_over_n_8_8888
+
+ POWER8, 8 cores, 3.4GHz, RHEL 7.2 ppc64le.
+
+ reference memcpy speed = 25008.9MB/s (6252.2MP/s for 32bpp fills)
+
+ Before After Change
+ ---------------------------------------------
+ L1 91.32 182.84 +100.22%
+ L2 94.94 182.83 +92.57%
+ M 95.55 181.51 +89.96%
+ HT 88.96 162.09 +82.21%
+ VT 87.4 168.35 +92.62%
+ R 83.37 146.23 +75.40%
+ RT 66.4 91.5 +37.80%
+ Kops/s 683 859 +25.77%
+
+ Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
+ Acked-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
+ Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
+
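For reference, here is what over_n_8_8888 computes per pixel, as a plain scalar C sketch. It is not the VMX implementation, which works on several pixels per vector register. The source is a solid premultiplied a8r8g8b8 colour, the mask is a8, and dest = (src IN mask) OVER dest. mul_un8 uses the same rounding formula as the MUL_UN8 macro in pixman-combine32.h. The other names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Rounded x * a / 255, the same formula as MUL_UN8 in pixman-combine32.h. */
static uint8_t
mul_un8 (uint8_t x, uint8_t a)
{
    uint32_t t = (uint32_t) x * a + 0x80;

    return (uint8_t) (((t >> 8) + t) >> 8);
}

/* Multiply all four channels of a packed a8r8g8b8 pixel by 'a'. */
static uint32_t
mul_pixel (uint32_t p, uint8_t a)
{
    uint32_t r = 0;

    for (int shift = 0; shift < 32; shift += 8)
        r |= (uint32_t) mul_un8 ((uint8_t) (p >> shift), a) << shift;
    return r;
}

/* Saturating per-channel add of two packed pixels. */
static uint32_t
add_pixel (uint32_t x, uint32_t y)
{
    uint32_t r = 0;

    for (int shift = 0; shift < 32; shift += 8)
    {
        uint32_t c = ((x >> shift) & 0xff) + ((y >> shift) & 0xff);

        r |= (c > 0xff ? 0xff : c) << shift;
    }
    return r;
}

/* dest = (src IN mask) OVER dest for one scanline. */
static void
over_n_8_8888_scanline (uint32_t src, const uint8_t *mask,
                        uint32_t *dest, int width)
{
    assert (width >= 0);
    for (int i = 0; i < width; i++)
    {
        uint32_t s = mul_pixel (src, mask[i]);            /* src IN mask */
        uint8_t inv_alpha = (uint8_t) ~(s >> 24);         /* 255 - alpha */

        dest[i] = add_pixel (s, mul_pixel (dest[i], inv_alpha));
    }
}
```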
+commit 8d8caa55a38c00351047d24322e23b201b6b29ff
+Author: Oded Gabbay <oded.gabbay@gmail.com>
+Date: Sun Sep 6 11:46:15 2015 +0300
+
+ vmx: optimize vmx_composite_over_n_8888_8888_ca
+
+ This patch optimizes vmx_composite_over_n_8888_8888_ca by removing use
+ of expand_alpha_1x128, unpack/pack and in_over_2x128 in favor of
+ splat_alpha, in_over and MUL/ADD macros from pixman_combine32.h.
+
+ Running "lowlevel-blt-bench -n over_8888_8888" on POWER8, 8 cores,
+ 3.4GHz, RHEL 7.2 ppc64le gave the following results:
+
+ reference memcpy speed = 23475.4MB/s (5868.8MP/s for 32bpp fills)
+
+ Before After Change
+ --------------------------------------------
+ L1 244.97 474.05 +93.51%
+ L2 243.74 473.05 +94.08%
+ M 243.29 467.16 +92.02%
+ HT 144.03 252.79 +75.51%
+ VT 174.24 279.03 +60.14%
+ R 109.86 149.98 +36.52%
+ RT 47.96 53.18 +10.88%
+ Kops/s 524 576 +9.92%
+
+ Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
+ Acked-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
+ Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
+
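Component alpha differs from the unified case because every mask channel scales both the matching source channel and the source alpha used for that channel. The scalar C sketch below follows the semantics of pixman's generic combine_over_ca(), not the optimized VMX code. The function names are hypothetical, and mul_un8 uses the MUL_UN8 rounding formula.

```c
#include <assert.h>
#include <stdint.h>

/* Rounded x * a / 255, the same formula as MUL_UN8 in pixman-combine32.h. */
static uint8_t
mul_un8 (uint8_t x, uint8_t a)
{
    uint32_t t = (uint32_t) x * a + 0x80;

    return (uint8_t) (((t >> 8) + t) >> 8);
}

/* Component-alpha OVER for one pixel, per channel c (normalized units):
 *   dest_c = src_c * m_c + dest_c * (1 - src_alpha * m_c)            */
static uint32_t
over_ca_pixel (uint32_t src, uint32_t mask, uint32_t dest)
{
    uint8_t sa = (uint8_t) (src >> 24);
    uint32_t result = 0;

    for (int shift = 0; shift < 32; shift += 8)
    {
        uint8_t s = (uint8_t) (src >> shift);
        uint8_t m = (uint8_t) (mask >> shift);
        uint8_t d = (uint8_t) (dest >> shift);
        uint32_t c = mul_un8 (s, m) +
                     mul_un8 (d, (uint8_t) (255 - mul_un8 (m, sa)));

        result |= (c > 0xff ? 0xff : c) << shift;
    }
    return result;
}

/* Solid source, a8r8g8b8 component-alpha mask, a8r8g8b8 destination. */
static void
over_n_8888_8888_ca_scanline (uint32_t src, const uint32_t *mask,
                              uint32_t *dest, int width)
{
    assert (width >= 0);
    for (int i = 0; i < width; i++)
        dest[i] = over_ca_pixel (src, mask[i], dest[i]);
}
```

A mask of 0x00ff0000 lets only the red channel of an opaque white source through. Over opaque black, the result is opaque red.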
+commit 857880f0e4d1d42a8508ac77be33556cc6f7f546
+Author: Oded Gabbay <oded.gabbay@gmail.com>
+Date: Sun Sep 6 10:58:30 2015 +0300
+
+ vmx: optimize scaled_nearest_scanline_vmx_8888_8888_OVER
+
+ This patch optimizes scaled_nearest_scanline_vmx_8888_8888_OVER and all
+ the functions it calls (combine1, combine4 and
+ core_combine_over_u_pixel_vmx).
+
+ The optimization is done by removing use of expand_alpha_1x128 and
+ expand_alpha_2x128 in favor of splat_alpha and MUL/ADD macros from
+ pixman_combine32.h.
+
+ Running "lowlevel-blt-bench -n over_8888_8888" on POWER8, 8 cores,
+ 3.4GHz, RHEL 7.2 ppc64le gave the following results:
+
+ reference memcpy speed = 24847.3MB/s (6211.8MP/s for 32bpp fills)
+
+ Before After Change
+ --------------------------------------------
+ L1 182.05 210.22 +15.47%
+ L2 180.6 208.92 +15.68%
+ M 180.52 208.22 +15.34%
+ HT 130.17 178.97 +37.49%
+ VT 145.82 184.22 +26.33%
+ R 104.51 129.38 +23.80%
+ RT 48.3 61.54 +27.41%
+ Kops/s 430 504 +17.21%
+
+ v2: Check *pm is not NULL before dereferencing it in combine1()
+
+ Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
+ Acked-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
+ Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
+
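Nearest-neighbour scanline functions step a 16.16 fixed-point source coordinate once per destination pixel. This C sketch shows only that coordinate stepping. The OVER combine step of the VMX function is replaced by a plain copy, and all names are hypothetical. As in the COVER case, it assumes the caller has already biased `vx` by -pixman_fixed_e and guaranteed that every index lands inside the source row.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t fixed_16_16;                 /* stand-in for pixman_fixed_t */
#define FIXED_1 ((fixed_16_16) 0x10000)

/* Fetch one destination scanline by nearest-neighbour sampling.
 * vx: source x of the first destination pixel centre, minus one fixed
 *     epsilon so that exact half-way positions round down;
 * unit_x: source step per destination pixel (16.16). */
static void
scaled_nearest_fetch (uint32_t *dst, const uint32_t *src, int src_width,
                      int width, fixed_16_16 vx, fixed_16_16 unit_x)
{
    for (int i = 0; i < width; i++)
    {
        int x = (int) (vx >> 16);   /* vx >= 0 is assumed in the COVER case */

        assert (x >= 0 && x < src_width);
        dst[i] = src[x];
        vx += unit_x;
    }
}
```

For a 2x upscale, unit_x is 0x8000 and the first centre maps to source 0.25, so `vx` starts at 0x4000 - 1 and each source pixel is used twice.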
+commit 73e586efb3ee149f76f15d9e549bffa15d8e30ec
+Author: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
+Date: Mon Sep 7 14:40:49 2015 +0300
+
+ armv6: enable over_n_8888
+
+ Enable the fast path added in the previous patch by moving the lookup
+ table entries to their proper locations.
+
+ Lowlevel-blt-bench benchmark statistics with 30 iterations, showing the
+ effect of adding this one patch on top of
+ "armv6: Add over_n_8888 fast path (disabled)", which was applied on
+ fd595692941f3d9ddea8934462bd1d18aed07c65.
+
+ Before After
+ Mean StdDev Mean StdDev Confidence Change
+ L1 12.5 0.04 45.2 0.10 100.00% +263.1%
+ L2 11.1 0.02 43.2 0.03 100.00% +289.3%
+ M 9.4 0.00 42.4 0.02 100.00% +351.7%
+ HT 8.5 0.02 25.4 0.10 100.00% +198.8%
+ VT 8.4 0.02 22.3 0.07 100.00% +167.0%
+ R 8.2 0.02 23.1 0.09 100.00% +183.6%
+ RT 5.4 0.05 11.4 0.21 100.00% +110.3%
+
+ At most 3 outliers rejected per test per set.
+
+ Iterating here means that lowlevel-blt-bench was executed 30 times, and
+ the statistics above were computed from the output.
+
+ Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
+
+commit 9eb6889b15a180cc94aad8ac97189af5b3a68b96
+Author: Ben Avison <bavison@riscosopen.org>
+Date: Mon Sep 7 14:40:48 2015 +0300
+
+ armv6: Add over_n_8888 fast path (disabled)
+
+ This new fast path is initially disabled by putting the entries in the
+ lookup table after the sentinel. The compiler cannot tell the new code
+ is not used, so it cannot eliminate the code. Also the lookup table size
+ will include the new fast path. When the follow-up patch then enables
+ the new fast path, the binary layout (alignments, size, etc.) will stay
+ the same compared to the disabled case.
+
+ Keeping the binary layout identical is important for benchmarking on
+ Raspberry Pi 1. The addresses at which functions are loaded will have a
+ significant impact on benchmark results, causing unexpected performance
+ changes. Keeping all function addresses the same across the patch
+ enabling a new fast path improves the reliability of benchmarks.
+
+ Benchmark results are included in the patch enabling this fast path.
+
+ [Pekka: disabled the fast path, commit message]
+ Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
+
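The "after the sentinel" trick can be seen in a tiny table. This is a hypothetical C sketch, not pixman's pixman_fast_path_t table. Lookup stops at the sentinel, but the entry after it still references its function. That function therefore stays in the binary, and the table has the same size whether the entry is enabled or not.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*composite_func_t) (void);

static int last_called;   /* records which implementation ran (demo only) */

static void over_8888_8888_impl (void) { last_called = 1; }
static void over_n_8888_impl (void)    { last_called = 2; }

typedef struct
{
    const char      *op;
    composite_func_t func;
} fast_path_entry_t;

static const fast_path_entry_t fast_paths[] =
{
    { "over_8888_8888", over_8888_8888_impl },
    { NULL,             NULL },               /* sentinel: lookup ends here */
    { "over_n_8888",    over_n_8888_impl },   /* disabled but still linked */
};

static composite_func_t
lookup_fast_path (const char *op)
{
    const fast_path_entry_t *e;

    assert (op != NULL);
    for (e = fast_paths; e->op != NULL; e++)
    {
        if (strcmp (e->op, op) == 0)
            return e->func;
    }
    return NULL;   /* no fast path: fall back to the general path */
}
```

Enabling the fast path then only swaps the order of two entries, leaving the layout otherwise unchanged.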
+commit 4c71f595e3393be5b922df37d50d71dd83f4f979
+Author: Ben Avison <bavison@riscosopen.org>
+Date: Wed Sep 2 20:35:59 2015 +0100
+
+ test: Add cover-test v5
+
+ This test aims to verify both numerical correctness and the honouring of
+ array bounds for scaled plots (both nearest-neighbour and bilinear) at or
+ close to the boundary conditions for applicability of "cover" type fast paths
+ and iter fetch routines.
+
+ It has a secondary purpose: by setting the env var EXACT (to any value) it
+ will only test plots that are exactly on the boundary condition. This makes
+ it possible to ensure that "cover" routines are being used to the maximum,
+ although this requires the use of a debugger or code instrumentation to
+ verify.
+
+ Changes in v4:
+
+ Check the fence page size and skip the test if it is too large. Since
+ we need to deal with pixman_fixed_t coordinates that go beyond the
+ real image width, make the page size limit 16 kB. A 32 kB or larger
+ page size would cause an a8 image width to be 32k or more, which is no
+ longer representable in pixman_fixed_t.
+
+ Use a shorthand variable 'filter' in test_cover().
+
+ Whitespace adjustments.
+
+ Changes in v5:
+
+ Skip if fenced memory is not supported. Do you know of any such
+ platform?
+
+ Signed-off-by: Ben Avison <bavison@riscosopen.org>
+ [Pekka: changes in v4 and v5]
+ Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
+ Reviewed-by: Ben Avison <bavison@riscosopen.org>
+ Acked-by: Oded Gabbay <oded.gabbay@gmail.com>
+
+commit 812c9c9758e1503bd1725af9c6fe9ede6a467506
+Author: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
+Date: Tue Sep 8 13:35:33 2015 +0300
+
+ implementation: add PIXMAN_DISABLE=wholeops
+
+ Add a new option to PIXMAN_DISABLE: "wholeops". This option disables all
+ whole-operation fast paths regardless of implementation level, except
+ the general path (general_composite_rect).
+
+ The purpose is to add a debug option that allows us to test optimized
+ iterator paths specifically. With this, it is possible to see if:
+ - fast paths mask bugs in iterators
+ - compare fast paths with iterator paths for performance
+
+ The effect was tested on x86_64 by running:
+ $ PIXMAN_DISABLE='' ./test/lowlevel-blt-bench over_8888_8888
+ $ PIXMAN_DISABLE='wholeops' ./test/lowlevel-blt-bench over_8888_8888
+
+ In the first case time is spent in sse2_composite_over_8888_8888(), and
+ in the latter in sse2_combine_over_u().
+
+ Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
+ Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
+
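PIXMAN_DISABLE takes a space-separated list, so "wholeops" has to be matched as a whole token. This C sketch shows one way to do that. It is not pixman's own parser, and `token_listed` and `option_disabled` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* True if 'name' appears as a whole space-separated token in 'list'. */
static bool
token_listed (const char *list, const char *name)
{
    size_t len = strlen (name);

    assert (len > 0);
    while (list && *list)
    {
        const char *end = strchr (list, ' ');
        size_t tok_len = end ? (size_t) (end - list) : strlen (list);

        if (tok_len == len && strncmp (list, name, len) == 0)
            return true;
        list = end ? end + 1 : NULL;
    }
    return false;
}

/* Hypothetical wrapper reading the environment variable. */
static bool
option_disabled (const char *name)
{
    return token_listed (getenv ("PIXMAN_DISABLE"), name);
}
```

Whole-token matching keeps a value like "wholeopsx" from accidentally disabling "wholeops".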
+commit e9ef2cc4dea04792a03d604c075c344055765217
+Author: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
+Date: Tue Sep 8 09:36:48 2015 +0300
+
+ utils.[ch]: add fence_get_page_size()
+
+ Add a function to get the page size used for memory fence purposes, and
+ use it everywhere where getpagesize() was used.
+
+ This offers a single point in code to override the page size, in case
+ one wants to experiment how the tests work with a higher page size than
+ what the developer's machine has.
+
+ This also offers a clean API, without adding #ifdefs, to tests for
+ checking the page size.
+
+ Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
+ Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
+ Reviewed-by: Ben Avison <bavison@riscosopen.org>
+
+commit 82f8c997dfd3f60a48134107ecf38663b464bdc9
+Author: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
+Date: Tue Sep 8 09:20:46 2015 +0300
+
+ utils.c: fix fallback code for fence_image_create_bits()
+
+ Used a wrong variable name, causing:
+ /home/pq/git/pixman/demos/../test/utils.c: In function ‘fence_image_create_bits’:
+ /home/pq/git/pixman/demos/../test/utils.c:562:46: error: ‘width’ undeclared (first use in this function)
+
+ Use the correct variable.
+
+ Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
+ Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
+ Reviewed-by: Ben Avison <bavison@riscosopen.org>
+
+commit 07006853828a59b5e0cd7d7d058d03db4e23e6ec
+Author: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
+Date: Thu May 7 17:16:05 2015 +0300
+
+ test: add fence-image-self-test
+
+ Tests that fence_malloc and fence_image_create_bits actually work: that
+ out-of-bounds and out-of-row (unused stride area) accesses trigger
+ SIGSEGV.
+
+ If fence_malloc is a dummy (FENCE_MALLOC_ACTIVE not defined), this test
+ is skipped.
+
+ Changes in v2:
+
+ - check FENCE_MALLOC_ACTIVE value, not whether it is defined
+ - test that reading bytes near the fence pages does not cause a
+ segmentation fault
+
+ Changes in v3:
+
+ - Do not print progress messages unless VERBOSE environment variable is
+ set. Avoid spamming the terminal output of 'make check' on some
+ versions of autotools.
+
+ Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
+ Reviewed-by: Ben Avison <bavison@riscosopen.org>
+
+commit 13d93aa12050ce99643d56b0c730404294f46c2f
+Author: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
+Date: Thu May 7 16:46:01 2015 +0300
+
+ utils.[ch]: add fence_image_create_bits ()
+
+ Useful for detecting out-of-bounds accesses in composite operations.
+
+ This will be used by follow-up patches adding new tests.
+
+ Changes in v2:
+
+ - fix style on fence_image_create_bits args
+ - add page to stride only if stride_fence
+ - add comment on the fallback definition about freeing storage
+
+ Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
+ Reviewed-by: Ben Avison <bavison@riscosopen.org>
+
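The fencing idea behind these helpers is to place a block so that it ends exactly where an inaccessible page begins. The POSIX C sketch below guards only the end of a single block. It is not the fence_malloc() or fence_image_create_bits() code from test/utils.c, which may also fence other regions (such as the per-row stride page). `guarded_alloc` is a hypothetical name.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 'len' bytes that end exactly where a PROT_NONE page begins, so
 * touching the first byte past the end faults. The whole mapping is
 * reported through base_out/map_len_out so the caller can munmap() it. */
static void *
guarded_alloc (size_t len, void **base_out, size_t *map_len_out)
{
    size_t page = (size_t) sysconf (_SC_PAGESIZE);
    size_t data_pages = (len + page - 1) / page;
    size_t map_len = (data_pages + 1) * page;   /* data + one fence page */
    uint8_t *base;

    assert (len > 0);
    base = mmap (NULL, map_len, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    /* Arm the trailing page: any access now raises SIGSEGV or SIGBUS. */
    if (mprotect (base + data_pages * page, page, PROT_NONE) != 0)
    {
        munmap (base, map_len);
        return NULL;
    }

    *base_out = base;
    *map_len_out = map_len;
    return base + data_pages * page - len;
}
```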
+commit c70ddd5c9e12d87ff461d73a6f53b00d52925cf5
+Author: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
+Date: Thu May 7 14:21:30 2015 +0300
+
+ utils.[ch]: add FENCE_MALLOC_ACTIVE
+
+ Define a new token to simplify checking whether fence_malloc() actually
+ can catch out-of-bounds access.
+
+ This will be used in the future to skip tests that rely on fence_malloc
+ checking functionality.
+
+ Changes in v2:
+
+ - #define FENCE_MALLOC_ACTIVE always, but change its value to help catch
+ use of it without including utils.h
+
+ Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
+ Reviewed-by: Ben Avison <bavison@riscosopen.org>
+
+commit a82e519944e5d1af41cc94a14d9ae1fe0e430e68
+Author: Ben Avison <bavison@riscosopen.org>
+Date: Thu Aug 20 13:07:48 2015 +0100
+
+ scaling-test: list more details when verbose
+
+ Add mask details to the output.
+
+ [Pekka: redo whitespace and print src,dst,mask x and y.]
+ Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
+ Reviewed-by: Ben Avison <bavison@riscosopen.org>
+
+commit fd595692941f3d9ddea8934462bd1d18aed07c65
+Author: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
+Date: Tue Jul 7 11:31:20 2015 +0300
+
+ lowlevel-blt-bench: make extra arguments an error
+
+ If a user gives multiple patterns or extra arguments, only the last one
+ was used as the pattern while the former were just ignored. This is a
+ user error silently converted to something possibly unexpected.
+
+ In presence of extra arguments, complain and quit.
+
+ Cc: Ben Avison <bavison@riscosopen.org>
+ Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
+
+commit 69611473c5a4e7cc2e6016d82ff4ed28e289484a
+Author: Oded Gabbay <oded.gabbay@gmail.com>
+Date: Sat Aug 1 23:01:43 2015 +0300
+
+ Post-release version bump to 0.33.3
+
+ Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
+
commit ee790044b08e3b668e6aa5d9229f46ed7295ebf0
Author: Oded Gabbay <oded.gabbay@gmail.com>
Date: Sat Aug 1 22:34:53 2015 +0300
diff --git a/debian/changelog b/debian/changelog
index 42e6d85..be437ce 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,3 +1,9 @@
+pixman (0.33.4-1) UNRELEASED; urgency=medium
+
+ * New upstream release candidate.
+
+ -- Andreas Boll <andreas.boll.dev@gmail.com> Wed, 04 Nov 2015 10:30:37 +0100
+
pixman (0.33.2-2) sid; urgency=medium
* Run tests with VERBOSE=1.
commit fa71d08a81c9bf3f2366ee45474ff868d9e10b8e
Author: Oded Gabbay <oded.gabbay@gmail.com>
Date: Fri Oct 23 17:58:49 2015 +0300
Pre-release version bump to 0.33.4
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
diff --git a/configure.ac b/configure.ac
index b04cc69..dcacff1 100644
--- a/configure.ac
+++ b/configure.ac
@@ -54,7 +54,7 @@ AC_PREREQ([2.57])
m4_define([pixman_major], 0)
m4_define([pixman_minor], 33)
-m4_define([pixman_micro], 3)
+m4_define([pixman_micro], 4)
m4_define([pixman_version],[pixman_major.pixman_minor.pixman_micro])
commit 9728241bd098bc4260e6cd83997dfecc64adc356
Author: Andrea Canciani <ranma42@gmail.com>
Date: Tue Oct 13 13:35:59 2015 +0200
test: Fix fence-image-self-test on Mac
On MacOS X, according to the manpage of mprotect(), "When a program
violates the protections of a page, it gets a SIGBUS or SIGSEGV
signal.", but fence-image-self-test was only accepting a SIGSEGV as
notification of invalid access.
Fixes fence-image-self-test
Reviewed-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
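The fix accepts either signal as proof of a fenced access. The simplified POSIX C sketch below follows the same fork-and-wait pattern as do_expect_signal()/expect_signal() in the diff that follows. The names are hypothetical, and it uses a plain sa_handler instead of the test's segv_handler.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Reaching the handler means the expected fault signal was delivered. */
static void
fault_handler (int sig)
{
    (void) sig;
    _exit (EXIT_SUCCESS);
}

/* Runs fn (data) in a child and returns true only if the child was stopped
 * by SIGSEGV or SIGBUS; either may be raised for a protection fault. */
static bool
expect_fault (void (*fn) (void *), void *data)
{
    pid_t pid = fork ();
    int status;

    assert (pid >= 0);
    if (pid == 0)
    {
        struct sigaction sa;

        memset (&sa, 0, sizeof sa);
        sigemptyset (&sa.sa_mask);
        sa.sa_handler = fault_handler;
        if (sigaction (SIGSEGV, &sa, NULL) == -1 ||
            sigaction (SIGBUS, &sa, NULL) == -1)
            _exit (2);

        fn (data);
        _exit (EXIT_FAILURE);   /* the access did not fault */
    }

    if (waitpid (pid, &status, 0) != pid)
        return false;
    return WIFEXITED (status) && WEXITSTATUS (status) == EXIT_SUCCESS;
}

/* Reads one byte through a volatile pointer so the load is not elided. */
static void
read_byte (void *p)
{
    volatile uint8_t *q = p;

    (void) *q;
}
```

Reading a PROT_NONE page makes expect_fault() return true on Linux (SIGSEGV) and on systems that raise SIGBUS instead.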
diff --git a/test/fence-image-self-test.c b/test/fence-image-self-test.c
index c883038..c80b3cf 100644
--- a/test/fence-image-self-test.c
+++ b/test/fence-image-self-test.c
@@ -73,7 +73,7 @@ prinfo (const char *fmt, ...)
}
static void
-do_expect_segv (void (*fn)(void *), void *data)
+do_expect_signal (void (*fn)(void *), void *data)
{
struct sigaction sa;
@@ -82,6 +82,8 @@ do_expect_segv (void (*fn)(void *), void *data)
sa.sa_sigaction = segv_handler;
if (sigaction (SIGSEGV, &sa, NULL) == -1)
die ("sigaction failed", errno);
+ if (sigaction (SIGBUS, &sa, NULL) == -1)
+ die ("sigaction failed", errno);
(*fn)(data);
@@ -96,7 +98,7 @@ do_expect_segv (void (*fn)(void *), void *data)
* to exit with success, and return failure otherwise.
*/
static pixman_bool_t
-expect_segv (void (*fn)(void *), void *data)
+expect_signal (void (*fn)(void *), void *data)
{
pid_t pid, wp;
int status;
@@ -106,7 +108,7 @@ expect_segv (void (*fn)(void *), void *data)
die ("fork failed", errno);
if (pid == 0)
- do_expect_segv (fn, data); /* never returns */
+ do_expect_signal (fn, data); /* never returns */
wp = waitpid (pid, &status, 0);
if (wp != pid)
@@ -131,9 +133,9 @@ test_read_fault (uint8_t *p, int offset)
{
prinfo ("*(uint8_t *)(%p + %d)", p, offset);
- if (expect_segv (read_u8, p + offset))
+ if (expect_signal (read_u8, p + offset))
{
- prinfo ("\tSEGV OK\n");
+ prinfo ("\tsignal OK\n");
return TRUE;
}
diff --git a/test/utils.c b/test/utils.c
index 8657966..f8e42a5 100644
--- a/test/utils.c
+++ b/test/utils.c
@@ -471,9 +471,9 @@ fence_image_destroy (pixman_image_t *image, void *data)
* min_width is only a minimum width for the image. The width is aligned up
* for the row size to be divisible by both page size and pixel size.
*
- * If stride_fence is true, the additional page on each row will be armed
- * to cause SIGSEVG on all accesses. This should catch all accesses outside
- * the valid row pixels.
+ * If stride_fence is true, the additional page on each row will be
+ * armed to cause SIGSEGV or SIGBUS on all accesses. This should catch
+ * all accesses outside the valid row pixels.
*/
pixman_image_t *
fence_image_create_bits (pixman_format_code_t format,
commit 7de61d8d14e84623b6fa46506eb74f938287f536
Author: Matt Turner <mattst88@gmail.com>
Date: Sun Oct 11 14:44:46 2015 -0700
mmx: Use MMX2 intrinsics from xmmintrin.h directly.
We had lots of hacks to handle the inability to include xmmintrin.h
without compiling with -msse (lest SSE instructions be used in
pixman-mmx.c). Some recent version of gcc relaxed this restriction.
Change configure.ac to test that xmmintrin.h can be included and that we
can use some intrinsics from it, and remove the work-around code from
pixman-mmx.c.
Evidently allows gcc 4.9.3 to optimize better as well:
text data bss dec hex filename
657078 30848 680 688606 a81de libpixman-1.so.0.33.3 before
656710 30848 680 688238 a806e libpixman-1.so.0.33.3 after
Reviewed-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
Tested-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Signed-off-by: Matt Turner <mattst88@gmail.com>
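The new configure test calls _mm_shuffle_pi16 (pshufw) and _mm_mulhi_pu16 (pmulhuw). This portable C sketch models what those two intrinsics compute on four 16-bit lanes, following Intel's documented semantics. It is only meant to show what `_mm_shuffle_pi16 (v, 5)` selects and is not part of pixman. `m64_model` and the `*_model` functions are hypothetical names. _MM_SHUFFLE uses the same definition as in pixman-mmx.c.

```c
#include <assert.h>
#include <stdint.h>

#ifndef _MM_SHUFFLE
#define _MM_SHUFFLE(fp3,fp2,fp1,fp0) \
    (((fp3) << 6) | ((fp2) << 4) | ((fp1) << 2) | (fp0))
#endif

/* Four 16-bit lanes of an __m64; w[0] is the least significant word. */
typedef struct { uint16_t w[4]; } m64_model;

/* pshufw: result lane i takes the source lane selected by imm bits 2i+1:2i. */
static m64_model
shuffle_pi16_model (m64_model a, int imm)
{
    m64_model r;

    assert (imm >= 0 && imm <= 255);
    for (int i = 0; i < 4; i++)
        r.w[i] = a.w[(imm >> (2 * i)) & 3];
    return r;
}

/* pmulhuw: high 16 bits of each unsigned 16 x 16 -> 32-bit product. */
static m64_model
mulhi_pu16_model (m64_model a, m64_model b)
{
    m64_model r;

    for (int i = 0; i < 4; i++)
        r.w[i] = (uint16_t) (((uint32_t) a.w[i] * b.w[i]) >> 16);
    return r;
}
```

The immediate 5 equals _MM_SHUFFLE (0, 0, 1, 1). It copies lane 1 into lanes 0 and 1 and lane 0 into lanes 2 and 3.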
diff --git a/configure.ac b/configure.ac
index 424bfd3..b04cc69 100644
--- a/configure.ac
+++ b/configure.ac
@@ -347,21 +347,14 @@ AC_COMPILE_IFELSE([AC_LANG_SOURCE([[
#error "Need GCC >= 3.4 for MMX intrinsics"
#endif
#include <mmintrin.h>
+#include <xmmintrin.h>
int main () {
__m64 v = _mm_cvtsi32_si64 (1);
__m64 w;
- /* Some versions of clang will choke on K */
- asm ("pshufw %2, %1, %0\n\t"
- : "=y" (w)
- : "y" (v), "K" (5)
- );
-
- /* Some versions of clang will choke on this */
- asm ("pmulhuw %1, %0\n\t"
- : "+y" (w)
- : "y" (v)
- );
+ /* Test some intrinsics from xmmintrin.h */
+ w = _mm_shuffle_pi16(v, 5);
+ w = _mm_mulhi_pu16(w, w);
return _mm_cvtsi64_si32 (v);
}]])], have_mmx_intrinsics=yes)
diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c
index 05c48a4..88c3a39 100644
--- a/pixman/pixman-mmx.c
+++ b/pixman/pixman-mmx.c
@@ -40,6 +40,9 @@
#else
#include <mmintrin.h>
#endif
+#ifdef USE_X86_MMX
+#include <xmmintrin.h>
+#endif
#include "pixman-private.h"
#include "pixman-combine32.h"
#include "pixman-inlines.h"
@@ -59,66 +62,7 @@ _mm_empty (void)
}
#endif
-#ifdef USE_X86_MMX
-# if (defined(__SUNPRO_C) || defined(_MSC_VER) || defined(_WIN64))
-# include <xmmintrin.h>
-# else
-/* We have to compile with -msse to use xmmintrin.h, but that causes SSE
- * instructions to be generated that we don't want. Just duplicate the
- * functions we want to use. */
-extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm_movemask_pi8 (__m64 __A)
-{
- int ret;
-
- asm ("pmovmskb %1, %0\n\t"
- : "=r" (ret)
- : "y" (__A)
- );
-
- return ret;
-}
-
-extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mulhi_pu16 (__m64 __A, __m64 __B)
-{
- asm ("pmulhuw %1, %0\n\t"
- : "+y" (__A)
- : "y" (__B)
- );
- return __A;
-}
-
-# ifdef __OPTIMIZE__
-extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm_shuffle_pi16 (__m64 __A, int8_t const __N)
-{
- __m64 ret;
-
- asm ("pshufw %2, %1, %0\n\t"
- : "=y" (ret)
- : "y" (__A), "K" (__N)
- );
-
- return ret;
-}
-# else
-# define _mm_shuffle_pi16(A, N) \
- ({ \
- __m64 ret; \
- \
- asm ("pshufw %2, %1, %0\n\t" \
- : "=y" (ret) \
- : "y" (A), "K" ((const int8_t)N) \
- ); \
- \
- ret; \
- })
-# endif
-# endif
-#endif
-
-#ifndef _MSC_VER
+#ifndef _MM_SHUFFLE
#define _MM_SHUFFLE(fp3,fp2,fp1,fp0) \
(((fp3) << 6) | ((fp2) << 4) | ((fp1) << 2) | (fp0))
#endif
commit 90e62c086766afffd289a321c7de8ea4b5cac87d
Author: Siarhei Siamashka <siarhei.siamashka@gmail.com>
Date: Fri Sep 4 15:39:00 2015 +0300
vmx: implement fast path vmx_composite_over_n_8888
Running "lowlevel-blt-bench over_n_8888" on Playstation3 3.2GHz,
Gentoo ppc (32-bit userland) gave the following results:
before: over_n_8888 = L1: 147.47 L2: 205.86 M:121.07
after: over_n_8888 = L1: 287.27 L2: 261.09 M:133.48
Cairo non-trimmed benchmarks on POWER8, 3.4GHz 8 Cores:
ocitysmap 659.69 -> 611.71 : 1.08x speedup
xfce4-terminal-a1 2725.22 -> 2547.47 : 1.07x speedup
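For reference, over_n_8888 applies OVER with a solid premultiplied source to every destination pixel. The scalar C sketch below shows the per-pixel arithmetic, not the VMX code. The function names are hypothetical, and mul_un8 uses the MUL_UN8 rounding formula from pixman-combine32.h. Because the source is constant, the inverse alpha is computed only once per scanline.

```c
#include <assert.h>
#include <stdint.h>

/* Rounded x * a / 255, the same formula as MUL_UN8 in pixman-combine32.h. */
static uint8_t
mul_un8 (uint8_t x, uint8_t a)
{
    uint32_t t = (uint32_t) x * a + 0x80;

    return (uint8_t) (((t >> 8) + t) >> 8);
}

/* dest = src + dest * (255 - src_alpha) / 255, per channel, for a solid
 * premultiplied a8r8g8b8 source over an a8r8g8b8 scanline. */
static void
over_n_8888_scanline (uint32_t src, uint32_t *dest, int width)
{
    uint8_t inv_alpha = (uint8_t) (255 - (src >> 24));

    assert (width >= 0);
    for (int i = 0; i < width; i++)
    {
        uint32_t d = dest[i], result = 0;

        for (int shift = 0; shift < 32; shift += 8)
        {
            uint32_t c = ((src >> shift) & 0xff) +
                         mul_un8 ((uint8_t) (d >> shift), inv_alpha);

            result |= (c > 0xff ? 0xff : c) << shift;
        }
        dest[i] = result;
    }
}
```

50% premultiplied red (0x80800000) over opaque blue gives 0xff80007f. Over a transparent pixel, it leaves the source colour unchanged.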