Bug#989582: unblock: darktable/3.4.1-4
Package: release.debian.org
Severity: normal
User: release.debian.org@packages.debian.org
Usertags: unblock

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Please unblock package darktable
[ Reason ]
This version contains a fix for #989222, a crash ("Floating point
exception") when exporting raw files of a certain format. According to
Jonas, this bug is triggered by DNG output from Megapixels, which is in
bullseye and is used by (at least) the Librem 5 and the PinePhone (with
Mobian).
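Why the crash reads "Floating point exception": the 16-bit mosaic path appears to average with integer accumulators (the patch context shows a `num++` counting loop), so `col / num` with `num == 0` is an integer division by zero. On x86 Linux that traps and the process gets SIGFPE, which the shell reports as "Floating point exception". The sketch below is a hypothetical helper, not darktable code; it assumes integer `col`/`num` and mirrors the patched `if(num) *outc = col / num;`, leaving the output untouched when the sampling window is empty.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of darktable): average the 16-bit samples
 * in in[start..end) into *out, guarding against an empty window. */
static bool average_u16_guarded(uint16_t *out, const uint16_t *in,
                                size_t start, size_t end)
{
  assert(out != NULL);
  uint32_t col = 0, num = 0; /* integer accumulator and sample count */
  for(size_t i = start; i < end; i++)
  {
    col += in[i];
    num++;
  }
  /* An unguarded `*out = col / num;` would be an integer division by zero
   * for an empty window: undefined behaviour in C, and a SIGFPE trap
   * ("Floating point exception") on x86. */
  if(num) *out = (uint16_t)(col / num);
  return num != 0; /* false: no samples taken, *out left unchanged */
}
```

For example, averaging `{10, 20, 30, 40}` over the whole range stores 25, while an empty range (`start == end`) returns false and leaves the previous value in place.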
[ Impact ]
Users of some free-software-friendly phones will be unable to process
their images with darktable from bullseye.
[ Tests ]
I have verified that the basic functionality of darktable is still
OK. Jonas tested the DNG images in question and verified that they now
export OK.
[ Risks ]
darktable is a leaf package. The diff is a bit large, but most of it
consists of deletions of SSE2-specialized code. The additions are only
7 lines and are easy to sanity-check.
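Of the 7 added lines in the attached patch, four only drop the `_plain` suffix from function names and three guard a division by `num`. In the float paths, division by zero does not trap under default IEEE 754 settings: it yields ±inf or NaN, which would then propagate into later processing. The sketch below is a hypothetical pair of helpers, not darktable code, mirroring the patched monochrome line `const float pix = (num) ? col / num : 0.0f;` next to the unguarded form for comparison.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper (not darktable code): guarded weighted mean, as in
 * the patched passthrough-monochrome path. A zero weight yields 0.0f. */
static float weighted_mean_guarded(float col, float num)
{
  return num ? col / num : 0.0f;
}

/* Unguarded form, for comparison only: with default floating-point
 * settings this does not crash, but returns +/-inf (col != 0) or
 * NaN (col == 0) when num == 0. */
static float weighted_mean_unguarded(float col, float num)
{
  return col / num;
}
```

Whether to skip the write or substitute a value is a separate design choice: the patch leaves the pixel unwritten in the two mosaic half-size paths and writes 0.0f in the monochrome passthrough path.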
[ Checklist ]
[x] all changes are documented in the d/changelog
[x] I reviewed all changes and I approve them
[x] attach debdiff against the package in testing
[ Other info ]
I also attach a "reduced diff" with the deleted #ifdef __SSE__ blocks
collapsed.
unblock darktable/3.4.1-4
-----BEGIN PGP SIGNATURE-----

iQIzBAEBCAAdFiEEkiyHYXwaY0SiY6fqA0U5G1WqFSEFAmC+o1YACgkQA0U5G1Wq
FSGEug/+NjvWDdVP6jwcU0rXEUCHpgPbqYXygkVn4TIyVeqRh1e6DJCwU3mzkNo8
DnR7siTEdXp6F9e1MpCaN9G404ptk7MZasN6Aswu5Fj37knj6YzhYnrqp6fbgurL
w1dcbNhnSSlPf6czeDtSIe0uIIR3TNbhG0ICX8D6xhTumolW0+EtPHTcG8E9y7Ib
f+wlp/0mwwdpmeYB32ObkF8v4t7g4f9Y1SWrjPI0xZ/tgYiDgY8nOW39a4Nj0HQX
HzqW0oQXMaLsjFecEv7Wuf3VTWmmBubKKANvs++Lg/EQi3pbjeVMzDa2WuZBTxUL
YHe0bW012OWOtgnfuLuKdIvots8afNYpi1jtS58e4ZT1wHxEvUW2ww09jjcrnsdP
CnKFT5Ybg3WZ7rqUQ8VsYXkgCe5CdauFAlKdWluTK2SAXn7brfvnpzpUpTzFbxRN
zOtZfwPqsCJt8l3rPoMdLIlD5IQAxkPavyc1ow3bym/IIEiuVXCSSbohRHYyUBDT
lQyM7aAVi8aawGVpbB/2MeuBsdWMPCx37etU/Jz3YMtqhC1rIi6OMVoXWFb1BAAQ
sGjgRvrSes/2bkODcC/YBE9jNKinsLXbCbhQU50ObEQqHb7yeec9DsPe7NYfvhGN
22ueQyjNT1LguYVwsNzPE1WBobrSwghdFh8MFcJwNuqJR3SnEDI=
=o+Yk
-----END PGP SIGNATURE-----
diff -Nru darktable-3.4.1/debian/changelog darktable-3.4.1/debian/changelog
--- darktable-3.4.1/debian/changelog 2021-05-20 14:07:16.000000000 -0300
+++ darktable-3.4.1/debian/changelog 2021-06-05 12:41:39.000000000 -0300
@@ -1,3 +1,11 @@
+darktable (3.4.1-4) unstable; urgency=medium
+
+ * Bug fix: "crashes with 'Floating point exception (core dumped)' after
+ loading some DNG files", thanks to Jonas Smedegaard (Closes: #989222).
+ Cherry pick upstream commit 2ff4fc58e44.
+
+ -- David Bremner <bremner@debian.org> Sat, 05 Jun 2021 12:41:39 -0300
+
darktable (3.4.1-3) unstable; urgency=medium
* Bug fix: "broken symlinks: /usr/share/darktable/js/*.js ->
diff -Nru darktable-3.4.1/debian/patches/0002-Avoid-div-by-zero-in-dt_iop_clip_and_zoom_mosaic_hal.patch darktable-3.4.1/debian/patches/0002-Avoid-div-by-zero-in-dt_iop_clip_and_zoom_mosaic_hal.patch
--- darktable-3.4.1/debian/patches/0002-Avoid-div-by-zero-in-dt_iop_clip_and_zoom_mosaic_hal.patch 1969-12-31 20:00:00.000000000 -0400
+++ darktable-3.4.1/debian/patches/0002-Avoid-div-by-zero-in-dt_iop_clip_and_zoom_mosaic_hal.patch 2021-06-05 12:41:39.000000000 -0300
@@ -0,0 +1,1001 @@
+From: Hanno Schwalm <hanno@schwalm-bremen.de>
+Date: Fri, 14 May 2021 18:20:37 +0200
+Subject: Avoid div by zero in dt_iop_clip_and_zoom_mosaic_half_size (#8954)
+
+* Avoid div by zero in dt_iop_clip_and_zoom_mosaic_half_size_plain
+
+Fixes #8951
+
+Although the file given in the issue is crippled we can avoid the crash.
+In `dt_iop_clip_and_zoom_mosaic_half_size` and the sse friend there is possibly a div/0
+problem that should be checked.
+
+* Fixing same div by zero in dt_iop_clip_and_zoom_mosaic_half_size_f
+
+* Remove sse code for dt_iop_clip_and_zoom_mosaic... after testing performance
+
+checked performance non-sse vs sse specific code
+- with added local timers
+- using gcc 10.2
+- testing -t 1/4/8/16
+- intel (xeon like 9900) with fixed clock rate
+
+in
+- dt_iop_clip_and_zoom_mosaic_half_size
+- dt_iop_clip_and_zoom_mosaic_half_size_f
+- dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f
+- dt_iop_clip_and_zoom_demosaic_half_size_f
+
+with consistent results. For all functions the sse specific code was somewhat slower (~20%)
+than the vectorized compiler code. Number of omp cores didn't matter, just made the results
+more measurable because of low execution times.
+
+So I removed all the sse specific code for less code burden and better performance.
+
+* Fix sse header plus div/0
+
+At least for bayer images we absolutely want to be sure there is no div by zero as there might
+be buggy dng files.
+---
+ src/develop/imageop_math.c | 890 +--------------------------------------------
+ 1 file changed, 7 insertions(+), 883 deletions(-)
+
+diff --git a/src/develop/imageop_math.c b/src/develop/imageop_math.c
+index ef55965..0066a83 100644
+--- a/src/develop/imageop_math.c
++++ b/src/develop/imageop_math.c
+@@ -18,14 +18,8 @@
+
+ #include "develop/imageop_math.h"
+ #include <assert.h> // for assert
+-#ifdef __SSE__
+-#include <emmintrin.h> // for _mm_set_epi32, _mm_add_epi32
+-#endif
+ #include <glib.h> // for MIN, MAX, CLAMP, inline
+ #include <math.h> // for round, floorf, fmaxf
+-#ifdef __SSE__
+-#include <xmmintrin.h> // for _mm_set_ps, _mm_mul_ps, _mm_set...
+-#endif
+ #include "common/darktable.h" // for darktable, darktable_t, dt_code...
+ #include "common/imageio.h" // for FILTERS_ARE_4BAYER
+ #include "common/interpolation.h" // for dt_interpolation_new, dt_interp...
+@@ -177,7 +171,7 @@ int dt_iop_clip_and_zoom_roi_cl(int devid, cl_mem dev_out, cl_mem dev_in, const
+
+ #endif
+
+-void dt_iop_clip_and_zoom_mosaic_half_size_plain(uint16_t *const out, const uint16_t *const in,
++void dt_iop_clip_and_zoom_mosaic_half_size(uint16_t *const out, const uint16_t *const in,
+ const dt_iop_roi_t *const roi_out,
+ const dt_iop_roi_t *const roi_in, const int32_t out_stride,
+ const int32_t in_stride, const uint32_t filters)
+@@ -244,224 +238,12 @@ void dt_iop_clip_and_zoom_mosaic_half_size_plain(uint16_t *const out, const uint
+ num++;
+ }
+ }
+- *outc = col / num;
+- }
+- }
+-}
+-
+-#if defined(__SSE__)
+-void dt_iop_clip_and_zoom_mosaic_half_size_sse2(uint16_t *const out, const uint16_t *const in,
+- const dt_iop_roi_t *const roi_out,
+- const dt_iop_roi_t *const roi_in, const int32_t out_stride,
+- const int32_t in_stride, const uint32_t filters)
+-{
+- // adjust to pixel region and don't sample more than scale/2 nbs!
+- // pixel footprint on input buffer, radius:
+- const float px_footprint = 1.f / roi_out->scale;
+- // how many 2x2 blocks can be sampled inside that area
+- const int samples = round(px_footprint / 2);
+-
+- // move p to point to an rggb block:
+- int trggbx = 0, trggby = 0;
+- if(FC(trggby, trggbx + 1, filters) != 1) trggbx++;
+- if(FC(trggby, trggbx, filters) != 0)
+- {
+- trggbx = (trggbx + 1) & 1;
+- trggby++;
+- }
+- const int rggbx = trggbx, rggby = trggby;
+-
+-#ifdef _OPENMP
+-#pragma omp parallel for default(none) \
+- dt_omp_firstprivate(in, in_stride, out, out_stride, px_footprint, rggbx, rggby, roi_in, roi_out, samples) \
+- schedule(static)
+-#endif
+- for(int y = 0; y < roi_out->height; y++)
+- {
+- uint16_t *outc = out + out_stride * y;
+-
+- const float fy = (y + roi_out->y) * px_footprint;
+- int py = (int)fy & ~1;
+- const float dy = (fy - py) / 2;
+- py = MIN(((roi_in->height - 6) & ~1u), py) + rggby;
+-
+- const int maxj = MIN(((roi_in->height - 5) & ~1u) + rggby, py + 2 * samples);
+-
+- for(int x = 0; x < roi_out->width; x++)
+- {
+- __m128 col = _mm_setzero_ps();
+-
+- const float fx = (x + roi_out->x) * px_footprint;
+- int px = (int)fx & ~1;
+- const float dx = (fx - px) / 2;
+- px = MIN(((roi_in->width - 6) & ~1u), px) + rggbx;
+-
+- const int maxi = MIN(((roi_in->width - 5) & ~1u) + rggbx, px + 2 * samples);
+-
+- float p1, p2, p3, p4;
+- float num = 0;
+-
+- // upper left 2x2 block of sampling region
+- p1 = in[px + in_stride * py];
+- p2 = in[px + 1 + in_stride * py];
+- p3 = in[px + in_stride * (py + 1)];
+- p4 = in[px + 1 + in_stride * (py + 1)];
+- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps((1 - dx) * (1 - dy)), _mm_set_ps(p4, p3, p2, p1)));
+-
+- // left 2x2 block border of sampling region
+- for(int j = py + 2; j <= maxj; j += 2)
+- {
+- p1 = in[px + in_stride * j];
+- p2 = in[px + 1 + in_stride * j];
+- p3 = in[px + in_stride * (j + 1)];
+- p4 = in[px + 1 + in_stride * (j + 1)];
+- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(1 - dx), _mm_set_ps(p4, p3, p2, p1)));
+- }
+-
+- // upper 2x2 block border of sampling region
+- for(int i = px + 2; i <= maxi; i += 2)
+- {
+- p1 = in[i + in_stride * py];
+- p2 = in[i + 1 + in_stride * py];
+- p3 = in[i + in_stride * (py + 1)];
+- p4 = in[i + 1 + in_stride * (py + 1)];
+- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(1 - dy), _mm_set_ps(p4, p3, p2, p1)));
+- }
+-
+- // 2x2 blocks in the middle of sampling region
+- for(int j = py + 2; j <= maxj; j += 2)
+- for(int i = px + 2; i <= maxi; i += 2)
+- {
+- p1 = in[i + in_stride * j];
+- p2 = in[i + 1 + in_stride * j];
+- p3 = in[i + in_stride * (j + 1)];
+- p4 = in[i + 1 + in_stride * (j + 1)];
+- col = _mm_add_ps(col, _mm_set_ps(p4, p3, p2, p1));
+- }
+-
+- if(maxi == px + 2 * samples && maxj == py + 2 * samples)
+- {
+- // right border
+- for(int j = py + 2; j <= maxj; j += 2)
+- {
+- p1 = in[maxi + 2 + in_stride * j];
+- p2 = in[maxi + 3 + in_stride * j];
+- p3 = in[maxi + 2 + in_stride * (j + 1)];
+- p4 = in[maxi + 3 + in_stride * (j + 1)];
+- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx), _mm_set_ps(p4, p3, p2, p1)));
+- }
+-
+- // upper right
+- p1 = in[maxi + 2 + in_stride * py];
+- p2 = in[maxi + 3 + in_stride * py];
+- p3 = in[maxi + 2 + in_stride * (py + 1)];
+- p4 = in[maxi + 3 + in_stride * (py + 1)];
+- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx * (1 - dy)), _mm_set_ps(p4, p3, p2, p1)));
+-
+- // lower border
+- for(int i = px + 2; i <= maxi; i += 2)
+- {
+- p1 = in[i + in_stride * (maxj + 2)];
+- p2 = in[i + 1 + in_stride * (maxj + 2)];
+- p3 = in[i + in_stride * (maxj + 3)];
+- p4 = in[i + 1 + in_stride * (maxj + 3)];
+- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dy), _mm_set_ps(p4, p3, p2, p1)));
+- }
+-
+- // lower left 2x2 block
+- p1 = in[px + in_stride * (maxj + 2)];
+- p2 = in[px + 1 + in_stride * (maxj + 2)];
+- p3 = in[px + in_stride * (maxj + 3)];
+- p4 = in[px + 1 + in_stride * (maxj + 3)];
+- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps((1 - dx) * dy), _mm_set_ps(p4, p3, p2, p1)));
+-
+- // lower right 2x2 block
+- p1 = in[maxi + 2 + in_stride * (maxj + 2)];
+- p2 = in[maxi + 3 + in_stride * (maxj + 2)];
+- p3 = in[maxi + 2 + in_stride * (maxj + 3)];
+- p4 = in[maxi + 3 + in_stride * (maxj + 3)];
+- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx * dy), _mm_set_ps(p4, p3, p2, p1)));
+-
+- num = (samples + 1) * (samples + 1);
+- }
+- else if(maxi == px + 2 * samples)
+- {
+- // right border
+- for(int j = py + 2; j <= maxj; j += 2)
+- {
+- p1 = in[maxi + 2 + in_stride * j];
+- p2 = in[maxi + 3 + in_stride * j];
+- p3 = in[maxi + 2 + in_stride * (j + 1)];
+- p4 = in[maxi + 3 + in_stride * (j + 1)];
+- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx), _mm_set_ps(p4, p3, p2, p1)));
+- }
+-
+- // upper right
+- p1 = in[maxi + 2 + in_stride * py];
+- p2 = in[maxi + 3 + in_stride * py];
+- p3 = in[maxi + 2 + in_stride * (py + 1)];
+- p4 = in[maxi + 3 + in_stride * (py + 1)];
+- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx * (1 - dy)), _mm_set_ps(p4, p3, p2, p1)));
+-
+- num = ((maxj - py) / 2 + 1 - dy) * (samples + 1);
+- }
+- else if(maxj == py + 2 * samples)
+- {
+- // lower border
+- for(int i = px + 2; i <= maxi; i += 2)
+- {
+- p1 = in[i + in_stride * (maxj + 2)];
+- p2 = in[i + 1 + in_stride * (maxj + 2)];
+- p3 = in[i + in_stride * (maxj + 3)];
+- p4 = in[i + 1 + in_stride * (maxj + 3)];
+- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dy), _mm_set_ps(p4, p3, p2, p1)));
+- }
+-
+- // lower left 2x2 block
+- p1 = in[px + in_stride * (maxj + 2)];
+- p2 = in[px + 1 + in_stride * (maxj + 2)];
+- p3 = in[px + in_stride * (maxj + 3)];
+- p4 = in[px + 1 + in_stride * (maxj + 3)];
+- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps((1 - dx) * dy), _mm_set_ps(p4, p3, p2, p1)));
+-
+- num = ((maxi - px) / 2 + 1 - dx) * (samples + 1);
+- }
+- else
+- {
+- num = ((maxi - px) / 2 + 1 - dx) * ((maxj - py) / 2 + 1 - dy);
+- }
+-
+- num = 1.0f / num;
+- col = _mm_mul_ps(col, _mm_set1_ps(num));
+-
+- float fcol[4] __attribute__((aligned(64)));
+- _mm_store_ps(fcol, col);
+-
+- const int c = (2 * ((y + rggby) % 2) + ((x + rggbx) % 2));
+- *outc = (uint16_t)(fcol[c]);
+- outc++;
++ if(num) *outc = col / num;
+ }
+ }
+- _mm_sfence();
+-}
+-#endif
+-
+-void dt_iop_clip_and_zoom_mosaic_half_size(uint16_t *const out, const uint16_t *const in,
+- const dt_iop_roi_t *const roi_out, const dt_iop_roi_t *const roi_in,
+- const int32_t out_stride, const int32_t in_stride,
+- const uint32_t filters)
+-{
+- if(1)//(darktable.codepath.OPENMP_SIMD)
+- return dt_iop_clip_and_zoom_mosaic_half_size_plain(out, in, roi_out, roi_in, out_stride, in_stride, filters);
+-#if defined(__SSE__)
+- else if(darktable.codepath.SSE2)
+- return dt_iop_clip_and_zoom_mosaic_half_size_sse2(out, in, roi_out, roi_in, out_stride, in_stride, filters);
+-#endif
+- else
+- dt_unreachable_codepath();
+ }
+
+-void dt_iop_clip_and_zoom_mosaic_half_size_f_plain(float *const out, const float *const in,
++void dt_iop_clip_and_zoom_mosaic_half_size_f(float *const out, const float *const in,
+ const dt_iop_roi_t *const roi_out,
+ const dt_iop_roi_t *const roi_in, const int32_t out_stride,
+ const int32_t in_stride, const uint32_t filters)
+@@ -643,223 +425,10 @@ void dt_iop_clip_and_zoom_mosaic_half_size_f_plain(float *const out, const float
+ }
+
+ const int c = (2 * ((y + rggby) % 2) + ((x + rggbx) % 2));
+- *outc = col[c] / num;
+- outc++;
+- }
+- }
+-}
+-
+-#if defined(__SSE__)
+-void dt_iop_clip_and_zoom_mosaic_half_size_f_sse2(float *const out, const float *const in,
+- const dt_iop_roi_t *const roi_out,
+- const dt_iop_roi_t *const roi_in, const int32_t out_stride,
+- const int32_t in_stride, const uint32_t filters)
+-{
+- // adjust to pixel region and don't sample more than scale/2 nbs!
+- // pixel footprint on input buffer, radius:
+- const float px_footprint = 1.f / roi_out->scale;
+- // how many 2x2 blocks can be sampled inside that area
+- const int samples = round(px_footprint / 2);
+-
+- // move p to point to an rggb block:
+- int trggbx = 0, trggby = 0;
+- if(FC(trggby, trggbx + 1, filters) != 1) trggbx++;
+- if(FC(trggby, trggbx, filters) != 0)
+- {
+- trggbx = (trggbx + 1) & 1;
+- trggby++;
+- }
+- const int rggbx = trggbx, rggby = trggby;
+-
+-#ifdef _OPENMP
+-#pragma omp parallel for default(none) \
+- dt_omp_firstprivate(in, in_stride, out, out_stride, px_footprint, rggbx, \
+- rggby, roi_in, roi_out, samples) \
+- schedule(static)
+-#endif
+- for(int y = 0; y < roi_out->height; y++)
+- {
+- float *outc = out + out_stride * y;
+-
+- const float fy = (y + roi_out->y) * px_footprint;
+- int py = (int)fy & ~1;
+- const float dy = (fy - py) / 2;
+- py = MIN(((roi_in->height - 6) & ~1u), py) + rggby;
+-
+- const int maxj = MIN(((roi_in->height - 5) & ~1u) + rggby, py + 2 * samples);
+-
+- for(int x = 0; x < roi_out->width; x++)
+- {
+- __m128 col = _mm_setzero_ps();
+-
+- const float fx = (x + roi_out->x) * px_footprint;
+- int px = (int)fx & ~1;
+- const float dx = (fx - px) / 2;
+- px = MIN(((roi_in->width - 6) & ~1u), px) + rggbx;
+-
+- const int maxi = MIN(((roi_in->width - 5) & ~1u) + rggbx, px + 2 * samples);
+-
+- float p1, p2, p3, p4;
+- float num = 0;
+-
+- // upper left 2x2 block of sampling region
+- p1 = in[px + in_stride * py];
+- p2 = in[px + 1 + in_stride * py];
+- p3 = in[px + in_stride * (py + 1)];
+- p4 = in[px + 1 + in_stride * (py + 1)];
+- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps((1 - dx) * (1 - dy)), _mm_set_ps(p4, p3, p2, p1)));
+-
+- // left 2x2 block border of sampling region
+- for(int j = py + 2; j <= maxj; j += 2)
+- {
+- p1 = in[px + in_stride * j];
+- p2 = in[px + 1 + in_stride * j];
+- p3 = in[px + in_stride * (j + 1)];
+- p4 = in[px + 1 + in_stride * (j + 1)];
+- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(1 - dx), _mm_set_ps(p4, p3, p2, p1)));
+- }
+-
+- // upper 2x2 block border of sampling region
+- for(int i = px + 2; i <= maxi; i += 2)
+- {
+- p1 = in[i + in_stride * py];
+- p2 = in[i + 1 + in_stride * py];
+- p3 = in[i + in_stride * (py + 1)];
+- p4 = in[i + 1 + in_stride * (py + 1)];
+- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(1 - dy), _mm_set_ps(p4, p3, p2, p1)));
+- }
+-
+- // 2x2 blocks in the middle of sampling region
+- for(int j = py + 2; j <= maxj; j += 2)
+- for(int i = px + 2; i <= maxi; i += 2)
+- {
+- p1 = in[i + in_stride * j];
+- p2 = in[i + 1 + in_stride * j];
+- p3 = in[i + in_stride * (j + 1)];
+- p4 = in[i + 1 + in_stride * (j + 1)];
+- col = _mm_add_ps(col, _mm_set_ps(p4, p3, p2, p1));
+- }
+-
+- if(maxi == px + 2 * samples && maxj == py + 2 * samples)
+- {
+- // right border
+- for(int j = py + 2; j <= maxj; j += 2)
+- {
+- p1 = in[maxi + 2 + in_stride * j];
+- p2 = in[maxi + 3 + in_stride * j];
+- p3 = in[maxi + 2 + in_stride * (j + 1)];
+- p4 = in[maxi + 3 + in_stride * (j + 1)];
+- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx), _mm_set_ps(p4, p3, p2, p1)));
+- }
+-
+- // upper right
+- p1 = in[maxi + 2 + in_stride * py];
+- p2 = in[maxi + 3 + in_stride * py];
+- p3 = in[maxi + 2 + in_stride * (py + 1)];
+- p4 = in[maxi + 3 + in_stride * (py + 1)];
+- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx * (1 - dy)), _mm_set_ps(p4, p3, p2, p1)));
+-
+- // lower border
+- for(int i = px + 2; i <= maxi; i += 2)
+- {
+- p1 = in[i + in_stride * (maxj + 2)];
+- p2 = in[i + 1 + in_stride * (maxj + 2)];
+- p3 = in[i + in_stride * (maxj + 3)];
+- p4 = in[i + 1 + in_stride * (maxj + 3)];
+- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dy), _mm_set_ps(p4, p3, p2, p1)));
+- }
+-
+- // lower left 2x2 block
+- p1 = in[px + in_stride * (maxj + 2)];
+- p2 = in[px + 1 + in_stride * (maxj + 2)];
+- p3 = in[px + in_stride * (maxj + 3)];
+- p4 = in[px + 1 + in_stride * (maxj + 3)];
+- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps((1 - dx) * dy), _mm_set_ps(p4, p3, p2, p1)));
+-
+- // lower right 2x2 block
+- p1 = in[maxi + 2 + in_stride * (maxj + 2)];
+- p2 = in[maxi + 3 + in_stride * (maxj + 2)];
+- p3 = in[maxi + 2 + in_stride * (maxj + 3)];
+- p4 = in[maxi + 3 + in_stride * (maxj + 3)];
+- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx * dy), _mm_set_ps(p4, p3, p2, p1)));
+-
+- num = (samples + 1) * (samples + 1);
+- }
+- else if(maxi == px + 2 * samples)
+- {
+- // right border
+- for(int j = py + 2; j <= maxj; j += 2)
+- {
+- p1 = in[maxi + 2 + in_stride * j];
+- p2 = in[maxi + 3 + in_stride * j];
+- p3 = in[maxi + 2 + in_stride * (j + 1)];
+- p4 = in[maxi + 3 + in_stride * (j + 1)];
+- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx), _mm_set_ps(p4, p3, p2, p1)));
+- }
+-
+- // upper right
+- p1 = in[maxi + 2 + in_stride * py];
+- p2 = in[maxi + 3 + in_stride * py];
+- p3 = in[maxi + 2 + in_stride * (py + 1)];
+- p4 = in[maxi + 3 + in_stride * (py + 1)];
+- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx * (1 - dy)), _mm_set_ps(p4, p3, p2, p1)));
+-
+- num = ((maxj - py) / 2 + 1 - dy) * (samples + 1);
+- }
+- else if(maxj == py + 2 * samples)
+- {
+- // lower border
+- for(int i = px + 2; i <= maxi; i += 2)
+- {
+- p1 = in[i + in_stride * (maxj + 2)];
+- p2 = in[i + 1 + in_stride * (maxj + 2)];
+- p3 = in[i + in_stride * (maxj + 3)];
+- p4 = in[i + 1 + in_stride * (maxj + 3)];
+- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dy), _mm_set_ps(p4, p3, p2, p1)));
+- }
+-
+- // lower left 2x2 block
+- p1 = in[px + in_stride * (maxj + 2)];
+- p2 = in[px + 1 + in_stride * (maxj + 2)];
+- p3 = in[px + in_stride * (maxj + 3)];
+- p4 = in[px + 1 + in_stride * (maxj + 3)];
+- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps((1 - dx) * dy), _mm_set_ps(p4, p3, p2, p1)));
+-
+- num = ((maxi - px) / 2 + 1 - dx) * (samples + 1);
+- }
+- else
+- {
+- num = ((maxi - px) / 2 + 1 - dx) * ((maxj - py) / 2 + 1 - dy);
+- }
+-
+- num = 1.0f / num;
+- col = _mm_mul_ps(col, _mm_set1_ps(num));
+-
+- float fcol[4] __attribute__((aligned(64)));
+- _mm_store_ps(fcol, col);
+-
+- const int c = (2 * ((y + rggby) % 2) + ((x + rggbx) % 2));
+- *outc = fcol[c];
++ if(num) *outc = col[c] / num;
+ outc++;
+ }
+ }
+- _mm_sfence();
+-}
+-#endif
+-
+-void dt_iop_clip_and_zoom_mosaic_half_size_f(float *const out, const float *const in,
+- const dt_iop_roi_t *const roi_out, const dt_iop_roi_t *const roi_in,
+- const int32_t out_stride, const int32_t in_stride,
+- const uint32_t filters)
+-{
+- if(darktable.codepath.OPENMP_SIMD)
+- return dt_iop_clip_and_zoom_mosaic_half_size_f_plain(out, in, roi_out, roi_in, out_stride, in_stride, filters);
+-#if defined(__SSE__)
+- else if(darktable.codepath.SSE2)
+- return dt_iop_clip_and_zoom_mosaic_half_size_f_sse2(out, in, roi_out, roi_in, out_stride, in_stride, filters);
+-#endif
+- else
+- dt_unreachable_codepath();
+ }
+
+ /**
+@@ -951,7 +520,7 @@ void dt_iop_clip_and_zoom_mosaic_third_size_xtrans_f(float *const out, const flo
+ }
+ }
+
+-void dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f_plain(float *out, const float *const in,
++void dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f(float *out, const float *const in,
+ const dt_iop_roi_t *const roi_out,
+ const dt_iop_roi_t *const roi_in,
+ const int32_t out_stride,
+@@ -1085,7 +654,7 @@ void dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f_plain(float *out, co
+ num = ((maxi - px) / 2 + 1 - dx) * ((maxj - py) / 2 + 1 - dy);
+ }
+
+- const float pix = col / num;
++ const float pix = (num) ? col / num : 0.0f;
+ outc[0] = pix;
+ outc[1] = pix;
+ outc[2] = pix;
+@@ -1095,256 +664,7 @@ void dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f_plain(float *out, co
+ }
+ }
+
+-#if defined(__SSE__)
+-void dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f_sse2(float *out, const float *const in,
+- const dt_iop_roi_t *const roi_out,
+- const dt_iop_roi_t *const roi_in,
+- const int32_t out_stride,
+- const int32_t in_stride)
+-{
+- // adjust to pixel region and don't sample more than scale/2 nbs!
+- // pixel footprint on input buffer, radius:
+- const float px_footprint = 1.f / roi_out->scale;
+- // how many pixels can be sampled inside that area
+- const int samples = round(px_footprint);
+-
+-#ifdef _OPENMP
+-#pragma omp parallel for default(none) \
+- dt_omp_firstprivate(in, in_stride, out_stride, px_footprint, roi_in, roi_out, samples) \
+- shared(out) \
+- schedule(static)
+-#endif
+- for(int y = 0; y < roi_out->height; y++)
+- {
+- float *outc = out + 4 * (out_stride * y);
+-
+- const float fy = (y + roi_out->y) * px_footprint;
+- int py = (int)fy;
+- const float dy = fy - py;
+- py = MIN(((roi_in->height - 3)), py);
+-
+- const int maxj = MIN(((roi_in->height - 2)), py + samples);
+-
+- for(int x = 0; x < roi_out->width; x++)
+- {
+- __m128 col = _mm_setzero_ps();
+-
+- const float fx = (x + roi_out->x) * px_footprint;
+- int px = (int)fx;
+- const float dx = fx - px;
+- px = MIN(((roi_in->width - 3)), px);
+-
+- const int maxi = MIN(((roi_in->width - 2)), px + samples);
+-
+- float p;
+- float num = 0;
+-
+- // upper left pixel of sampling region
+- p = in[px + in_stride * py];
+- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps((1 - dx) * (1 - dy)), _mm_set_ps(0.0f, p, p, p)));
+-
+- // left pixel border of sampling region
+- for(int j = py + 1; j <= maxj; j++)
+- {
+- p = in[px + in_stride * j];
+- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(1 - dx), _mm_set_ps(0.0f, p, p, p)));
+- }
+-
+- // upper pixel border of sampling region
+- for(int i = px + 1; i <= maxi; i++)
+- {
+- p = in[i + in_stride * py];
+- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(1 - dy), _mm_set_ps(0.0f, p, p, p)));
+- }
+-
+- // pixels in the middle of sampling region
+- for(int j = py + 1; j <= maxj; j++)
+- for(int i = px + 1; i <= maxi; i++)
+- {
+- p = in[i + in_stride * j];
+- col = _mm_add_ps(col, _mm_set_ps(0.0f, p, p, p));
+- }
+-
+- if(maxi == px + samples && maxj == py + samples)
+- {
+- // right border
+- for(int j = py + 1; j <= maxj; j++)
+- {
+- p = in[maxi + 1 + in_stride * j];
+- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx), _mm_set_ps(0.0f, p, p, p)));
+- }
+-
+- // upper right
+- p = in[maxi + 1 + in_stride * py];
+- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx * (1 - dy)), _mm_set_ps(0.0f, p, p, p)));
+-
+- // lower border
+- for(int i = px + 1; i <= maxi; i++)
+- {
+- p = in[i + in_stride * (maxj + 1)];
+- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dy), _mm_set_ps(0.0f, p, p, p)));
+- }
+-
+- // lower left pixel
+- p = in[px + in_stride * (maxj + 1)];
+- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps((1 - dx) * dy), _mm_set_ps(0.0f, p, p, p)));
+-
+- // lower right pixel
+- p = in[maxi + 1 + in_stride * (maxj + 1)];
+- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx * dy), _mm_set_ps(0.0f, p, p, p)));
+-
+- num = (samples + 1) * (samples + 1);
+- }
+- else if(maxi == px + samples)
+- {
+- // right border
+- for(int j = py + 1; j <= maxj; j++)
+- {
+- p = in[maxi + 1 + in_stride * j];
+- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx), _mm_set_ps(0.0f, p, p, p)));
+- }
+-
+- // upper right
+- p = in[maxi + 1 + in_stride * py];
+- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx * (1 - dy)), _mm_set_ps(0.0f, p, p, p)));
+-
+- num = ((maxj - py) / 2 + 1 - dy) * (samples + 1);
+- }
+- else if(maxj == py + samples)
+- {
+- // lower border
+- for(int i = px + 1; i <= maxi; i++)
+- {
+- p = in[i + in_stride * (maxj + 1)];
+- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dy), _mm_set_ps(0.0f, p, p, p)));
+- }
+-
+- // lower left pixel
+- p = in[px + in_stride * (maxj + 1)];
+- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps((1 - dx) * dy), _mm_set_ps(0.0f, p, p, p)));
+-
+- num = ((maxi - px) / 2 + 1 - dx) * (samples + 1);
+- }
+- else
+- {
+- num = ((maxi - px) / 2 + 1 - dx) * ((maxj - py) / 2 + 1 - dy);
+- }
+-
+- num = 1.0f / num;
+- col = _mm_mul_ps(col, _mm_set_ps(0.0f, num, num, num));
+- _mm_stream_ps(outc, col);
+- outc += 4;
+- }
+- }
+- _mm_sfence();
+-}
+-#endif
+-
+-void dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f(float *out, const float *const in,
+- const dt_iop_roi_t *const roi_out,
+- const dt_iop_roi_t *const roi_in,
+- const int32_t out_stride, const int32_t in_stride)
+-{
+- if(darktable.codepath.OPENMP_SIMD)
+- return dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f_plain(out, in, roi_out, roi_in, out_stride,
+- in_stride);
+-#if defined(__SSE__)
+- else if(darktable.codepath.SSE2)
+- return dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f_sse2(out, in, roi_out, roi_in, out_stride,
+- in_stride);
+-#endif
+- else
+- dt_unreachable_codepath();
+-}
+-
+-#if 0 // gets rid of pink artifacts, but doesn't do sub-pixel sampling, so shows some staircasing artifacts.
+-void
+-dt_iop_clip_and_zoom_demosaic_half_size_f(
+- float *out,
+- const float *const in,
+- const dt_iop_roi_t *const roi_out,
+- const dt_iop_roi_t *const roi_in,
+- const int32_t out_stride,
+- const int32_t in_stride,
+- const uint32_t filters,
+- const float clip)
+-{
+- // adjust to pixel region and don't sample more than scale/2 nbs!
+- // pixel footprint on input buffer, radius:
+- const float px_footprint = 1.f/roi_out->scale;
+- // how many 2x2 blocks can be sampled inside that area
+- const int samples = round(px_footprint/2);
+-
+- // move p to point to an rggb block:
+- int trggbx = 0, trggby = 0;
+- if(FC(trggby, trggbx+1, filters) != 1) trggbx ++;
+- if(FC(trggby, trggbx, filters) != 0)
+- {
+- trggbx = (trggbx + 1)&1;
+- trggby ++;
+- }
+- const int rggbx = trggbx, rggby = trggby;
+-
+-#ifdef _OPENMP
+-#pragma omp parallel for default(none) shared(out) schedule(static)
+-#endif
+- for(int y=0; y<roi_out->height; y++)
+- {
+- float *outc = out + 4*(out_stride*y);
+-
+- const float fy = (y + roi_out->y)*px_footprint;
+- int py = (int)fy & ~1;
+- py = MIN(((roi_in->height-4) & ~1u), py) + rggby;
+-
+- int maxj = MIN(((roi_in->height-3)&~1u)+rggby, py+2*samples);
+-
+- const float fx = roi_out->x*px_footprint;
+-
+- for(int x=0; x<roi_out->width; x++)
+- {
+- __m128 col = _mm_setzero_ps();
+-
+- fx += px_footprint;
+- int px = (int)fx & ~1;
+- px = MIN(((roi_in->width -4) & ~1u), px) + rggbx;
+-
+- const int maxi = MIN(((roi_in->width -3)&~1u)+rggbx, px+2*samples);
+-
+- int num = 0;
+-
+- const int idx = px + in_stride*py;
+- const float pc = MAX(MAX(in[idx], in[idx+1]), MAX(in[idx + in_stride], in[idx+1 + in_stride]));
+-
+- // 2x2 blocks in the middle of sampling region
+- __m128 sum = _mm_setzero_ps();
+-
+- for(int j=py; j<=maxj; j+=2)
+- for(int i=px; i<=maxi; i+=2)
+- {
+- const float p1 = in[i + in_stride*j];
+- const float p2 = in[i+1 + in_stride*j];
+- const float p3 = in[i + in_stride*(j + 1)];
+- const float p4 = in[i+1 + in_stride*(j + 1)];
+-
+- if (!((pc >= clip) ^ (MAX(MAX(p1,p2),MAX(p3,p4)) >= clip)))
+- {
+- sum = _mm_add_ps(sum, _mm_set_ps(0,p4,p3+p2,p1));
+- num++;
+- }
+- }
+-
+- col = _mm_mul_ps(sum, _mm_div_ps(_mm_set_ps(0.0f,1.0f,0.5f,1.0f),_mm_set1_ps(num)));
+- _mm_stream_ps(outc, col);
+- outc += 4;
+- }
+- }
+- _mm_sfence();
+-}
+-
+-#else
+-// very fast and smooth, but doesn't handle highlights:
+-
+-void dt_iop_clip_and_zoom_demosaic_half_size_f_plain(float *out, const float *const in,
++void dt_iop_clip_and_zoom_demosaic_half_size_f(float *out, const float *const in,
+ const dt_iop_roi_t *const roi_out,
+ const dt_iop_roi_t *const roi_in, const int32_t out_stride,
+ const int32_t in_stride, const uint32_t filters)
+@@ -1522,202 +842,6 @@ void dt_iop_clip_and_zoom_demosaic_half_size_f_plain(float *out, const float *co
+ }
+ }
+
+-#if defined(__SSE__)
+-void dt_iop_clip_and_zoom_demosaic_half_size_f_sse2(float *out, const float *const in,
+- const dt_iop_roi_t *const roi_out,
+- const dt_iop_roi_t *const roi_in, const int32_t out_stride,
+- const int32_t in_stride, const uint32_t filters)
+-{
+- // adjust to pixel region and don't sample more than scale/2 nbs!
+- // pixel footprint on input buffer, radius:
+- const float px_footprint = 1.f / roi_out->scale;
+- // how many 2x2 blocks can be sampled inside that area
+- const int samples = round(px_footprint / 2);
+-
+- // move p to point to an rggb block:
+- int trggbx = 0, trggby = 0;
+- if(FC(trggby, trggbx + 1, filters) != 1) trggbx++;
+- if(FC(trggby, trggbx, filters) != 0)
+- {
+- trggbx = (trggbx + 1) & 1;
+- trggby++;
+- }
+- const int rggbx = trggbx, rggby = trggby;
+-
+-#ifdef _OPENMP
+-#pragma omp parallel for default(none) \
+- dt_omp_firstprivate(in, in_stride, px_footprint, rggbx, rggby, out_stride, roi_in, roi_out, samples) \
+- shared(out) \
+- schedule(static)
+-#endif
+- for(int y = 0; y < roi_out->height; y++)
+- {
+- float *outc = out + 4 * (out_stride * y);
+-
+- const float fy = (y + roi_out->y) * px_footprint;
+- int py = (int)fy & ~1;
+- const float dy = (fy - py) / 2;
+- py = MIN(((roi_in->height - 6) & ~1u), py) + rggby;
+-
+- const int maxj = MIN(((roi_in->height - 5) & ~1u) + rggby, py + 2 * samples);
+-
+- for(int x = 0; x < roi_out->width; x++)
+- {
+- __m128 col = _mm_setzero_ps();
+-
+- const float fx = (x + roi_out->x) * px_footprint;
+- int px = (int)fx & ~1;
+- const float dx = (fx - px) / 2;
+- px = MIN(((roi_in->width - 6) & ~1u), px) + rggbx;
+-
+- const int maxi = MIN(((roi_in->width - 5) & ~1u) + rggbx, px + 2 * samples);
+-
+- float p1, p2, p4;
+- float num = 0;
+-
+- // upper left 2x2 block of sampling region
+- p1 = in[px + in_stride * py];
+- p2 = in[px + 1 + in_stride * py] + in[px + in_stride * (py + 1)];
+- p4 = in[px + 1 + in_stride * (py + 1)];
+- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps((1 - dx) * (1 - dy)), _mm_set_ps(0.0f, p4, p2, p1)));
+-
+- // left 2x2 block border of sampling region
+- for(int j = py + 2; j <= maxj; j += 2)
+- {
+- p1 = in[px + in_stride * j];
+- p2 = in[px + 1 + in_stride * j] + in[px + in_stride * (j + 1)];
+- p4 = in[px + 1 + in_stride * (j + 1)];
+- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(1 - dx), _mm_set_ps(0.0f, p4, p2, p1)));
+- }
+-
+- // upper 2x2 block border of sampling region
+- for(int i = px + 2; i <= maxi; i += 2)
+- {
+- p1 = in[i + in_stride * py];
+- p2 = in[i + 1 + in_stride * py] + in[i + in_stride * (py + 1)];
+- p4 = in[i + 1 + in_stride * (py + 1)];
+- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(1 - dy), _mm_set_ps(0.0f, p4, p2, p1)));
+- }
+-
+- // 2x2 blocks in the middle of sampling region
+- for(int j = py + 2; j <= maxj; j += 2)
+- for(int i = px + 2; i <= maxi; i += 2)
+- {
+- p1 = in[i + in_stride * j];
+- p2 = in[i + 1 + in_stride * j] + in[i + in_stride * (j + 1)];
+- p4 = in[i + 1 + in_stride * (j + 1)];
+- col = _mm_add_ps(col, _mm_set_ps(0.0f, p4, p2, p1));
+- }
+-
+- if(maxi == px + 2 * samples && maxj == py + 2 * samples)
+- {
+- // right border
+- for(int j = py + 2; j <= maxj; j += 2)
+- {
+- p1 = in[maxi + 2 + in_stride * j];
+- p2 = in[maxi + 3 + in_stride * j] + in[maxi + 2 + in_stride * (j + 1)];
+- p4 = in[maxi + 3 + in_stride * (j + 1)];
+- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx), _mm_set_ps(0.0f, p4, p2, p1)));
+- }
+-
+- // upper right
+- p1 = in[maxi + 2 + in_stride * py];
+- p2 = in[maxi + 3 + in_stride * py] + in[maxi + 2 + in_stride * (py + 1)];
+- p4 = in[maxi + 3 + in_stride * (py + 1)];
+- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx * (1 - dy)), _mm_set_ps(0.0f, p4, p2, p1)));
+-
+- // lower border
+- for(int i = px + 2; i <= maxi; i += 2)
+- {
+- p1 = in[i + in_stride * (maxj + 2)];
+- p2 = in[i + 1 + in_stride * (maxj + 2)] + in[i + in_stride * (maxj + 3)];
+- p4 = in[i + 1 + in_stride * (maxj + 3)];
+- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dy), _mm_set_ps(0.0f, p4, p2, p1)));
+- }
+-
+- // lower left 2x2 block
+- p1 = in[px + in_stride * (maxj + 2)];
+- p2 = in[px + 1 + in_stride * (maxj + 2)] + in[px + in_stride * (maxj + 3)];
+- p4 = in[px + 1 + in_stride * (maxj + 3)];
+- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps((1 - dx) * dy), _mm_set_ps(0.0f, p4, p2, p1)));
+-
+- // lower right 2x2 block
+- p1 = in[maxi + 2 + in_stride * (maxj + 2)];
+- p2 = in[maxi + 3 + in_stride * (maxj + 2)] + in[maxi + 2 + in_stride * (maxj + 3)];
+- p4 = in[maxi + 3 + in_stride * (maxj + 3)];
+- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx * dy), _mm_set_ps(0.0f, p4, p2, p1)));
+-
+- num = (samples + 1) * (samples + 1);
+- }
+- else if(maxi == px + 2 * samples)
+- {
+- // right border
+- for(int j = py + 2; j <= maxj; j += 2)
+- {
+- p1 = in[maxi + 2 + in_stride * j];
+- p2 = in[maxi + 3 + in_stride * j] + in[maxi + 2 + in_stride * (j + 1)];
+- p4 = in[maxi + 3 + in_stride * (j + 1)];
+- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx), _mm_set_ps(0.0f, p4, p2, p1)));
+- }
+-
+- // upper right
+- p1 = in[maxi + 2 + in_stride * py];
+- p2 = in[maxi + 3 + in_stride * py] + in[maxi + 2 + in_stride * (py + 1)];
+- p4 = in[maxi + 3 + in_stride * (py + 1)];
+- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx * (1 - dy)), _mm_set_ps(0.0f, p4, p2, p1)));
+-
+- num = ((maxj - py) / 2 + 1 - dy) * (samples + 1);
+- }
+- else if(maxj == py + 2 * samples)
+- {
+- // lower border
+- for(int i = px + 2; i <= maxi; i += 2)
+- {
+- p1 = in[i + in_stride * (maxj + 2)];
+- p2 = in[i + 1 + in_stride * (maxj + 2)] + in[i + in_stride * (maxj + 3)];
+- p4 = in[i + 1 + in_stride * (maxj + 3)];
+- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dy), _mm_set_ps(0.0f, p4, p2, p1)));
+- }
+-
+- // lower left 2x2 block
+- p1 = in[px + in_stride * (maxj + 2)];
+- p2 = in[px + 1 + in_stride * (maxj + 2)] + in[px + in_stride * (maxj + 3)];
+- p4 = in[px + 1 + in_stride * (maxj + 3)];
+- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps((1 - dx) * dy), _mm_set_ps(0.0f, p4, p2, p1)));
+-
+- num = ((maxi - px) / 2 + 1 - dx) * (samples + 1);
+- }
+- else
+- {
+- num = ((maxi - px) / 2 + 1 - dx) * ((maxj - py) / 2 + 1 - dy);
+- }
+-
+- num = 1.0f / num;
+- col = _mm_mul_ps(col, _mm_set_ps(0.0f, num, 0.5f * num, num));
+- _mm_stream_ps(outc, col);
+- outc += 4;
+- }
+- }
+- _mm_sfence();
+-}
+-#endif
+-#endif
+-
+-void dt_iop_clip_and_zoom_demosaic_half_size_f(float *out, const float *const in,
+- const dt_iop_roi_t *const roi_out, const dt_iop_roi_t *const roi_in,
+- const int32_t out_stride, const int32_t in_stride,
+- const uint32_t filters)
+-{
+- if(darktable.codepath.OPENMP_SIMD)
+- return dt_iop_clip_and_zoom_demosaic_half_size_f_plain(out, in, roi_out, roi_in, out_stride, in_stride,
+- filters);
+-#if defined(__SSE__)
+- else if(darktable.codepath.SSE2)
+- return dt_iop_clip_and_zoom_demosaic_half_size_f_sse2(out, in, roi_out, roi_in, out_stride, in_stride, filters);
+-#endif
+- else
+- dt_unreachable_codepath();
+-}
+
+ void dt_iop_clip_and_zoom_demosaic_third_size_xtrans_f(float *out, const float *const in,
+ const dt_iop_roi_t *const roi_out,
diff -Nru darktable-3.4.1/debian/patches/series darktable-3.4.1/debian/patches/series
--- darktable-3.4.1/debian/patches/series 2021-05-20 14:07:16.000000000 -0300
+++ darktable-3.4.1/debian/patches/series 2021-06-05 12:41:39.000000000 -0300
@@ -1 +1,2 @@
0001-add-explicit-dependency-on-generate_conf.patch
+0002-Avoid-div-by-zero-in-dt_iop_clip_and_zoom_mosaic_hal.patch
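The removed SSE half-size demosaic code above starts by moving the sampling origin onto the red pixel of an RGGB 2x2 block (`trggbx`/`trggby`). The sketch below isolates that step so it can be checked on its own. It assumes the dcraw-style `FC()` colour lookup used for darktable's `filters` (0 = red, 1 = green, 2 = blue) and the usual dcraw `filters` values for the four Bayer layouts. `rggb_offset` is a hypothetical helper name, not darktable code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed dcraw-style lookup for the `filters` bit pattern:
   returns 0 (red), 1 (green) or 2 (blue) for the pixel at (row, col). */
static inline int FC(const size_t row, const size_t col, const uint32_t filters)
{
  return filters >> (((row << 1 & 14) + (col & 1)) << 1) & 3;
}

/* Hypothetical helper mirroring the trggbx/trggby logic in the diff:
   find the offset of a red pixel that starts an R G / G B block. */
static void rggb_offset(const uint32_t filters, int *rggbx, int *rggby)
{
  int x = 0, y = 0;
  if(FC(y, x + 1, filters) != 1) x++; /* make the pixel right of the origin green */
  if(FC(y, x, filters) != 0)          /* origin is blue: step diagonally onto red */
  {
    x = (x + 1) & 1;
    y++;
  }
  assert(FC(y, x, filters) == 0);     /* the block now starts on red */
  *rggbx = x;
  *rggby = y;
}
```

Under these assumptions, RGGB (`0x94949494`), BGGR (`0x16161616`), GRBG (`0x61616161`) and GBRG (`0x49494949`) yield offsets (0,0), (1,1), (1,0) and (0,1) respectively, i.e. the first red pixel of each layout.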
commit f007e678d47f5662326824725cae2ab9e2455e66
Author: Hanno Schwalm <hanno@schwalm-bremen.de>
Date: Fri May 14 18:20:37 2021 +0200
Avoid div by zero in dt_iop_clip_and_zoom_mosaic_half_size (#8954)
* Avoid div by zero in dt_iop_clip_and_zoom_mosaic_half_size_plain
Fixes #8951
Although the file given in the issue is crippled we can avoid the crash.
In `dt_iop_clip_and_zoom_mosaic_half_size` and the sse friend there is possibly a div/0
problem that should be checked.
* Fixing same div by zero in dt_iop_clip_and_zoom_mosaic_half_size_f
* Remove sse code for dt_iop_clip_and_zoom_mosaic... after testing performance
checked performance non-sse vs sse specific code
- with added local timers
- using gcc 10.2
- testing -t 1/4/8/16
- intel (xeon like 9900) with fixed clock rate
in
- dt_iop_clip_and_zoom_mosaic_half_size
- dt_iop_clip_and_zoom_mosaic_half_size_f
- dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f
- dt_iop_clip_and_zoom_demosaic_half_size_f
with consistent results. For all functions the sse specific code was somewhat slower (~20%)
than the vectorized compiler code. Number of omp cores didn't matter, just made the results
more measurable because of low execution times.
So I removed all the sse specific code for less code burden and better performance.
* Fix sse header plus div/0
At least for bayer images we absolutely want to be sure there is no div by zero as there might
be buggy dng files.
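The div/0 concern is easiest to see in the normalization step at the end of the SSE code removed in the debdiff, which computes `1.0f / num` and multiplies it into all four lanes without checking `num`. Below is a plain-C sketch of that step with a guard in the spirit of the `(num) ? col / num : 0.0f` change this commit applies to the monochrome path. `normalize_rggb` is a hypothetical name, and returning zeros for `num == 0` is an assumption carried over from the monochrome fix, not necessarily what the patched Bayer code does. Writing it as a plain loop rather than intrinsics is the style the commit message measured as faster than the hand-written SSE versions once the compiler vectorizes it.

```c
#include <assert.h>

/* Hypothetical plain-C version of the removed SSE normalization:
 *   num = 1.0f / num;
 *   col = _mm_mul_ps(col, _mm_set_ps(0.0f, num, 0.5f * num, num));
 * _mm_set_ps lists lanes from high to low, so lane 0 is red, lane 1 holds
 * the sum of the two greens (hence 0.5f), lane 2 is blue and lane 3 is zeroed.
 * Unlike the SSE code, a zero weight sum yields 0 (for finite inputs)
 * instead of inf or NaN. */
static void normalize_rggb(float col[4], const float num)
{
  assert(num >= 0.0f); /* num is a sum of non-negative sampling weights */
  const float inv = (num > 0.0f) ? 1.0f / num : 0.0f;
  const float scale[4] = { inv, 0.5f * inv, inv, 0.0f };
  for(int c = 0; c < 4; c++) col[c] *= scale[c];
}
```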
diff --git a/src/develop/imageop_math.c b/src/develop/imageop_math.c
index ef559652d..0066a83c9 100644
--- a/src/develop/imageop_math.c
+++ b/src/develop/imageop_math.c
@@ -18,14 +18,8 @@
#include "develop/imageop_math.h"
#include <assert.h> // for assert
-#ifdef __SSE__...
-#endif
#include <glib.h> // for MIN, MAX, CLAMP, inline
#include <math.h> // for round, floorf, fmaxf
-#ifdef __SSE__...
-#endif
#include "common/darktable.h" // for darktable, darktable_t, dt_code...
#include "common/imageio.h" // for FILTERS_ARE_4BAYER
#include "common/interpolation.h" // for dt_interpolation_new, dt_interp...
@@ -177,7 +171,7 @@ int dt_iop_clip_and_zoom_roi_cl(int devid, cl_mem dev_out, cl_mem dev_in, const
#endif
-void dt_iop_clip_and_zoom_mosaic_half_size_plain(uint16_t *const out, const uint16_t *const in,
+void dt_iop_clip_and_zoom_mosaic_half_size(uint16_t *const out, const uint16_t *const in,
const dt_iop_roi_t *const roi_out,
const dt_iop_roi_t *const roi_in, const int32_t out_stride,
const int32_t in_stride, const uint32_t filters)
@@ -244,224 +238,12 @@ void dt_iop_clip_and_zoom_mosaic_half_size_plain(uint16_t *const out, const uint
num++;
}
}
- *outc = col / num;
- }
- }
-}
-
-#if defined(__SSE__)...
-#endif
-
-void dt_iop_clip_and_zoom_mosaic_half_size(uint16_t *const out, const uint16_t *const in,
- const dt_iop_roi_t *const roi_out, const dt_iop_roi_t *const roi_in,
- const int32_t out_stride, const int32_t in_stride,
- const uint32_t filters)
-{
- if(1)//(darktable.codepath.OPENMP_SIMD)
- return dt_iop_clip_and_zoom_mosaic_half_size_plain(out, in, roi_out, roi_in, out_stride, in_stride, filters);
-#if defined(__SSE__)
- else if(darktable.codepath.SSE2)
- return dt_iop_clip_and_zoom_mosaic_half_size_sse2(out, in, roi_out, roi_in, out_stride, in_stride, filters);
-#endif
- else
- dt_unreachable_codepath();
}
-void dt_iop_clip_and_zoom_mosaic_half_size_f_plain(float *const out, const float *const in,
+void dt_iop_clip_and_zoom_mosaic_half_size_f(float *const out, const float *const in,
const dt_iop_roi_t *const roi_out,
const dt_iop_roi_t *const roi_in, const int32_t out_stride,
const int32_t in_stride, const uint32_t filters)
@@ -643,223 +425,10 @@ void dt_iop_clip_and_zoom_mosaic_half_size_f_plain(float *const out, const float
}
const int c = (2 * ((y + rggby) % 2) + ((x + rggbx) % 2));
- *outc = col[c] / num;
- outc++;
- }
- }
-}
-
-#if defined(__SSE__)...
-#endif
-
-void dt_iop_clip_and_zoom_mosaic_half_size_f(float *const out, const float *const in,
- const dt_iop_roi_t *const roi_out, const dt_iop_roi_t *const roi_in,
- const int32_t out_stride, const int32_t in_stride,
- const uint32_t filters)
-{
- if(darktable.codepath.OPENMP_SIMD)
- return dt_iop_clip_and_zoom_mosaic_half_size_f_plain(out, in, roi_out, roi_in, out_stride, in_stride, filters);
-#if defined(__SSE__)
- else if(darktable.codepath.SSE2)
- return dt_iop_clip_and_zoom_mosaic_half_size_f_sse2(out, in, roi_out, roi_in, out_stride, in_stride, filters);
-#endif
- else
- dt_unreachable_codepath();
}
/**
@@ -951,7 +520,7 @@ void dt_iop_clip_and_zoom_mosaic_third_size_xtrans_f(float *const out, const flo
}
}
-void dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f_plain(float *out, const float *const in,
+void dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f(float *out, const float *const in,
const dt_iop_roi_t *const roi_out,
const dt_iop_roi_t *const roi_in,
const int32_t out_stride,
@@ -1085,7 +654,7 @@ void dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f_plain(float *out, co
num = ((maxi - px) / 2 + 1 - dx) * ((maxj - py) / 2 + 1 - dy);
}
- const float pix = col / num;
+ const float pix = (num) ? col / num : 0.0f;
outc[0] = pix;
outc[1] = pix;
outc[2] = pix;
@@ -1095,256 +664,7 @@ void dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f_plain(float *out, co
}
}
-#if defined(__SSE__)...
-#endif
-
-void dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f(float *out, const float *const in,
- const dt_iop_roi_t *const roi_out,
- const dt_iop_roi_t *const roi_in,
- const int32_t out_stride, const int32_t in_stride)
-{
- if(darktable.codepath.OPENMP_SIMD)
- return dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f_plain(out, in, roi_out, roi_in, out_stride,
- in_stride);
-#if defined(__SSE__)...
-#endif
- else
- dt_unreachable_codepath();
-}
-
-#if 0 // gets rid of pink artifacts, but doesn't do sub-pixel sampling, so shows some staircasing artifacts....
-#else
-// very fast and smooth, but doesn't handle highlights:
-
-void dt_iop_clip_and_zoom_demosaic_half_size_f_plain(float *out, const float *const in,
+void dt_iop_clip_and_zoom_demosaic_half_size_f(float *out, const float *const in,
const dt_iop_roi_t *const roi_out,
const dt_iop_roi_t *const roi_in, const int32_t out_stride,
const int32_t in_stride, const uint32_t filters)
@@ -1522,202 +842,6 @@ void dt_iop_clip_and_zoom_demosaic_half_size_f_plain(float *out, const float *co
}
}
-#if defined(__SSE__)...
-#endif
-#endif
-
-void dt_iop_clip_and_zoom_demosaic_half_size_f(float *out, const float *const in,
- const dt_iop_roi_t *const roi_out, const dt_iop_roi_t *const roi_in,
- const int32_t out_stride, const int32_t in_stride,
- const uint32_t filters)
-{
- if(darktable.codepath.OPENMP_SIMD)
- return dt_iop_clip_and_zoom_demosaic_half_size_f_plain(out, in, roi_out, roi_in, out_stride, in_stride,
- filters);
-#if defined(__SSE__)...
-#endif
- else
- dt_unreachable_codepath();
-}
void dt_iop_clip_and_zoom_demosaic_third_size_xtrans_f(float *out, const float *const in,
const dt_iop_roi_t *const roi_out,
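The `num` values that the removed SSE half-size code divides by come from separable per-axis weights: the first 2x2 block on an axis gets `1 - d`, the interior blocks get 1, and, only when the sampling window is not clipped at the image border, one trailing block gets `d`. The sketch below restates that bookkeeping for a single axis as a reading of the diff; `axis_weight` and `axis_weight_sum` are hypothetical names, not darktable code.

```c
#include <assert.h>

/* Hypothetical restatement of the per-axis weights in the removed SSE code.
 * Block k = 0 is the first 2x2 block (weight 1 - d), blocks 1..inner carry
 * weight 1, and if the window is unclipped, block inner + 1 carries weight d. */
static float axis_weight(const int k, const int inner, const float d, const int unclipped)
{
  if(k == 0) return 1.0f - d;
  if(k <= inner) return 1.0f;
  if(unclipped && k == inner + 1) return d;
  return 0.0f;
}

/* Sum of the weights along one axis; num is the product of the x and y sums. */
static float axis_weight_sum(const int inner, const float d, const int unclipped)
{
  assert(inner >= 0 && d >= 0.0f && d < 1.0f);
  float sum = 0.0f;
  for(int k = 0; k <= inner + 1; k++) sum += axis_weight(k, inner, d, unclipped);
  return sum;
}
```

For an unclipped axis `inner` equals `samples`, so the sum is `samples + 1` and the product matches `num = (samples + 1) * (samples + 1)`; for a clipped axis `inner` is `(maxi - px) / 2`, giving `(maxi - px) / 2 + 1 - dx` as in the partial cases.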