
Bug#989582: unblock: darktable/3.4.1-4



Package: release.debian.org
Severity: normal
User: release.debian.org@packages.debian.org
Usertags: unblock

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Please unblock package darktable

[ Reason ]

This version contains a fix for #989222, a crash ("Floating point exception")
when exporting raw files of a certain format. According to Jonas, the bug is
triggered by DNG output from Megapixels, which is in bullseye and is used by
(at least) the Librem 5 and the PinePhone (with Mobian).
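As the changelog's "Floating point exception" message suggests, the crash is most likely an integer division by zero (SIGFPE on x86 Linux), presumably when a pixel's sampling window collects no samples. Below is a minimal sketch of the two guard styles the patch adds: skipping the store for the integer path, and writing 0.0f for the float path. The helper names `store_average_u16` and `average_or_zero` are hypothetical, not darktable functions.

```c
#include <stdint.h>

/* Integer path (hypothetical helper): dividing by num == 0 is undefined
   behaviour in C; on x86 Linux it raises SIGFPE, which the shell reports
   as "Floating point exception". Like the patched code, skip the store
   when no samples were accumulated, leaving *outc unchanged. */
void store_average_u16(uint16_t *outc, uint32_t col, int num)
{
  if(num) *outc = (uint16_t)(col / (uint32_t)num);
}

/* Float path (hypothetical helper): dividing by 0.0f does not trap in the
   default floating-point environment but yields inf or NaN, so return
   0.0f instead, as the patched monochrome passthrough does. */
float average_or_zero(float col, float num)
{
  return num ? col / num : 0.0f;
}
```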

[ Impact ]

Users of some free-software-friendly phones will be unable to process
their images with darktable in bullseye.

[ Tests ]

I have verified that the basic functionality of darktable still works.
Jonas tested the DNG images in question and verified that they now
export correctly.

[ Risks ]

darktable is a leaf package. The diff is somewhat large, but most of it
consists of deletions of SSE2-specific code. Only 7 lines are added, and
they are easy to sanity-check.
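The deleted SSE2 functions accumulate each weighted 2x2 block with intrinsics (`_mm_set_ps`, `_mm_mul_ps`, `_mm_add_ps`), while the plain functions that remain compute the same kind of weighted sums in ordinary C, which the compiler can vectorize. A minimal sketch of that equivalence for a single block, assuming an x86 compiler that defines `__SSE__` for the intrinsic path; `accumulate_plain` and `accumulate_sse` are hypothetical names, not darktable code.

```c
#ifdef __SSE__
#include <xmmintrin.h> /* _mm_loadu_ps, _mm_storeu_ps, _mm_set_ps, _mm_set1_ps, _mm_mul_ps, _mm_add_ps */
#endif

/* Plain C (hypothetical helper): acc += w * p for one 2x2 block
   p = {p1, p2, p3, p4}. An optimizing compiler can vectorize this loop
   without any intrinsics. */
void accumulate_plain(float acc[4], float w, const float p[4])
{
  for(int k = 0; k < 4; k++) acc[k] += w * p[k];
}

#ifdef __SSE__
/* Hand-written SSE equivalent (hypothetical helper), in the style of the
   removed _sse2 functions. _mm_set_ps fills lanes from highest to lowest,
   so _mm_set_ps(p[3], p[2], p[1], p[0]) puts p[0] in lane 0, matching
   _mm_set_ps(p4, p3, p2, p1) in the removed code. */
void accumulate_sse(float acc[4], float w, const float p[4])
{
  __m128 col = _mm_loadu_ps(acc);
  col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(w), _mm_set_ps(p[3], p[2], p[1], p[0])));
  _mm_storeu_ps(acc, col);
}
#endif
```

Both paths produce the same four weighted sums; on non-x86 targets only the plain path is compiled.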

[ Checklist ]
  [x] all changes are documented in the d/changelog
  [x] I reviewed all changes and I approve them
  [x] attach debdiff against the package in testing

[ Other info ]

I also attach a "reduced diff" with the deleted #ifdef __SSE__ blocks
collapsed.

unblock darktable/3.4.1-4


-----BEGIN PGP SIGNATURE-----

iQIzBAEBCAAdFiEEkiyHYXwaY0SiY6fqA0U5G1WqFSEFAmC+o1YACgkQA0U5G1Wq
FSGEug/+NjvWDdVP6jwcU0rXEUCHpgPbqYXygkVn4TIyVeqRh1e6DJCwU3mzkNo8
DnR7siTEdXp6F9e1MpCaN9G404ptk7MZasN6Aswu5Fj37knj6YzhYnrqp6fbgurL
w1dcbNhnSSlPf6czeDtSIe0uIIR3TNbhG0ICX8D6xhTumolW0+EtPHTcG8E9y7Ib
f+wlp/0mwwdpmeYB32ObkF8v4t7g4f9Y1SWrjPI0xZ/tgYiDgY8nOW39a4Nj0HQX
HzqW0oQXMaLsjFecEv7Wuf3VTWmmBubKKANvs++Lg/EQi3pbjeVMzDa2WuZBTxUL
YHe0bW012OWOtgnfuLuKdIvots8afNYpi1jtS58e4ZT1wHxEvUW2ww09jjcrnsdP
CnKFT5Ybg3WZ7rqUQ8VsYXkgCe5CdauFAlKdWluTK2SAXn7brfvnpzpUpTzFbxRN
zOtZfwPqsCJt8l3rPoMdLIlD5IQAxkPavyc1ow3bym/IIEiuVXCSSbohRHYyUBDT
lQyM7aAVi8aawGVpbB/2MeuBsdWMPCx37etU/Jz3YMtqhC1rIi6OMVoXWFb1BAAQ
sGjgRvrSes/2bkODcC/YBE9jNKinsLXbCbhQU50ObEQqHb7yeec9DsPe7NYfvhGN
22ueQyjNT1LguYVwsNzPE1WBobrSwghdFh8MFcJwNuqJR3SnEDI=
=o+Yk
-----END PGP SIGNATURE-----
diff -Nru darktable-3.4.1/debian/changelog darktable-3.4.1/debian/changelog
--- darktable-3.4.1/debian/changelog	2021-05-20 14:07:16.000000000 -0300
+++ darktable-3.4.1/debian/changelog	2021-06-05 12:41:39.000000000 -0300
@@ -1,3 +1,11 @@
+darktable (3.4.1-4) unstable; urgency=medium
+
+  * Bug fix: "crashes with 'Floating point exception (core dumped)' after
+    loading some DNG files", thanks to Jonas Smedegaard (Closes: #989222).
+    Cherry pick upstream commit 2ff4fc58e44.
+
+ -- David Bremner <bremner@debian.org>  Sat, 05 Jun 2021 12:41:39 -0300
+
 darktable (3.4.1-3) unstable; urgency=medium
 
   * Bug fix: "broken symlinks: /usr/share/darktable/js/*.js ->
diff -Nru darktable-3.4.1/debian/patches/0002-Avoid-div-by-zero-in-dt_iop_clip_and_zoom_mosaic_hal.patch darktable-3.4.1/debian/patches/0002-Avoid-div-by-zero-in-dt_iop_clip_and_zoom_mosaic_hal.patch
--- darktable-3.4.1/debian/patches/0002-Avoid-div-by-zero-in-dt_iop_clip_and_zoom_mosaic_hal.patch	1969-12-31 20:00:00.000000000 -0400
+++ darktable-3.4.1/debian/patches/0002-Avoid-div-by-zero-in-dt_iop_clip_and_zoom_mosaic_hal.patch	2021-06-05 12:41:39.000000000 -0300
@@ -0,0 +1,1001 @@
+From: Hanno Schwalm <hanno@schwalm-bremen.de>
+Date: Fri, 14 May 2021 18:20:37 +0200
+Subject: Avoid div by zero in dt_iop_clip_and_zoom_mosaic_half_size (#8954)
+
+* Avoid div by zero in dt_iop_clip_and_zoom_mosaic_half_size_plain
+
+Fixes #8951
+
+Although the file given in the issue is damaged, we can avoid the crash.
+In `dt_iop_clip_and_zoom_mosaic_half_size` and its SSE counterpart there is a possible div/0
+problem that should be checked.
+
+* Fix the same div by zero in dt_iop_clip_and_zoom_mosaic_half_size_f
+
+* Remove SSE code for dt_iop_clip_and_zoom_mosaic... after testing performance
+
+Checked performance of non-SSE vs. SSE-specific code
+- with added local timers
+- using gcc 10.2
+- testing -t 1/4/8/16
+- Intel (Xeon-like 9900) with fixed clock rate
+
+in
+- dt_iop_clip_and_zoom_mosaic_half_size
+- dt_iop_clip_and_zoom_mosaic_half_size_f
+- dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f
+- dt_iop_clip_and_zoom_demosaic_half_size_f
+
+with consistent results. For all functions the SSE-specific code was somewhat slower (~20%)
+than the compiler-vectorized code. The number of OpenMP cores didn't matter; it just made the
+results more measurable because of the low execution times.
+
+So I removed all the SSE-specific code, for less code burden and better performance.
+
+* Fix SSE header plus div/0
+
+At least for Bayer images we absolutely want to be sure there is no division by zero, as there
+might be buggy DNG files.
+---
+ src/develop/imageop_math.c | 890 +--------------------------------------------
+ 1 file changed, 7 insertions(+), 883 deletions(-)
+
+diff --git a/src/develop/imageop_math.c b/src/develop/imageop_math.c
+index ef55965..0066a83 100644
+--- a/src/develop/imageop_math.c
++++ b/src/develop/imageop_math.c
+@@ -18,14 +18,8 @@
+ 
+ #include "develop/imageop_math.h"
+ #include <assert.h> // for assert
+-#ifdef __SSE__
+-#include <emmintrin.h> // for _mm_set_epi32, _mm_add_epi32
+-#endif
+ #include <glib.h> // for MIN, MAX, CLAMP, inline
+ #include <math.h> // for round, floorf, fmaxf
+-#ifdef __SSE__
+-#include <xmmintrin.h> // for _mm_set_ps, _mm_mul_ps, _mm_set...
+-#endif
+ #include "common/darktable.h"        // for darktable, darktable_t, dt_code...
+ #include "common/imageio.h"          // for FILTERS_ARE_4BAYER
+ #include "common/interpolation.h"    // for dt_interpolation_new, dt_interp...
+@@ -177,7 +171,7 @@ int dt_iop_clip_and_zoom_roi_cl(int devid, cl_mem dev_out, cl_mem dev_in, const
+ 
+ #endif
+ 
+-void dt_iop_clip_and_zoom_mosaic_half_size_plain(uint16_t *const out, const uint16_t *const in,
++void dt_iop_clip_and_zoom_mosaic_half_size(uint16_t *const out, const uint16_t *const in,
+                                                  const dt_iop_roi_t *const roi_out,
+                                                  const dt_iop_roi_t *const roi_in, const int32_t out_stride,
+                                                  const int32_t in_stride, const uint32_t filters)
+@@ -244,224 +238,12 @@ void dt_iop_clip_and_zoom_mosaic_half_size_plain(uint16_t *const out, const uint
+             num++;
+           }
+         }
+-      *outc = col / num;
+-    }
+-  }
+-}
+-
+-#if defined(__SSE__)
+-void dt_iop_clip_and_zoom_mosaic_half_size_sse2(uint16_t *const out, const uint16_t *const in,
+-                                                const dt_iop_roi_t *const roi_out,
+-                                                const dt_iop_roi_t *const roi_in, const int32_t out_stride,
+-                                                const int32_t in_stride, const uint32_t filters)
+-{
+-  // adjust to pixel region and don't sample more than scale/2 nbs!
+-  // pixel footprint on input buffer, radius:
+-  const float px_footprint = 1.f / roi_out->scale;
+-  // how many 2x2 blocks can be sampled inside that area
+-  const int samples = round(px_footprint / 2);
+-
+-  // move p to point to an rggb block:
+-  int trggbx = 0, trggby = 0;
+-  if(FC(trggby, trggbx + 1, filters) != 1) trggbx++;
+-  if(FC(trggby, trggbx, filters) != 0)
+-  {
+-    trggbx = (trggbx + 1) & 1;
+-    trggby++;
+-  }
+-  const int rggbx = trggbx, rggby = trggby;
+-
+-#ifdef _OPENMP
+-#pragma omp parallel for default(none) \
+-  dt_omp_firstprivate(in, in_stride, out, out_stride, px_footprint, rggbx, rggby, roi_in, roi_out, samples) \
+-  schedule(static)
+-#endif
+-  for(int y = 0; y < roi_out->height; y++)
+-  {
+-    uint16_t *outc = out + out_stride * y;
+-
+-    const float fy = (y + roi_out->y) * px_footprint;
+-    int py = (int)fy & ~1;
+-    const float dy = (fy - py) / 2;
+-    py = MIN(((roi_in->height - 6) & ~1u), py) + rggby;
+-
+-    const int maxj = MIN(((roi_in->height - 5) & ~1u) + rggby, py + 2 * samples);
+-
+-    for(int x = 0; x < roi_out->width; x++)
+-    {
+-      __m128 col = _mm_setzero_ps();
+-
+-      const float fx = (x + roi_out->x) * px_footprint;
+-      int px = (int)fx & ~1;
+-      const float dx = (fx - px) / 2;
+-      px = MIN(((roi_in->width - 6) & ~1u), px) + rggbx;
+-
+-      const int maxi = MIN(((roi_in->width - 5) & ~1u) + rggbx, px + 2 * samples);
+-
+-      float p1, p2, p3, p4;
+-      float num = 0;
+-
+-      // upper left 2x2 block of sampling region
+-      p1 = in[px + in_stride * py];
+-      p2 = in[px + 1 + in_stride * py];
+-      p3 = in[px + in_stride * (py + 1)];
+-      p4 = in[px + 1 + in_stride * (py + 1)];
+-      col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps((1 - dx) * (1 - dy)), _mm_set_ps(p4, p3, p2, p1)));
+-
+-      // left 2x2 block border of sampling region
+-      for(int j = py + 2; j <= maxj; j += 2)
+-      {
+-        p1 = in[px + in_stride * j];
+-        p2 = in[px + 1 + in_stride * j];
+-        p3 = in[px + in_stride * (j + 1)];
+-        p4 = in[px + 1 + in_stride * (j + 1)];
+-        col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(1 - dx), _mm_set_ps(p4, p3, p2, p1)));
+-      }
+-
+-      // upper 2x2 block border of sampling region
+-      for(int i = px + 2; i <= maxi; i += 2)
+-      {
+-        p1 = in[i + in_stride * py];
+-        p2 = in[i + 1 + in_stride * py];
+-        p3 = in[i + in_stride * (py + 1)];
+-        p4 = in[i + 1 + in_stride * (py + 1)];
+-        col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(1 - dy), _mm_set_ps(p4, p3, p2, p1)));
+-      }
+-
+-      // 2x2 blocks in the middle of sampling region
+-      for(int j = py + 2; j <= maxj; j += 2)
+-        for(int i = px + 2; i <= maxi; i += 2)
+-        {
+-          p1 = in[i + in_stride * j];
+-          p2 = in[i + 1 + in_stride * j];
+-          p3 = in[i + in_stride * (j + 1)];
+-          p4 = in[i + 1 + in_stride * (j + 1)];
+-          col = _mm_add_ps(col, _mm_set_ps(p4, p3, p2, p1));
+-        }
+-
+-      if(maxi == px + 2 * samples && maxj == py + 2 * samples)
+-      {
+-        // right border
+-        for(int j = py + 2; j <= maxj; j += 2)
+-        {
+-          p1 = in[maxi + 2 + in_stride * j];
+-          p2 = in[maxi + 3 + in_stride * j];
+-          p3 = in[maxi + 2 + in_stride * (j + 1)];
+-          p4 = in[maxi + 3 + in_stride * (j + 1)];
+-          col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx), _mm_set_ps(p4, p3, p2, p1)));
+-        }
+-
+-        // upper right
+-        p1 = in[maxi + 2 + in_stride * py];
+-        p2 = in[maxi + 3 + in_stride * py];
+-        p3 = in[maxi + 2 + in_stride * (py + 1)];
+-        p4 = in[maxi + 3 + in_stride * (py + 1)];
+-        col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx * (1 - dy)), _mm_set_ps(p4, p3, p2, p1)));
+-
+-        // lower border
+-        for(int i = px + 2; i <= maxi; i += 2)
+-        {
+-          p1 = in[i + in_stride * (maxj + 2)];
+-          p2 = in[i + 1 + in_stride * (maxj + 2)];
+-          p3 = in[i + in_stride * (maxj + 3)];
+-          p4 = in[i + 1 + in_stride * (maxj + 3)];
+-          col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dy), _mm_set_ps(p4, p3, p2, p1)));
+-        }
+-
+-        // lower left 2x2 block
+-        p1 = in[px + in_stride * (maxj + 2)];
+-        p2 = in[px + 1 + in_stride * (maxj + 2)];
+-        p3 = in[px + in_stride * (maxj + 3)];
+-        p4 = in[px + 1 + in_stride * (maxj + 3)];
+-        col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps((1 - dx) * dy), _mm_set_ps(p4, p3, p2, p1)));
+-
+-        // lower right 2x2 block
+-        p1 = in[maxi + 2 + in_stride * (maxj + 2)];
+-        p2 = in[maxi + 3 + in_stride * (maxj + 2)];
+-        p3 = in[maxi + 2 + in_stride * (maxj + 3)];
+-        p4 = in[maxi + 3 + in_stride * (maxj + 3)];
+-        col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx * dy), _mm_set_ps(p4, p3, p2, p1)));
+-
+-        num = (samples + 1) * (samples + 1);
+-      }
+-      else if(maxi == px + 2 * samples)
+-      {
+-        // right border
+-        for(int j = py + 2; j <= maxj; j += 2)
+-        {
+-          p1 = in[maxi + 2 + in_stride * j];
+-          p2 = in[maxi + 3 + in_stride * j];
+-          p3 = in[maxi + 2 + in_stride * (j + 1)];
+-          p4 = in[maxi + 3 + in_stride * (j + 1)];
+-          col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx), _mm_set_ps(p4, p3, p2, p1)));
+-        }
+-
+-        // upper right
+-        p1 = in[maxi + 2 + in_stride * py];
+-        p2 = in[maxi + 3 + in_stride * py];
+-        p3 = in[maxi + 2 + in_stride * (py + 1)];
+-        p4 = in[maxi + 3 + in_stride * (py + 1)];
+-        col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx * (1 - dy)), _mm_set_ps(p4, p3, p2, p1)));
+-
+-        num = ((maxj - py) / 2 + 1 - dy) * (samples + 1);
+-      }
+-      else if(maxj == py + 2 * samples)
+-      {
+-        // lower border
+-        for(int i = px + 2; i <= maxi; i += 2)
+-        {
+-          p1 = in[i + in_stride * (maxj + 2)];
+-          p2 = in[i + 1 + in_stride * (maxj + 2)];
+-          p3 = in[i + in_stride * (maxj + 3)];
+-          p4 = in[i + 1 + in_stride * (maxj + 3)];
+-          col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dy), _mm_set_ps(p4, p3, p2, p1)));
+-        }
+-
+-        // lower left 2x2 block
+-        p1 = in[px + in_stride * (maxj + 2)];
+-        p2 = in[px + 1 + in_stride * (maxj + 2)];
+-        p3 = in[px + in_stride * (maxj + 3)];
+-        p4 = in[px + 1 + in_stride * (maxj + 3)];
+-        col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps((1 - dx) * dy), _mm_set_ps(p4, p3, p2, p1)));
+-
+-        num = ((maxi - px) / 2 + 1 - dx) * (samples + 1);
+-      }
+-      else
+-      {
+-        num = ((maxi - px) / 2 + 1 - dx) * ((maxj - py) / 2 + 1 - dy);
+-      }
+-
+-      num = 1.0f / num;
+-      col = _mm_mul_ps(col, _mm_set1_ps(num));
+-
+-      float fcol[4] __attribute__((aligned(64)));
+-      _mm_store_ps(fcol, col);
+-
+-      const int c = (2 * ((y + rggby) % 2) + ((x + rggbx) % 2));
+-      *outc = (uint16_t)(fcol[c]);
+-      outc++;
++      if(num) *outc = col / num;
+     }
+   }
+-  _mm_sfence();
+-}
+-#endif
+-
+-void dt_iop_clip_and_zoom_mosaic_half_size(uint16_t *const out, const uint16_t *const in,
+-                                           const dt_iop_roi_t *const roi_out, const dt_iop_roi_t *const roi_in,
+-                                           const int32_t out_stride, const int32_t in_stride,
+-                                           const uint32_t filters)
+-{
+-  if(1)//(darktable.codepath.OPENMP_SIMD)
+-    return dt_iop_clip_and_zoom_mosaic_half_size_plain(out, in, roi_out, roi_in, out_stride, in_stride, filters);
+-#if defined(__SSE__)
+-  else if(darktable.codepath.SSE2)
+-    return dt_iop_clip_and_zoom_mosaic_half_size_sse2(out, in, roi_out, roi_in, out_stride, in_stride, filters);
+-#endif
+-  else
+-    dt_unreachable_codepath();
+ }
+ 
+-void dt_iop_clip_and_zoom_mosaic_half_size_f_plain(float *const out, const float *const in,
++void dt_iop_clip_and_zoom_mosaic_half_size_f(float *const out, const float *const in,
+                                                    const dt_iop_roi_t *const roi_out,
+                                                    const dt_iop_roi_t *const roi_in, const int32_t out_stride,
+                                                    const int32_t in_stride, const uint32_t filters)
+@@ -643,223 +425,10 @@ void dt_iop_clip_and_zoom_mosaic_half_size_f_plain(float *const out, const float
+       }
+ 
+       const int c = (2 * ((y + rggby) % 2) + ((x + rggbx) % 2));
+-      *outc = col[c] / num;
+-      outc++;
+-    }
+-  }
+-}
+-
+-#if defined(__SSE__)
+-void dt_iop_clip_and_zoom_mosaic_half_size_f_sse2(float *const out, const float *const in,
+-                                                  const dt_iop_roi_t *const roi_out,
+-                                                  const dt_iop_roi_t *const roi_in, const int32_t out_stride,
+-                                                  const int32_t in_stride, const uint32_t filters)
+-{
+-  // adjust to pixel region and don't sample more than scale/2 nbs!
+-  // pixel footprint on input buffer, radius:
+-  const float px_footprint = 1.f / roi_out->scale;
+-  // how many 2x2 blocks can be sampled inside that area
+-  const int samples = round(px_footprint / 2);
+-
+-  // move p to point to an rggb block:
+-  int trggbx = 0, trggby = 0;
+-  if(FC(trggby, trggbx + 1, filters) != 1) trggbx++;
+-  if(FC(trggby, trggbx, filters) != 0)
+-  {
+-    trggbx = (trggbx + 1) & 1;
+-    trggby++;
+-  }
+-  const int rggbx = trggbx, rggby = trggby;
+-
+-#ifdef _OPENMP
+-#pragma omp parallel for default(none) \
+-  dt_omp_firstprivate(in, in_stride, out, out_stride, px_footprint, rggbx, \
+-                      rggby, roi_in, roi_out, samples) \
+-  schedule(static)
+-#endif
+-  for(int y = 0; y < roi_out->height; y++)
+-  {
+-    float *outc = out + out_stride * y;
+-
+-    const float fy = (y + roi_out->y) * px_footprint;
+-    int py = (int)fy & ~1;
+-    const float dy = (fy - py) / 2;
+-    py = MIN(((roi_in->height - 6) & ~1u), py) + rggby;
+-
+-    const int maxj = MIN(((roi_in->height - 5) & ~1u) + rggby, py + 2 * samples);
+-
+-    for(int x = 0; x < roi_out->width; x++)
+-    {
+-      __m128 col = _mm_setzero_ps();
+-
+-      const float fx = (x + roi_out->x) * px_footprint;
+-      int px = (int)fx & ~1;
+-      const float dx = (fx - px) / 2;
+-      px = MIN(((roi_in->width - 6) & ~1u), px) + rggbx;
+-
+-      const int maxi = MIN(((roi_in->width - 5) & ~1u) + rggbx, px + 2 * samples);
+-
+-      float p1, p2, p3, p4;
+-      float num = 0;
+-
+-      // upper left 2x2 block of sampling region
+-      p1 = in[px + in_stride * py];
+-      p2 = in[px + 1 + in_stride * py];
+-      p3 = in[px + in_stride * (py + 1)];
+-      p4 = in[px + 1 + in_stride * (py + 1)];
+-      col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps((1 - dx) * (1 - dy)), _mm_set_ps(p4, p3, p2, p1)));
+-
+-      // left 2x2 block border of sampling region
+-      for(int j = py + 2; j <= maxj; j += 2)
+-      {
+-        p1 = in[px + in_stride * j];
+-        p2 = in[px + 1 + in_stride * j];
+-        p3 = in[px + in_stride * (j + 1)];
+-        p4 = in[px + 1 + in_stride * (j + 1)];
+-        col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(1 - dx), _mm_set_ps(p4, p3, p2, p1)));
+-      }
+-
+-      // upper 2x2 block border of sampling region
+-      for(int i = px + 2; i <= maxi; i += 2)
+-      {
+-        p1 = in[i + in_stride * py];
+-        p2 = in[i + 1 + in_stride * py];
+-        p3 = in[i + in_stride * (py + 1)];
+-        p4 = in[i + 1 + in_stride * (py + 1)];
+-        col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(1 - dy), _mm_set_ps(p4, p3, p2, p1)));
+-      }
+-
+-      // 2x2 blocks in the middle of sampling region
+-      for(int j = py + 2; j <= maxj; j += 2)
+-        for(int i = px + 2; i <= maxi; i += 2)
+-        {
+-          p1 = in[i + in_stride * j];
+-          p2 = in[i + 1 + in_stride * j];
+-          p3 = in[i + in_stride * (j + 1)];
+-          p4 = in[i + 1 + in_stride * (j + 1)];
+-          col = _mm_add_ps(col, _mm_set_ps(p4, p3, p2, p1));
+-        }
+-
+-      if(maxi == px + 2 * samples && maxj == py + 2 * samples)
+-      {
+-        // right border
+-        for(int j = py + 2; j <= maxj; j += 2)
+-        {
+-          p1 = in[maxi + 2 + in_stride * j];
+-          p2 = in[maxi + 3 + in_stride * j];
+-          p3 = in[maxi + 2 + in_stride * (j + 1)];
+-          p4 = in[maxi + 3 + in_stride * (j + 1)];
+-          col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx), _mm_set_ps(p4, p3, p2, p1)));
+-        }
+-
+-        // upper right
+-        p1 = in[maxi + 2 + in_stride * py];
+-        p2 = in[maxi + 3 + in_stride * py];
+-        p3 = in[maxi + 2 + in_stride * (py + 1)];
+-        p4 = in[maxi + 3 + in_stride * (py + 1)];
+-        col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx * (1 - dy)), _mm_set_ps(p4, p3, p2, p1)));
+-
+-        // lower border
+-        for(int i = px + 2; i <= maxi; i += 2)
+-        {
+-          p1 = in[i + in_stride * (maxj + 2)];
+-          p2 = in[i + 1 + in_stride * (maxj + 2)];
+-          p3 = in[i + in_stride * (maxj + 3)];
+-          p4 = in[i + 1 + in_stride * (maxj + 3)];
+-          col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dy), _mm_set_ps(p4, p3, p2, p1)));
+-        }
+-
+-        // lower left 2x2 block
+-        p1 = in[px + in_stride * (maxj + 2)];
+-        p2 = in[px + 1 + in_stride * (maxj + 2)];
+-        p3 = in[px + in_stride * (maxj + 3)];
+-        p4 = in[px + 1 + in_stride * (maxj + 3)];
+-        col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps((1 - dx) * dy), _mm_set_ps(p4, p3, p2, p1)));
+-
+-        // lower right 2x2 block
+-        p1 = in[maxi + 2 + in_stride * (maxj + 2)];
+-        p2 = in[maxi + 3 + in_stride * (maxj + 2)];
+-        p3 = in[maxi + 2 + in_stride * (maxj + 3)];
+-        p4 = in[maxi + 3 + in_stride * (maxj + 3)];
+-        col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx * dy), _mm_set_ps(p4, p3, p2, p1)));
+-
+-        num = (samples + 1) * (samples + 1);
+-      }
+-      else if(maxi == px + 2 * samples)
+-      {
+-        // right border
+-        for(int j = py + 2; j <= maxj; j += 2)
+-        {
+-          p1 = in[maxi + 2 + in_stride * j];
+-          p2 = in[maxi + 3 + in_stride * j];
+-          p3 = in[maxi + 2 + in_stride * (j + 1)];
+-          p4 = in[maxi + 3 + in_stride * (j + 1)];
+-          col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx), _mm_set_ps(p4, p3, p2, p1)));
+-        }
+-
+-        // upper right
+-        p1 = in[maxi + 2 + in_stride * py];
+-        p2 = in[maxi + 3 + in_stride * py];
+-        p3 = in[maxi + 2 + in_stride * (py + 1)];
+-        p4 = in[maxi + 3 + in_stride * (py + 1)];
+-        col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx * (1 - dy)), _mm_set_ps(p4, p3, p2, p1)));
+-
+-        num = ((maxj - py) / 2 + 1 - dy) * (samples + 1);
+-      }
+-      else if(maxj == py + 2 * samples)
+-      {
+-        // lower border
+-        for(int i = px + 2; i <= maxi; i += 2)
+-        {
+-          p1 = in[i + in_stride * (maxj + 2)];
+-          p2 = in[i + 1 + in_stride * (maxj + 2)];
+-          p3 = in[i + in_stride * (maxj + 3)];
+-          p4 = in[i + 1 + in_stride * (maxj + 3)];
+-          col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dy), _mm_set_ps(p4, p3, p2, p1)));
+-        }
+-
+-        // lower left 2x2 block
+-        p1 = in[px + in_stride * (maxj + 2)];
+-        p2 = in[px + 1 + in_stride * (maxj + 2)];
+-        p3 = in[px + in_stride * (maxj + 3)];
+-        p4 = in[px + 1 + in_stride * (maxj + 3)];
+-        col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps((1 - dx) * dy), _mm_set_ps(p4, p3, p2, p1)));
+-
+-        num = ((maxi - px) / 2 + 1 - dx) * (samples + 1);
+-      }
+-      else
+-      {
+-        num = ((maxi - px) / 2 + 1 - dx) * ((maxj - py) / 2 + 1 - dy);
+-      }
+-
+-      num = 1.0f / num;
+-      col = _mm_mul_ps(col, _mm_set1_ps(num));
+-
+-      float fcol[4] __attribute__((aligned(64)));
+-      _mm_store_ps(fcol, col);
+-
+-      const int c = (2 * ((y + rggby) % 2) + ((x + rggbx) % 2));
+-      *outc = fcol[c];
++      if(num) *outc = col[c] / num;
+       outc++;
+     }
+   }
+-  _mm_sfence();
+-}
+-#endif
+-
+-void dt_iop_clip_and_zoom_mosaic_half_size_f(float *const out, const float *const in,
+-                                             const dt_iop_roi_t *const roi_out, const dt_iop_roi_t *const roi_in,
+-                                             const int32_t out_stride, const int32_t in_stride,
+-                                             const uint32_t filters)
+-{
+-  if(darktable.codepath.OPENMP_SIMD)
+-    return dt_iop_clip_and_zoom_mosaic_half_size_f_plain(out, in, roi_out, roi_in, out_stride, in_stride, filters);
+-#if defined(__SSE__)
+-  else if(darktable.codepath.SSE2)
+-    return dt_iop_clip_and_zoom_mosaic_half_size_f_sse2(out, in, roi_out, roi_in, out_stride, in_stride, filters);
+-#endif
+-  else
+-    dt_unreachable_codepath();
+ }
+ 
+ /**
+@@ -951,7 +520,7 @@ void dt_iop_clip_and_zoom_mosaic_third_size_xtrans_f(float *const out, const flo
+   }
+ }
+ 
+-void dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f_plain(float *out, const float *const in,
++void dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f(float *out, const float *const in,
+                                                                   const dt_iop_roi_t *const roi_out,
+                                                                   const dt_iop_roi_t *const roi_in,
+                                                                   const int32_t out_stride,
+@@ -1085,7 +654,7 @@ void dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f_plain(float *out, co
+         num = ((maxi - px) / 2 + 1 - dx) * ((maxj - py) / 2 + 1 - dy);
+       }
+ 
+-      const float pix = col / num;
++      const float pix = (num) ? col / num : 0.0f;
+       outc[0] = pix;
+       outc[1] = pix;
+       outc[2] = pix;
+@@ -1095,256 +664,7 @@ void dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f_plain(float *out, co
+   }
+ }
+ 
+-#if defined(__SSE__)
+-void dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f_sse2(float *out, const float *const in,
+-                                                                 const dt_iop_roi_t *const roi_out,
+-                                                                 const dt_iop_roi_t *const roi_in,
+-                                                                 const int32_t out_stride,
+-                                                                 const int32_t in_stride)
+-{
+-  // adjust to pixel region and don't sample more than scale/2 nbs!
+-  // pixel footprint on input buffer, radius:
+-  const float px_footprint = 1.f / roi_out->scale;
+-  // how many pixels can be sampled inside that area
+-  const int samples = round(px_footprint);
+-
+-#ifdef _OPENMP
+-#pragma omp parallel for default(none) \
+-  dt_omp_firstprivate(in, in_stride, out_stride, px_footprint, roi_in, roi_out, samples) \
+-  shared(out) \
+-  schedule(static)
+-#endif
+-  for(int y = 0; y < roi_out->height; y++)
+-  {
+-    float *outc = out + 4 * (out_stride * y);
+-
+-    const float fy = (y + roi_out->y) * px_footprint;
+-    int py = (int)fy;
+-    const float dy = fy - py;
+-    py = MIN(((roi_in->height - 3)), py);
+-
+-    const int maxj = MIN(((roi_in->height - 2)), py + samples);
+-
+-    for(int x = 0; x < roi_out->width; x++)
+-    {
+-      __m128 col = _mm_setzero_ps();
+-
+-      const float fx = (x + roi_out->x) * px_footprint;
+-      int px = (int)fx;
+-      const float dx = fx - px;
+-      px = MIN(((roi_in->width - 3)), px);
+-
+-      const int maxi = MIN(((roi_in->width - 2)), px + samples);
+-
+-      float p;
+-      float num = 0;
+-
+-      // upper left pixel of sampling region
+-      p = in[px + in_stride * py];
+-      col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps((1 - dx) * (1 - dy)), _mm_set_ps(0.0f, p, p, p)));
+-
+-      // left pixel border of sampling region
+-      for(int j = py + 1; j <= maxj; j++)
+-      {
+-        p = in[px + in_stride * j];
+-        col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(1 - dx), _mm_set_ps(0.0f, p, p, p)));
+-      }
+-
+-      // upper pixel border of sampling region
+-      for(int i = px + 1; i <= maxi; i++)
+-      {
+-        p = in[i + in_stride * py];
+-        col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(1 - dy), _mm_set_ps(0.0f, p, p, p)));
+-      }
+-
+-      // pixels in the middle of sampling region
+-      for(int j = py + 1; j <= maxj; j++)
+-        for(int i = px + 1; i <= maxi; i++)
+-        {
+-          p = in[i + in_stride * j];
+-          col = _mm_add_ps(col, _mm_set_ps(0.0f, p, p, p));
+-        }
+-
+-      if(maxi == px + samples && maxj == py + samples)
+-      {
+-        // right border
+-        for(int j = py + 1; j <= maxj; j++)
+-        {
+-          p = in[maxi + 1 + in_stride * j];
+-          col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx), _mm_set_ps(0.0f, p, p, p)));
+-        }
+-
+-        // upper right
+-        p = in[maxi + 1 + in_stride * py];
+-        col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx * (1 - dy)), _mm_set_ps(0.0f, p, p, p)));
+-
+-        // lower border
+-        for(int i = px + 1; i <= maxi; i++)
+-        {
+-          p = in[i + in_stride * (maxj + 1)];
+-          col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dy), _mm_set_ps(0.0f, p, p, p)));
+-        }
+-
+-        // lower left pixel
+-        p = in[px + in_stride * (maxj + 1)];
+-        col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps((1 - dx) * dy), _mm_set_ps(0.0f, p, p, p)));
+-
+-        // lower right pixel
+-        p = in[maxi + 1 + in_stride * (maxj + 1)];
+-        col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx * dy), _mm_set_ps(0.0f, p, p, p)));
+-
+-        num = (samples + 1) * (samples + 1);
+-      }
+-      else if(maxi == px + samples)
+-      {
+-        // right border
+-        for(int j = py + 1; j <= maxj; j++)
+-        {
+-          p = in[maxi + 1 + in_stride * j];
+-          col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx), _mm_set_ps(0.0f, p, p, p)));
+-        }
+-
+-        // upper right
+-        p = in[maxi + 1 + in_stride * py];
+-        col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx * (1 - dy)), _mm_set_ps(0.0f, p, p, p)));
+-
+-        num = ((maxj - py) / 2 + 1 - dy) * (samples + 1);
+-      }
+-      else if(maxj == py + samples)
+-      {
+-        // lower border
+-        for(int i = px + 1; i <= maxi; i++)
+-        {
+-          p = in[i + in_stride * (maxj + 1)];
+-          col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dy), _mm_set_ps(0.0f, p, p, p)));
+-        }
+-
+-        // lower left pixel
+-        p = in[px + in_stride * (maxj + 1)];
+-        col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps((1 - dx) * dy), _mm_set_ps(0.0f, p, p, p)));
+-
+-        num = ((maxi - px) / 2 + 1 - dx) * (samples + 1);
+-      }
+-      else
+-      {
+-        num = ((maxi - px) / 2 + 1 - dx) * ((maxj - py) / 2 + 1 - dy);
+-      }
+-
+-      num = 1.0f / num;
+-      col = _mm_mul_ps(col, _mm_set_ps(0.0f, num, num, num));
+-      _mm_stream_ps(outc, col);
+-      outc += 4;
+-    }
+-  }
+-  _mm_sfence();
+-}
+-#endif
+-
+-void dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f(float *out, const float *const in,
+-                                                            const dt_iop_roi_t *const roi_out,
+-                                                            const dt_iop_roi_t *const roi_in,
+-                                                            const int32_t out_stride, const int32_t in_stride)
+-{
+-  if(darktable.codepath.OPENMP_SIMD)
+-    return dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f_plain(out, in, roi_out, roi_in, out_stride,
+-                                                                        in_stride);
+-#if defined(__SSE__)
+-  else if(darktable.codepath.SSE2)
+-    return dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f_sse2(out, in, roi_out, roi_in, out_stride,
+-                                                                       in_stride);
+-#endif
+-  else
+-    dt_unreachable_codepath();
+-}
+-
+-#if 0 // gets rid of pink artifacts, but doesn't do sub-pixel sampling, so shows some staircasing artifacts.
+-void
+-dt_iop_clip_and_zoom_demosaic_half_size_f(
+-  float *out,
+-  const float *const in,
+-  const dt_iop_roi_t *const roi_out,
+-  const dt_iop_roi_t *const roi_in,
+-  const int32_t out_stride,
+-  const int32_t in_stride,
+-  const uint32_t filters,
+-  const float clip)
+-{
+-  // adjust to pixel region and don't sample more than scale/2 nbs!
+-  // pixel footprint on input buffer, radius:
+-  const float px_footprint = 1.f/roi_out->scale;
+-  // how many 2x2 blocks can be sampled inside that area
+-  const int samples = round(px_footprint/2);
+-
+-  // move p to point to an rggb block:
+-  int trggbx = 0, trggby = 0;
+-  if(FC(trggby, trggbx+1, filters) != 1) trggbx ++;
+-  if(FC(trggby, trggbx,   filters) != 0)
+-  {
+-    trggbx = (trggbx + 1)&1;
+-    trggby ++;
+-  }
+-  const int rggbx = trggbx, rggby = trggby;
+-
+-#ifdef _OPENMP
+-#pragma omp parallel for default(none) shared(out) schedule(static)
+-#endif
+-  for(int y=0; y<roi_out->height; y++)
+-  {
+-    float *outc = out + 4*(out_stride*y);
+-
+-    const float fy = (y + roi_out->y)*px_footprint;
+-    int py = (int)fy & ~1;
+-    py = MIN(((roi_in->height-4) & ~1u), py) + rggby;
+-
+-    int maxj = MIN(((roi_in->height-3)&~1u)+rggby, py+2*samples);
+-
+-    const float fx = roi_out->x*px_footprint;
+-
+-    for(int x=0; x<roi_out->width; x++)
+-    {
+-      __m128 col = _mm_setzero_ps();
+-
+-      fx += px_footprint;
+-      int px = (int)fx & ~1;
+-      px = MIN(((roi_in->width -4) & ~1u), px) + rggbx;
+-
+-      const int maxi = MIN(((roi_in->width -3)&~1u)+rggbx, px+2*samples);
+-
+-      int num = 0;
+-
+-      const int idx = px + in_stride*py;
+-      const float pc = MAX(MAX(in[idx], in[idx+1]), MAX(in[idx + in_stride], in[idx+1 + in_stride]));
+-
+-      // 2x2 blocks in the middle of sampling region
+-      __m128 sum = _mm_setzero_ps();
+-
+-      for(int j=py; j<=maxj; j+=2)
+-        for(int i=px; i<=maxi; i+=2)
+-        {
+-          const float p1 = in[i   + in_stride*j];
+-          const float p2 = in[i+1 + in_stride*j];
+-          const float p3 = in[i   + in_stride*(j + 1)];
+-          const float p4 = in[i+1 + in_stride*(j + 1)];
+-
+-          if (!((pc >= clip) ^ (MAX(MAX(p1,p2),MAX(p3,p4)) >= clip)))
+-          {
+-            sum = _mm_add_ps(sum, _mm_set_ps(0,p4,p3+p2,p1));
+-            num++;
+-          }
+-        }
+-
+-      col = _mm_mul_ps(sum, _mm_div_ps(_mm_set_ps(0.0f,1.0f,0.5f,1.0f),_mm_set1_ps(num)));
+-      _mm_stream_ps(outc, col);
+-      outc += 4;
+-    }
+-  }
+-  _mm_sfence();
+-}
+-
+-#else
+-// very fast and smooth, but doesn't handle highlights:
+-
+-void dt_iop_clip_and_zoom_demosaic_half_size_f_plain(float *out, const float *const in,
++void dt_iop_clip_and_zoom_demosaic_half_size_f(float *out, const float *const in,
+                                                      const dt_iop_roi_t *const roi_out,
+                                                      const dt_iop_roi_t *const roi_in, const int32_t out_stride,
+                                                      const int32_t in_stride, const uint32_t filters)
+@@ -1522,202 +842,6 @@ void dt_iop_clip_and_zoom_demosaic_half_size_f_plain(float *out, const float *co
+   }
+ }
+ 
+-#if defined(__SSE__)
+-void dt_iop_clip_and_zoom_demosaic_half_size_f_sse2(float *out, const float *const in,
+-                                                    const dt_iop_roi_t *const roi_out,
+-                                                    const dt_iop_roi_t *const roi_in, const int32_t out_stride,
+-                                                    const int32_t in_stride, const uint32_t filters)
+-{
+-  // adjust to pixel region and don't sample more than scale/2 nbs!
+-  // pixel footprint on input buffer, radius:
+-  const float px_footprint = 1.f / roi_out->scale;
+-  // how many 2x2 blocks can be sampled inside that area
+-  const int samples = round(px_footprint / 2);
+-
+-  // move p to point to an rggb block:
+-  int trggbx = 0, trggby = 0;
+-  if(FC(trggby, trggbx + 1, filters) != 1) trggbx++;
+-  if(FC(trggby, trggbx, filters) != 0)
+-  {
+-    trggbx = (trggbx + 1) & 1;
+-    trggby++;
+-  }
+-  const int rggbx = trggbx, rggby = trggby;
+-
+-#ifdef _OPENMP
+-#pragma omp parallel for default(none) \
+-  dt_omp_firstprivate(in, in_stride, px_footprint, rggbx, rggby, out_stride, roi_in, roi_out, samples) \
+-  shared(out) \
+-  schedule(static)
+-#endif
+-  for(int y = 0; y < roi_out->height; y++)
+-  {
+-    float *outc = out + 4 * (out_stride * y);
+-
+-    const float fy = (y + roi_out->y) * px_footprint;
+-    int py = (int)fy & ~1;
+-    const float dy = (fy - py) / 2;
+-    py = MIN(((roi_in->height - 6) & ~1u), py) + rggby;
+-
+-    const int maxj = MIN(((roi_in->height - 5) & ~1u) + rggby, py + 2 * samples);
+-
+-    for(int x = 0; x < roi_out->width; x++)
+-    {
+-      __m128 col = _mm_setzero_ps();
+-
+-      const float fx = (x + roi_out->x) * px_footprint;
+-      int px = (int)fx & ~1;
+-      const float dx = (fx - px) / 2;
+-      px = MIN(((roi_in->width - 6) & ~1u), px) + rggbx;
+-
+-      const int maxi = MIN(((roi_in->width - 5) & ~1u) + rggbx, px + 2 * samples);
+-
+-      float p1, p2, p4;
+-      float num = 0;
+-
+-      // upper left 2x2 block of sampling region
+-      p1 = in[px + in_stride * py];
+-      p2 = in[px + 1 + in_stride * py] + in[px + in_stride * (py + 1)];
+-      p4 = in[px + 1 + in_stride * (py + 1)];
+-      col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps((1 - dx) * (1 - dy)), _mm_set_ps(0.0f, p4, p2, p1)));
+-
+-      // left 2x2 block border of sampling region
+-      for(int j = py + 2; j <= maxj; j += 2)
+-      {
+-        p1 = in[px + in_stride * j];
+-        p2 = in[px + 1 + in_stride * j] + in[px + in_stride * (j + 1)];
+-        p4 = in[px + 1 + in_stride * (j + 1)];
+-        col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(1 - dx), _mm_set_ps(0.0f, p4, p2, p1)));
+-      }
+-
+-      // upper 2x2 block border of sampling region
+-      for(int i = px + 2; i <= maxi; i += 2)
+-      {
+-        p1 = in[i + in_stride * py];
+-        p2 = in[i + 1 + in_stride * py] + in[i + in_stride * (py + 1)];
+-        p4 = in[i + 1 + in_stride * (py + 1)];
+-        col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(1 - dy), _mm_set_ps(0.0f, p4, p2, p1)));
+-      }
+-
+-      // 2x2 blocks in the middle of sampling region
+-      for(int j = py + 2; j <= maxj; j += 2)
+-        for(int i = px + 2; i <= maxi; i += 2)
+-        {
+-          p1 = in[i + in_stride * j];
+-          p2 = in[i + 1 + in_stride * j] + in[i + in_stride * (j + 1)];
+-          p4 = in[i + 1 + in_stride * (j + 1)];
+-          col = _mm_add_ps(col, _mm_set_ps(0.0f, p4, p2, p1));
+-        }
+-
+-      if(maxi == px + 2 * samples && maxj == py + 2 * samples)
+-      {
+-        // right border
+-        for(int j = py + 2; j <= maxj; j += 2)
+-        {
+-          p1 = in[maxi + 2 + in_stride * j];
+-          p2 = in[maxi + 3 + in_stride * j] + in[maxi + 2 + in_stride * (j + 1)];
+-          p4 = in[maxi + 3 + in_stride * (j + 1)];
+-          col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx), _mm_set_ps(0.0f, p4, p2, p1)));
+-        }
+-
+-        // upper right
+-        p1 = in[maxi + 2 + in_stride * py];
+-        p2 = in[maxi + 3 + in_stride * py] + in[maxi + 2 + in_stride * (py + 1)];
+-        p4 = in[maxi + 3 + in_stride * (py + 1)];
+-        col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx * (1 - dy)), _mm_set_ps(0.0f, p4, p2, p1)));
+-
+-        // lower border
+-        for(int i = px + 2; i <= maxi; i += 2)
+-        {
+-          p1 = in[i + in_stride * (maxj + 2)];
+-          p2 = in[i + 1 + in_stride * (maxj + 2)] + in[i + in_stride * (maxj + 3)];
+-          p4 = in[i + 1 + in_stride * (maxj + 3)];
+-          col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dy), _mm_set_ps(0.0f, p4, p2, p1)));
+-        }
+-
+-        // lower left 2x2 block
+-        p1 = in[px + in_stride * (maxj + 2)];
+-        p2 = in[px + 1 + in_stride * (maxj + 2)] + in[px + in_stride * (maxj + 3)];
+-        p4 = in[px + 1 + in_stride * (maxj + 3)];
+-        col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps((1 - dx) * dy), _mm_set_ps(0.0f, p4, p2, p1)));
+-
+-        // lower right 2x2 block
+-        p1 = in[maxi + 2 + in_stride * (maxj + 2)];
+-        p2 = in[maxi + 3 + in_stride * (maxj + 2)] + in[maxi + 2 + in_stride * (maxj + 3)];
+-        p4 = in[maxi + 3 + in_stride * (maxj + 3)];
+-        col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx * dy), _mm_set_ps(0.0f, p4, p2, p1)));
+-
+-        num = (samples + 1) * (samples + 1);
+-      }
+-      else if(maxi == px + 2 * samples)
+-      {
+-        // right border
+-        for(int j = py + 2; j <= maxj; j += 2)
+-        {
+-          p1 = in[maxi + 2 + in_stride * j];
+-          p2 = in[maxi + 3 + in_stride * j] + in[maxi + 2 + in_stride * (j + 1)];
+-          p4 = in[maxi + 3 + in_stride * (j + 1)];
+-          col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx), _mm_set_ps(0.0f, p4, p2, p1)));
+-        }
+-
+-        // upper right
+-        p1 = in[maxi + 2 + in_stride * py];
+-        p2 = in[maxi + 3 + in_stride * py] + in[maxi + 2 + in_stride * (py + 1)];
+-        p4 = in[maxi + 3 + in_stride * (py + 1)];
+-        col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx * (1 - dy)), _mm_set_ps(0.0f, p4, p2, p1)));
+-
+-        num = ((maxj - py) / 2 + 1 - dy) * (samples + 1);
+-      }
+-      else if(maxj == py + 2 * samples)
+-      {
+-        // lower border
+-        for(int i = px + 2; i <= maxi; i += 2)
+-        {
+-          p1 = in[i + in_stride * (maxj + 2)];
+-          p2 = in[i + 1 + in_stride * (maxj + 2)] + in[i + in_stride * (maxj + 3)];
+-          p4 = in[i + 1 + in_stride * (maxj + 3)];
+-          col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dy), _mm_set_ps(0.0f, p4, p2, p1)));
+-        }
+-
+-        // lower left 2x2 block
+-        p1 = in[px + in_stride * (maxj + 2)];
+-        p2 = in[px + 1 + in_stride * (maxj + 2)] + in[px + in_stride * (maxj + 3)];
+-        p4 = in[px + 1 + in_stride * (maxj + 3)];
+-        col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps((1 - dx) * dy), _mm_set_ps(0.0f, p4, p2, p1)));
+-
+-        num = ((maxi - px) / 2 + 1 - dx) * (samples + 1);
+-      }
+-      else
+-      {
+-        num = ((maxi - px) / 2 + 1 - dx) * ((maxj - py) / 2 + 1 - dy);
+-      }
+-
+-      num = 1.0f / num;
+-      col = _mm_mul_ps(col, _mm_set_ps(0.0f, num, 0.5f * num, num));
+-      _mm_stream_ps(outc, col);
+-      outc += 4;
+-    }
+-  }
+-  _mm_sfence();
+-}
+-#endif
+-#endif
+-
+-void dt_iop_clip_and_zoom_demosaic_half_size_f(float *out, const float *const in,
+-                                               const dt_iop_roi_t *const roi_out, const dt_iop_roi_t *const roi_in,
+-                                               const int32_t out_stride, const int32_t in_stride,
+-                                               const uint32_t filters)
+-{
+-  if(darktable.codepath.OPENMP_SIMD)
+-    return dt_iop_clip_and_zoom_demosaic_half_size_f_plain(out, in, roi_out, roi_in, out_stride, in_stride,
+-                                                           filters);
+-#if defined(__SSE__)
+-  else if(darktable.codepath.SSE2)
+-    return dt_iop_clip_and_zoom_demosaic_half_size_f_sse2(out, in, roi_out, roi_in, out_stride, in_stride, filters);
+-#endif
+-  else
+-    dt_unreachable_codepath();
+-}
+ 
+ void dt_iop_clip_and_zoom_demosaic_third_size_xtrans_f(float *out, const float *const in,
+                                                        const dt_iop_roi_t *const roi_out,
diff -Nru darktable-3.4.1/debian/patches/series darktable-3.4.1/debian/patches/series
--- darktable-3.4.1/debian/patches/series	2021-05-20 14:07:16.000000000 -0300
+++ darktable-3.4.1/debian/patches/series	2021-06-05 12:41:39.000000000 -0300
@@ -1 +1,2 @@
 0001-add-explicit-dependency-on-generate_conf.patch
+0002-Avoid-div-by-zero-in-dt_iop_clip_and_zoom_mosaic_hal.patch
commit f007e678d47f5662326824725cae2ab9e2455e66
Author: Hanno Schwalm <hanno@schwalm-bremen.de>
Date:   Fri May 14 18:20:37 2021 +0200

    Avoid div by zero in dt_iop_clip_and_zoom_mosaic_half_size (#8954)
    
    * Avoid div by zero in dt_iop_clip_and_zoom_mosaic_half_size_plain
    
    Fixes #8951
    
    Although the file given in the issue is broken, we can avoid the crash.
    In `dt_iop_clip_and_zoom_mosaic_half_size` and its SSE counterpart there is a possible
    div/0 that should be checked.
    
    * Fix the same div by zero in dt_iop_clip_and_zoom_mosaic_half_size_f
    
    * Remove sse code for dt_iop_clip_and_zoom_mosaic... after testing performance
    
    checked performance of non-SSE vs SSE-specific code
    - with added local timers
    - using gcc 10.2
    - testing -t 1/4/8/16
    - intel (xeon like 9900) with fixed clock rate
    
    in
    - dt_iop_clip_and_zoom_mosaic_half_size
    - dt_iop_clip_and_zoom_mosaic_half_size_f
    - dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f
    - dt_iop_clip_and_zoom_demosaic_half_size_f
    
    with consistent results. For all functions the SSE-specific code was somewhat slower (~20%)
    than the compiler-vectorized plain code. The number of OpenMP threads didn't matter; it just
    made the results more measurable because of the low execution times.
    
    So I removed all the SSE-specific code for less code burden and better performance.
    
    * Fix sse header plus div/0
    
    At least for Bayer images we absolutely want to be sure there is no div by zero, as there
    might be buggy DNG files.
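The guard described in the commit message comes down to checking the sample count before dividing. The sketch below is a minimal illustration in C, not darktable code: `average_or_zero_f` and `average_or_zero_u16` are hypothetical names, and the unsigned 32-bit accumulator with an `int` count for the uint16_t path is an assumption. The float version mirrors the one change visible in this diff, `(num) ? col / num : 0.0f`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers, not darktable API. They illustrate the guard
 * pattern the patch introduces, e.g. `(num) ? col / num : 0.0f` in
 * dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f. */

/* Float path: dividing by 0.0f would yield inf or NaN, which would then
 * propagate into later pipeline stages. */
static float average_or_zero_f(const float sum, const float num)
{
  assert(num >= 0.0f); /* a sample count is never negative */
  return (num != 0.0f) ? sum / num : 0.0f;
}

/* Integer path (uint16_t raw data, assumed uint32_t accumulator):
 * integer division by zero is undefined behaviour in C and typically
 * raises SIGFPE on x86, i.e. it crashes the process. */
static uint16_t average_or_zero_u16(const uint32_t sum, const int num)
{
  assert(num >= 0);
  return (num != 0) ? (uint16_t)(sum / (uint32_t)num) : 0;
}
```

Returning 0 for an empty sampling window writes a black pixel instead of producing inf/NaN or trapping, which is the same trade-off the patch makes for the monochrome passthrough.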

diff --git a/src/develop/imageop_math.c b/src/develop/imageop_math.c
index ef559652d..0066a83c9 100644
--- a/src/develop/imageop_math.c
+++ b/src/develop/imageop_math.c
@@ -18,14 +18,8 @@
 
 #include "develop/imageop_math.h"
 #include <assert.h> // for assert
-#ifdef __SSE__...
-#endif
 #include <glib.h> // for MIN, MAX, CLAMP, inline
 #include <math.h> // for round, floorf, fmaxf
-#ifdef __SSE__...
-#endif
 #include "common/darktable.h"        // for darktable, darktable_t, dt_code...
 #include "common/imageio.h"          // for FILTERS_ARE_4BAYER
 #include "common/interpolation.h"    // for dt_interpolation_new, dt_interp...
@@ -177,7 +171,7 @@ int dt_iop_clip_and_zoom_roi_cl(int devid, cl_mem dev_out, cl_mem dev_in, const
 
 #endif
 
-void dt_iop_clip_and_zoom_mosaic_half_size_plain(uint16_t *const out, const uint16_t *const in,
+void dt_iop_clip_and_zoom_mosaic_half_size(uint16_t *const out, const uint16_t *const in,
                                                  const dt_iop_roi_t *const roi_out,
                                                  const dt_iop_roi_t *const roi_in, const int32_t out_stride,
                                                  const int32_t in_stride, const uint32_t filters)
@@ -244,224 +238,12 @@ void dt_iop_clip_and_zoom_mosaic_half_size_plain(uint16_t *const out, const uint
             num++;
           }
         }
-      *outc = col / num;
-    }
-  }
-}
-
-#if defined(__SSE__)...
-#endif
-
-void dt_iop_clip_and_zoom_mosaic_half_size(uint16_t *const out, const uint16_t *const in,
-                                           const dt_iop_roi_t *const roi_out, const dt_iop_roi_t *const roi_in,
-                                           const int32_t out_stride, const int32_t in_stride,
-                                           const uint32_t filters)
-{
-  if(1)//(darktable.codepath.OPENMP_SIMD)
-    return dt_iop_clip_and_zoom_mosaic_half_size_plain(out, in, roi_out, roi_in, out_stride, in_stride, filters);
-#if defined(__SSE__)
-  else if(darktable.codepath.SSE2)
-    return dt_iop_clip_and_zoom_mosaic_half_size_sse2(out, in, roi_out, roi_in, out_stride, in_stride, filters);
-#endif
-  else
-    dt_unreachable_codepath();
 }
 
-void dt_iop_clip_and_zoom_mosaic_half_size_f_plain(float *const out, const float *const in,
+void dt_iop_clip_and_zoom_mosaic_half_size_f(float *const out, const float *const in,
                                                    const dt_iop_roi_t *const roi_out,
                                                    const dt_iop_roi_t *const roi_in, const int32_t out_stride,
                                                    const int32_t in_stride, const uint32_t filters)
@@ -643,223 +425,10 @@ void dt_iop_clip_and_zoom_mosaic_half_size_f_plain(float *const out, const float
       }
 
       const int c = (2 * ((y + rggby) % 2) + ((x + rggbx) % 2));
-      *outc = col[c] / num;
-      outc++;
-    }
-  }
-}
-
-#if defined(__SSE__)...
-#endif
-
-void dt_iop_clip_and_zoom_mosaic_half_size_f(float *const out, const float *const in,
-                                             const dt_iop_roi_t *const roi_out, const dt_iop_roi_t *const roi_in,
-                                             const int32_t out_stride, const int32_t in_stride,
-                                             const uint32_t filters)
-{
-  if(darktable.codepath.OPENMP_SIMD)
-    return dt_iop_clip_and_zoom_mosaic_half_size_f_plain(out, in, roi_out, roi_in, out_stride, in_stride, filters);
-#if defined(__SSE__)
-  else if(darktable.codepath.SSE2)
-    return dt_iop_clip_and_zoom_mosaic_half_size_f_sse2(out, in, roi_out, roi_in, out_stride, in_stride, filters);
-#endif
-  else
-    dt_unreachable_codepath();
 }
 
 /**
@@ -951,7 +520,7 @@ void dt_iop_clip_and_zoom_mosaic_third_size_xtrans_f(float *const out, const flo
   }
 }
 
-void dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f_plain(float *out, const float *const in,
+void dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f(float *out, const float *const in,
                                                                   const dt_iop_roi_t *const roi_out,
                                                                   const dt_iop_roi_t *const roi_in,
                                                                   const int32_t out_stride,
@@ -1085,7 +654,7 @@ void dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f_plain(float *out, co
         num = ((maxi - px) / 2 + 1 - dx) * ((maxj - py) / 2 + 1 - dy);
       }
 
-      const float pix = col / num;
+      const float pix = (num) ? col / num : 0.0f;
       outc[0] = pix;
       outc[1] = pix;
       outc[2] = pix;
@@ -1095,256 +664,7 @@ void dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f_plain(float *out, co
   }
 }
 
-#if defined(__SSE__)...
-#endif
-
-void dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f(float *out, const float *const in,
-                                                            const dt_iop_roi_t *const roi_out,
-                                                            const dt_iop_roi_t *const roi_in,
-                                                            const int32_t out_stride, const int32_t in_stride)
-{
-  if(darktable.codepath.OPENMP_SIMD)
-    return dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f_plain(out, in, roi_out, roi_in, out_stride,
-                                                                        in_stride);
-#if defined(__SSE__)...
-#endif
-  else
-    dt_unreachable_codepath();
-}
-
-#if 0 // gets rid of pink artifacts, but doesn't do sub-pixel sampling, so shows some staircasing artifacts....
-#else
-// very fast and smooth, but doesn't handle highlights:
-
-void dt_iop_clip_and_zoom_demosaic_half_size_f_plain(float *out, const float *const in,
+void dt_iop_clip_and_zoom_demosaic_half_size_f(float *out, const float *const in,
                                                      const dt_iop_roi_t *const roi_out,
                                                      const dt_iop_roi_t *const roi_in, const int32_t out_stride,
                                                      const int32_t in_stride, const uint32_t filters)
@@ -1522,202 +842,6 @@ void dt_iop_clip_and_zoom_demosaic_half_size_f_plain(float *out, const float *co
   }
 }
 
-#if defined(__SSE__)...
-#endif
-#endif
-
-void dt_iop_clip_and_zoom_demosaic_half_size_f(float *out, const float *const in,
-                                               const dt_iop_roi_t *const roi_out, const dt_iop_roi_t *const roi_in,
-                                               const int32_t out_stride, const int32_t in_stride,
-                                               const uint32_t filters)
-{
-  if(darktable.codepath.OPENMP_SIMD)
-    return dt_iop_clip_and_zoom_demosaic_half_size_f_plain(out, in, roi_out, roi_in, out_stride, in_stride,
-                                                           filters);
-#if defined(__SSE__)...
-#endif
-  else
-    dt_unreachable_codepath();
-}
     
     void dt_iop_clip_and_zoom_demosaic_third_size_xtrans_f(float *out, const float *const in,
                                                            const dt_iop_roi_t *const roi_out,
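Each SSE2 kernel removed in this diff ends with a per-channel normalization such as `col = _mm_mul_ps(col, _mm_set_ps(0.0f, num, 0.5f * num, num));` followed by `_mm_stream_ps`. As a hedged sketch of the kind of plain C the surviving functions rely on instead, here is that step written as a fixed-length scalar loop. GCC and Clang can often auto-vectorize loops of this shape at higher optimization levels, but that has not been checked here against darktable's build flags. `normalize_rggb_sum` is a hypothetical name, and the zero check is an addition in the spirit of this patch, not something the removed SSE2 code did.

```c
/* Hypothetical plain-C counterpart of the SSE2 normalization step
 *   num = 1.0f / num;
 *   col = _mm_mul_ps(col, _mm_set_ps(0.0f, num, 0.5f * num, num));
 * _mm_set_ps lists lanes from highest to lowest, so lane 0 (red) and
 * lane 2 (blue) are scaled by 1/num, lane 1 (the sum of both greens)
 * by 0.5/num, and lane 3 is cleared. */
static void normalize_rggb_sum(float col[4], const float num)
{
  /* Guard against an empty sampling window (num == 0). */
  const float inv = (num != 0.0f) ? 1.0f / num : 0.0f;
  const float scale[4] = { inv, 0.5f * inv, inv, 0.0f };

  /* Fixed trip count and no branches: a candidate for auto-vectorization. */
  for(int c = 0; c < 4; c++) col[c] *= scale[c];
}
```

With a single scalar definition per function, the runtime dispatchers (`darktable.codepath.OPENMP_SIMD`, `darktable.codepath.SSE2`, `dt_unreachable_codepath()`) have nothing left to choose between, which is why the diff deletes them.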
<html>
  <head>
    <title>darktable.diff</title>
  </head>
  <body>
    <pre>
<span class="diff-context">   }
 }
 
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">void dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f</span><span class="diff-removed"><span class="diff-refine-removed">_plain</span></span><span class="diff-removed">(float *out, const float *const in,
</span><span class="diff-indicator-added">+</span><span class="diff-added">void dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f(float *out, const float *const in,
</span><span class="diff-context">                                                                   const dt_iop_roi_t *const roi_out,
                                                                   const dt_iop_roi_t *const roi_in,
                                                                   const int32_t out_stride,
</span><span class="diff-hunk-header">@@ -1085,7 +654,7 @@</span><span class="diff-function"> void dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f_plain(float *out, co</span>
<span class="diff-context">         num = ((maxi - px) / 2 + 1 - dx) * ((maxj - py) / 2 + 1 - dy);
       }
 
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">      const float pix = col / num;
</span><span class="diff-indicator-added">+</span><span class="diff-added">      const float pix = </span><span class="diff-added"><span class="diff-refine-added">(num) ?</span></span><span class="diff-added"> col / num </span><span class="diff-added"><span class="diff-refine-added">: 0.0f</span></span><span class="diff-added">;
</span><span class="diff-context">       outc[0] = pix;
       outc[1] = pix;
       outc[2] = pix;
</span><span class="diff-hunk-header">@@ -1095,256 +664,7 @@</span><span class="diff-function"> void dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f_plain(float *out, co</span>
<span class="diff-context">   }
 }
 
</span><span class="diff-indicator-removed">-</span><span class="diff-removed"><span class="diff-refine-removed">#if defined(__SSE__)...
</span></span><span class="diff-indicator-removed"><span class="diff-refine-removed">-</span></span><span class="diff-removed"><span class="diff-refine-removed">#endif
</span></span><span class="diff-indicator-removed"><span class="diff-refine-removed">-</span></span><span class="diff-removed"><span class="diff-refine-removed">
</span></span><span class="diff-indicator-removed"><span class="diff-refine-removed">-</span></span><span class="diff-removed">void dt_iop_clip_and_zoom_demosaic_</span><span class="diff-removed"><span class="diff-refine-removed">passthrough_monochrome_f(float *out, const float *const in,
</span></span><span class="diff-indicator-removed"><span class="diff-refine-removed">-</span></span><span class="diff-removed"><span class="diff-refine-removed">                                                            const dt_iop_roi_t *const roi_out,
</span></span><span class="diff-indicator-removed"><span class="diff-refine-removed">-</span></span><span class="diff-removed"><span class="diff-refine-removed">                                                            const dt_iop_roi_t *const roi_in,
</span></span><span class="diff-indicator-removed"><span class="diff-refine-removed">-</span></span><span class="diff-removed"><span class="diff-refine-removed">                                                            const int32_t out_stride, const int32_t in_stride)
</span></span><span class="diff-indicator-removed"><span class="diff-refine-removed">-</span></span><span class="diff-removed"><span class="diff-refine-removed">{
</span></span><span class="diff-indicator-removed"><span class="diff-refine-removed">-</span></span><span class="diff-removed"><span class="diff-refine-removed">  if(darktable.codepath.OPENMP_SIMD)
</span></span><span class="diff-indicator-removed"><span class="diff-refine-removed">-</span></span><span class="diff-removed"><span class="diff-refine-removed">    return dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f_plain(out, in, roi_out, roi_in, out_stride,
</span></span><span class="diff-indicator-removed"><span class="diff-refine-removed">-</span></span><span class="diff-removed"><span class="diff-refine-removed">                                                                        in_stride);
</span></span><span class="diff-indicator-removed"><span class="diff-refine-removed">-</span></span><span class="diff-removed"><span class="diff-refine-removed">#if defined(__SSE__)...
</span></span><span class="diff-indicator-removed"><span class="diff-refine-removed">-</span></span><span class="diff-removed"><span class="diff-refine-removed">#endif
</span></span><span class="diff-indicator-removed"><span class="diff-refine-removed">-</span></span><span class="diff-removed"><span class="diff-refine-removed">  else
</span></span><span class="diff-indicator-removed"><span class="diff-refine-removed">-</span></span><span class="diff-removed"><span class="diff-refine-removed">    dt_unreachable_codepath();
</span></span><span class="diff-indicator-removed"><span class="diff-refine-removed">-</span></span><span class="diff-removed"><span class="diff-refine-removed">}
</span></span><span class="diff-indicator-removed"><span class="diff-refine-removed">-</span></span><span class="diff-removed"><span class="diff-refine-removed">
</span></span><span class="diff-indicator-removed"><span class="diff-refine-removed">-</span></span><span class="diff-removed"><span class="diff-refine-removed">#if 0 // gets rid of pink artifacts, but doesn't do sub-pixel sampling, so shows some staircasing artifacts....
</span></span><span class="diff-indicator-removed"><span class="diff-refine-removed">-</span></span><span class="diff-removed"><span class="diff-refine-removed">#else
</span></span><span class="diff-indicator-removed"><span class="diff-refine-removed">-</span></span><span class="diff-removed"><span class="diff-refine-removed">// very fast and smooth, but doesn't handle highlights:
</span></span><span class="diff-indicator-removed"><span class="diff-refine-removed">-</span></span><span class="diff-removed"><span class="diff-refine-removed">
</span></span><span class="diff-indicator-removed"><span class="diff-refine-removed">-</span></span><span class="diff-removed"><span class="diff-refine-removed">void dt_iop_clip_and_zoom_demosaic_half_size_f_plain</span></span><span class="diff-removed">(float *out, const float *const in,
</span><span class="diff-indicator-added">+</span><span class="diff-added">void dt_iop_clip_and_zoom_demosaic_half_size_f(float *out, const float *const in,
</span><span class="diff-context">                                                      const dt_iop_roi_t *const roi_out,
                                                      const dt_iop_roi_t *const roi_in, const int32_t out_stride,
                                                      const int32_t in_stride, const uint32_t filters)
</span><span class="diff-hunk-header">@@ -1522,202 +842,6 @@</span><span class="diff-function"> void dt_iop_clip_and_zoom_demosaic_half_size_f_plain(float *out, const float *co</span>
<span class="diff-context">   }
 }
 
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">#if defined(__SSE__)...
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">#endif
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">#endif
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">void dt_iop_clip_and_zoom_demosaic_half_size_f(float *out, const float *const in,
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">                                               const dt_iop_roi_t *const roi_out, const dt_iop_roi_t *const roi_in,
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">                                               const int32_t out_stride, const int32_t in_stride,
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">                                               const uint32_t filters)
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">{
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">  if(darktable.codepath.OPENMP_SIMD)
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">    return dt_iop_clip_and_zoom_demosaic_half_size_f_plain(out, in, roi_out, roi_in, out_stride, in_stride,
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">                                                           filters);
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">#if defined(__SSE__)...
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">#endif
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">  else
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">    dt_unreachable_codepath();
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">}
</span><span class="diff-context"> 
 void dt_iop_clip_and_zoom_demosaic_third_size_xtrans_f(float *out, const float *const in,
                                                        const dt_iop_roi_t *const roi_out,
</span></pre>
  </body>
</html>

Reply to: