
Bug#989582: marked as done (unblock: darktable/3.4.1-4)



Your message dated Tue, 08 Jun 2021 21:21:55 +0000
with message-id <E1lqjAZ-0000MM-F7@respighi.debian.org>
and subject line unblock darktable
has caused the Debian Bug report #989582,
regarding unblock: darktable/3.4.1-4
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact owner@bugs.debian.org
immediately.)


-- 
989582: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=989582
Debian Bug Tracking System
Contact owner@bugs.debian.org with problems
--- Begin Message ---
Package: release.debian.org
Severity: normal
User: release.debian.org@packages.debian.org
Usertags: unblock

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Please unblock package darktable

[ Reason ]

This version contains a fix for #989222, a crash when exporting raw
files of a certain format.  According to Jonas, this bug is triggered
by output from Megapixels, which is in bullseye and used by (at least)
the Librem 5 and the PinePhone (with Mobian).

[ Impact ]

Users of some free software friendly phones will be unable to process
their images with darktable from bullseye.

[ Tests ]

I have verified the basic functionality of darktable is still
OK. Jonas tested the DNG images in question and verified that they
exported OK now.

[ Risks ]

darktable is a leaf package. The diff is a bit large, but most of it
is deletions of SSE2 specialized code. The additions are only 7 lines
and easy to sanity check.

[ Checklist ]
  [x] all changes are documented in the d/changelog
  [x] I reviewed all changes and I approve them
  [x] attach debdiff against the package in testing

[ Other info ]

I also attach a "reduced diff" with the deleted #ifdef __SSE__ blocks
collapsed.

unblock darktable/3.4.1-4


-----BEGIN PGP SIGNATURE-----

iQIzBAEBCAAdFiEEkiyHYXwaY0SiY6fqA0U5G1WqFSEFAmC+o1YACgkQA0U5G1Wq
FSGEug/+NjvWDdVP6jwcU0rXEUCHpgPbqYXygkVn4TIyVeqRh1e6DJCwU3mzkNo8
DnR7siTEdXp6F9e1MpCaN9G404ptk7MZasN6Aswu5Fj37knj6YzhYnrqp6fbgurL
w1dcbNhnSSlPf6czeDtSIe0uIIR3TNbhG0ICX8D6xhTumolW0+EtPHTcG8E9y7Ib
f+wlp/0mwwdpmeYB32ObkF8v4t7g4f9Y1SWrjPI0xZ/tgYiDgY8nOW39a4Nj0HQX
HzqW0oQXMaLsjFecEv7Wuf3VTWmmBubKKANvs++Lg/EQi3pbjeVMzDa2WuZBTxUL
YHe0bW012OWOtgnfuLuKdIvots8afNYpi1jtS58e4ZT1wHxEvUW2ww09jjcrnsdP
CnKFT5Ybg3WZ7rqUQ8VsYXkgCe5CdauFAlKdWluTK2SAXn7brfvnpzpUpTzFbxRN
zOtZfwPqsCJt8l3rPoMdLIlD5IQAxkPavyc1ow3bym/IIEiuVXCSSbohRHYyUBDT
lQyM7aAVi8aawGVpbB/2MeuBsdWMPCx37etU/Jz3YMtqhC1rIi6OMVoXWFb1BAAQ
sGjgRvrSes/2bkODcC/YBE9jNKinsLXbCbhQU50ObEQqHb7yeec9DsPe7NYfvhGN
22ueQyjNT1LguYVwsNzPE1WBobrSwghdFh8MFcJwNuqJR3SnEDI=
=o+Yk
-----END PGP SIGNATURE-----
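
For readers unfamiliar with the failure mode behind #989222: in C, integer
division by zero is undefined behaviour, and on x86 Linux it typically
raises SIGFPE, which the shell reports as "Floating point exception".
The guards in the attached patch (`if(num) *outc = col / num;` and
`(num) ? col / num : 0.0f`) skip the division when nothing was sampled.
Below is a minimal sketch of that pattern only, not darktable's code; the
helper names average_u16 and average_f are hypothetical and exist purely
for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of darktable): average the first `count`
 * samples into *out. Returns 1 on success. When count is zero, *out is
 * left untouched and 0 is returned, mirroring the integer guard
 * `if(num) *outc = col / num;`. */
static int average_u16(const uint16_t *samples, size_t count, uint16_t *out)
{
  assert(out != NULL);

  /* Integer x / 0 is undefined behaviour, so bail out before dividing. */
  if(count == 0) return 0;

  uint64_t sum = 0; /* wide accumulator so the sum cannot overflow in practice */
  for(size_t i = 0; i < count; i++) sum += samples[i];

  *out = (uint16_t)(sum / count);
  return 1;
}

/* Hypothetical float counterpart, mirroring `(num) ? col / num : 0.0f`.
 * IEEE 754 float division by zero usually does not trap, but it yields
 * inf or NaN, so a zero weight is mapped to 0.0f instead. */
static float average_f(float sum, float weight)
{
  return (weight != 0.0f) ? sum / weight : 0.0f;
}
```

A caller would check the return value of average_u16 before using the
output, or pre-initialise the output as the sketch assumes. Whether leaving
the output pixel untouched or writing zero is preferable depends on the
caller; the patch itself uses both forms.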
diff -Nru darktable-3.4.1/debian/changelog darktable-3.4.1/debian/changelog
--- darktable-3.4.1/debian/changelog	2021-05-20 14:07:16.000000000 -0300
+++ darktable-3.4.1/debian/changelog	2021-06-05 12:41:39.000000000 -0300
@@ -1,3 +1,11 @@
+darktable (3.4.1-4) unstable; urgency=medium
+
+  * Bug fix: "crashes with 'Floating point exception (core dumped)' after
+    loading some DNG files", thanks to Jonas Smedegaard (Closes: #989222).
+    Cherry pick upstream commit 2ff4fc58e44.
+
+ -- David Bremner <bremner@debian.org>  Sat, 05 Jun 2021 12:41:39 -0300
+
 darktable (3.4.1-3) unstable; urgency=medium
 
   * Bug fix: "broken symlinks: /usr/share/darktable/js/*.js ->
diff -Nru darktable-3.4.1/debian/patches/0002-Avoid-div-by-zero-in-dt_iop_clip_and_zoom_mosaic_hal.patch darktable-3.4.1/debian/patches/0002-Avoid-div-by-zero-in-dt_iop_clip_and_zoom_mosaic_hal.patch
--- darktable-3.4.1/debian/patches/0002-Avoid-div-by-zero-in-dt_iop_clip_and_zoom_mosaic_hal.patch	1969-12-31 20:00:00.000000000 -0400
+++ darktable-3.4.1/debian/patches/0002-Avoid-div-by-zero-in-dt_iop_clip_and_zoom_mosaic_hal.patch	2021-06-05 12:41:39.000000000 -0300
@@ -0,0 +1,1001 @@
+From: Hanno Schwalm <hanno@schwalm-bremen.de>
+Date: Fri, 14 May 2021 18:20:37 +0200
+Subject: Avoid div by zero in dt_iop_clip_and_zoom_mosaic_half_size (#8954)
+
+* Avoid div by zero in dt_iop_clip_and_zoom_mosaic_half_size_plain
+
+Fixes #8951
+
+Although the file given in the issue is crippled we can avoid the crash.
+In `dt_iop_clip_and_zoom_mosaic_half_size` and the sse friend there is possibly a div/0
+problem that should be checked.
+
+* Fixing same div by zero in dt_iop_clip_and_zoom_mosaic_half_size_f
+
+* Remove sse code for dt_iop_clip_and_zoom_mosaic... after testing performance
+
+checked performance non-sse vs sse specific code
+- with added local timers
+- using gcc 10.2
+- testing -t 1/4/8/16
+- intel (xeon like 9900) with fixed clock rate
+
+in
+- dt_iop_clip_and_zoom_mosaic_half_size
+- dt_iop_clip_and_zoom_mosaic_half_size_f
+- dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f
+- dt_iop_clip_and_zoom_demosaic_half_size_f
+
+with consistent results. For all functions the sse specific code was somewhat slower (~20%)
+than the vectorized compiler code. Number of omp cores didn't matter, just made the results
+more measurable because of low execution times.
+
+So I removed all the sse specific code for less code burden and better performance.
+
+* Fix sse header plus div/0
+
+At least for bayer images we absolutely want to be sure there is no div by zero as there might
+be buggy dng files.
+---
+ src/develop/imageop_math.c | 890 +--------------------------------------------
+ 1 file changed, 7 insertions(+), 883 deletions(-)
+
+diff --git a/src/develop/imageop_math.c b/src/develop/imageop_math.c
+index ef55965..0066a83 100644
+--- a/src/develop/imageop_math.c
++++ b/src/develop/imageop_math.c
+@@ -18,14 +18,8 @@
+ 
+ #include "develop/imageop_math.h"
+ #include <assert.h> // for assert
+-#ifdef __SSE__
+-#include <emmintrin.h> // for _mm_set_epi32, _mm_add_epi32
+-#endif
+ #include <glib.h> // for MIN, MAX, CLAMP, inline
+ #include <math.h> // for round, floorf, fmaxf
+-#ifdef __SSE__
+-#include <xmmintrin.h> // for _mm_set_ps, _mm_mul_ps, _mm_set...
+-#endif
+ #include "common/darktable.h"        // for darktable, darktable_t, dt_code...
+ #include "common/imageio.h"          // for FILTERS_ARE_4BAYER
+ #include "common/interpolation.h"    // for dt_interpolation_new, dt_interp...
+@@ -177,7 +171,7 @@ int dt_iop_clip_and_zoom_roi_cl(int devid, cl_mem dev_out, cl_mem dev_in, const
+ 
+ #endif
+ 
+-void dt_iop_clip_and_zoom_mosaic_half_size_plain(uint16_t *const out, const uint16_t *const in,
++void dt_iop_clip_and_zoom_mosaic_half_size(uint16_t *const out, const uint16_t *const in,
+                                                  const dt_iop_roi_t *const roi_out,
+                                                  const dt_iop_roi_t *const roi_in, const int32_t out_stride,
+                                                  const int32_t in_stride, const uint32_t filters)
+@@ -244,224 +238,12 @@ void dt_iop_clip_and_zoom_mosaic_half_size_plain(uint16_t *const out, const uint
+             num++;
+           }
+         }
+-      *outc = col / num;
+-    }
+-  }
+-}
+-
+-#if defined(__SSE__)
+-void dt_iop_clip_and_zoom_mosaic_half_size_sse2(uint16_t *const out, const uint16_t *const in,
+-                                                const dt_iop_roi_t *const roi_out,
+-                                                const dt_iop_roi_t *const roi_in, const int32_t out_stride,
+-                                                const int32_t in_stride, const uint32_t filters)
+-{
+-  // adjust to pixel region and don't sample more than scale/2 nbs!
+-  // pixel footprint on input buffer, radius:
+-  const float px_footprint = 1.f / roi_out->scale;
+-  // how many 2x2 blocks can be sampled inside that area
+-  const int samples = round(px_footprint / 2);
+-
+-  // move p to point to an rggb block:
+-  int trggbx = 0, trggby = 0;
+-  if(FC(trggby, trggbx + 1, filters) != 1) trggbx++;
+-  if(FC(trggby, trggbx, filters) != 0)
+-  {
+-    trggbx = (trggbx + 1) & 1;
+-    trggby++;
+-  }
+-  const int rggbx = trggbx, rggby = trggby;
+-
+-#ifdef _OPENMP
+-#pragma omp parallel for default(none) \
+-  dt_omp_firstprivate(in, in_stride, out, out_stride, px_footprint, rggbx, rggby, roi_in, roi_out, samples) \
+-  schedule(static)
+-#endif
+-  for(int y = 0; y < roi_out->height; y++)
+-  {
+-    uint16_t *outc = out + out_stride * y;
+-
+-    const float fy = (y + roi_out->y) * px_footprint;
+-    int py = (int)fy & ~1;
+-    const float dy = (fy - py) / 2;
+-    py = MIN(((roi_in->height - 6) & ~1u), py) + rggby;
+-
+-    const int maxj = MIN(((roi_in->height - 5) & ~1u) + rggby, py + 2 * samples);
+-
+-    for(int x = 0; x < roi_out->width; x++)
+-    {
+-      __m128 col = _mm_setzero_ps();
+-
+-      const float fx = (x + roi_out->x) * px_footprint;
+-      int px = (int)fx & ~1;
+-      const float dx = (fx - px) / 2;
+-      px = MIN(((roi_in->width - 6) & ~1u), px) + rggbx;
+-
+-      const int maxi = MIN(((roi_in->width - 5) & ~1u) + rggbx, px + 2 * samples);
+-
+-      float p1, p2, p3, p4;
+-      float num = 0;
+-
+-      // upper left 2x2 block of sampling region
+-      p1 = in[px + in_stride * py];
+-      p2 = in[px + 1 + in_stride * py];
+-      p3 = in[px + in_stride * (py + 1)];
+-      p4 = in[px + 1 + in_stride * (py + 1)];
+-      col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps((1 - dx) * (1 - dy)), _mm_set_ps(p4, p3, p2, p1)));
+-
+-      // left 2x2 block border of sampling region
+-      for(int j = py + 2; j <= maxj; j += 2)
+-      {
+-        p1 = in[px + in_stride * j];
+-        p2 = in[px + 1 + in_stride * j];
+-        p3 = in[px + in_stride * (j + 1)];
+-        p4 = in[px + 1 + in_stride * (j + 1)];
+-        col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(1 - dx), _mm_set_ps(p4, p3, p2, p1)));
+-      }
+-
+-      // upper 2x2 block border of sampling region
+-      for(int i = px + 2; i <= maxi; i += 2)
+-      {
+-        p1 = in[i + in_stride * py];
+-        p2 = in[i + 1 + in_stride * py];
+-        p3 = in[i + in_stride * (py + 1)];
+-        p4 = in[i + 1 + in_stride * (py + 1)];
+-        col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(1 - dy), _mm_set_ps(p4, p3, p2, p1)));
+-      }
+-
+-      // 2x2 blocks in the middle of sampling region
+-      for(int j = py + 2; j <= maxj; j += 2)
+-        for(int i = px + 2; i <= maxi; i += 2)
+-        {
+-          p1 = in[i + in_stride * j];
+-          p2 = in[i + 1 + in_stride * j];
+-          p3 = in[i + in_stride * (j + 1)];
+-          p4 = in[i + 1 + in_stride * (j + 1)];
+-          col = _mm_add_ps(col, _mm_set_ps(p4, p3, p2, p1));
+-        }
+-
+-      if(maxi == px + 2 * samples && maxj == py + 2 * samples)
+-      {
+-        // right border
+-        for(int j = py + 2; j <= maxj; j += 2)
+-        {
+-          p1 = in[maxi + 2 + in_stride * j];
+-          p2 = in[maxi + 3 + in_stride * j];
+-          p3 = in[maxi + 2 + in_stride * (j + 1)];
+-          p4 = in[maxi + 3 + in_stride * (j + 1)];
+-          col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx), _mm_set_ps(p4, p3, p2, p1)));
+-        }
+-
+-        // upper right
+-        p1 = in[maxi + 2 + in_stride * py];
+-        p2 = in[maxi + 3 + in_stride * py];
+-        p3 = in[maxi + 2 + in_stride * (py + 1)];
+-        p4 = in[maxi + 3 + in_stride * (py + 1)];
+-        col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx * (1 - dy)), _mm_set_ps(p4, p3, p2, p1)));
+-
+-        // lower border
+-        for(int i = px + 2; i <= maxi; i += 2)
+-        {
+-          p1 = in[i + in_stride * (maxj + 2)];
+-          p2 = in[i + 1 + in_stride * (maxj + 2)];
+-          p3 = in[i + in_stride * (maxj + 3)];
+-          p4 = in[i + 1 + in_stride * (maxj + 3)];
+-          col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dy), _mm_set_ps(p4, p3, p2, p1)));
+-        }
+-
+-        // lower left 2x2 block
+-        p1 = in[px + in_stride * (maxj + 2)];
+-        p2 = in[px + 1 + in_stride * (maxj + 2)];
+-        p3 = in[px + in_stride * (maxj + 3)];
+-        p4 = in[px + 1 + in_stride * (maxj + 3)];
+-        col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps((1 - dx) * dy), _mm_set_ps(p4, p3, p2, p1)));
+-
+-        // lower right 2x2 block
+-        p1 = in[maxi + 2 + in_stride * (maxj + 2)];
+-        p2 = in[maxi + 3 + in_stride * (maxj + 2)];
+-        p3 = in[maxi + 2 + in_stride * (maxj + 3)];
+-        p4 = in[maxi + 3 + in_stride * (maxj + 3)];
+-        col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx * dy), _mm_set_ps(p4, p3, p2, p1)));
+-
+-        num = (samples + 1) * (samples + 1);
+-      }
+-      else if(maxi == px + 2 * samples)
+-      {
+-        // right border
+-        for(int j = py + 2; j <= maxj; j += 2)
+-        {
+-          p1 = in[maxi + 2 + in_stride * j];
+-          p2 = in[maxi + 3 + in_stride * j];
+-          p3 = in[maxi + 2 + in_stride * (j + 1)];
+-          p4 = in[maxi + 3 + in_stride * (j + 1)];
+-          col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx), _mm_set_ps(p4, p3, p2, p1)));
+-        }
+-
+-        // upper right
+-        p1 = in[maxi + 2 + in_stride * py];
+-        p2 = in[maxi + 3 + in_stride * py];
+-        p3 = in[maxi + 2 + in_stride * (py + 1)];
+-        p4 = in[maxi + 3 + in_stride * (py + 1)];
+-        col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx * (1 - dy)), _mm_set_ps(p4, p3, p2, p1)));
+-
+-        num = ((maxj - py) / 2 + 1 - dy) * (samples + 1);
+-      }
+-      else if(maxj == py + 2 * samples)
+-      {
+-        // lower border
+-        for(int i = px + 2; i <= maxi; i += 2)
+-        {
+-          p1 = in[i + in_stride * (maxj + 2)];
+-          p2 = in[i + 1 + in_stride * (maxj + 2)];
+-          p3 = in[i + in_stride * (maxj + 3)];
+-          p4 = in[i + 1 + in_stride * (maxj + 3)];
+-          col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dy), _mm_set_ps(p4, p3, p2, p1)));
+-        }
+-
+-        // lower left 2x2 block
+-        p1 = in[px + in_stride * (maxj + 2)];
+-        p2 = in[px + 1 + in_stride * (maxj + 2)];
+-        p3 = in[px + in_stride * (maxj + 3)];
+-        p4 = in[px + 1 + in_stride * (maxj + 3)];
+-        col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps((1 - dx) * dy), _mm_set_ps(p4, p3, p2, p1)));
+-
+-        num = ((maxi - px) / 2 + 1 - dx) * (samples + 1);
+-      }
+-      else
+-      {
+-        num = ((maxi - px) / 2 + 1 - dx) * ((maxj - py) / 2 + 1 - dy);
+-      }
+-
+-      num = 1.0f / num;
+-      col = _mm_mul_ps(col, _mm_set1_ps(num));
+-
+-      float fcol[4] __attribute__((aligned(64)));
+-      _mm_store_ps(fcol, col);
+-
+-      const int c = (2 * ((y + rggby) % 2) + ((x + rggbx) % 2));
+-      *outc = (uint16_t)(fcol[c]);
+-      outc++;
++      if(num) *outc = col / num;
+     }
+   }
+-  _mm_sfence();
+-}
+-#endif
+-
+-void dt_iop_clip_and_zoom_mosaic_half_size(uint16_t *const out, const uint16_t *const in,
+-                                           const dt_iop_roi_t *const roi_out, const dt_iop_roi_t *const roi_in,
+-                                           const int32_t out_stride, const int32_t in_stride,
+-                                           const uint32_t filters)
+-{
+-  if(1)//(darktable.codepath.OPENMP_SIMD)
+-    return dt_iop_clip_and_zoom_mosaic_half_size_plain(out, in, roi_out, roi_in, out_stride, in_stride, filters);
+-#if defined(__SSE__)
+-  else if(darktable.codepath.SSE2)
+-    return dt_iop_clip_and_zoom_mosaic_half_size_sse2(out, in, roi_out, roi_in, out_stride, in_stride, filters);
+-#endif
+-  else
+-    dt_unreachable_codepath();
+ }
+ 
+-void dt_iop_clip_and_zoom_mosaic_half_size_f_plain(float *const out, const float *const in,
++void dt_iop_clip_and_zoom_mosaic_half_size_f(float *const out, const float *const in,
+                                                    const dt_iop_roi_t *const roi_out,
+                                                    const dt_iop_roi_t *const roi_in, const int32_t out_stride,
+                                                    const int32_t in_stride, const uint32_t filters)
+@@ -643,223 +425,10 @@ void dt_iop_clip_and_zoom_mosaic_half_size_f_plain(float *const out, const float
+       }
+ 
+       const int c = (2 * ((y + rggby) % 2) + ((x + rggbx) % 2));
+-      *outc = col[c] / num;
+-      outc++;
+-    }
+-  }
+-}
+-
+-#if defined(__SSE__)
+-void dt_iop_clip_and_zoom_mosaic_half_size_f_sse2(float *const out, const float *const in,
+-                                                  const dt_iop_roi_t *const roi_out,
+-                                                  const dt_iop_roi_t *const roi_in, const int32_t out_stride,
+-                                                  const int32_t in_stride, const uint32_t filters)
+-{
+-  // adjust to pixel region and don't sample more than scale/2 nbs!
+-  // pixel footprint on input buffer, radius:
+-  const float px_footprint = 1.f / roi_out->scale;
+-  // how many 2x2 blocks can be sampled inside that area
+-  const int samples = round(px_footprint / 2);
+-
+-  // move p to point to an rggb block:
+-  int trggbx = 0, trggby = 0;
+-  if(FC(trggby, trggbx + 1, filters) != 1) trggbx++;
+-  if(FC(trggby, trggbx, filters) != 0)
+-  {
+-    trggbx = (trggbx + 1) & 1;
+-    trggby++;
+-  }
+-  const int rggbx = trggbx, rggby = trggby;
+-
+-#ifdef _OPENMP
+-#pragma omp parallel for default(none) \
+-  dt_omp_firstprivate(in, in_stride, out, out_stride, px_footprint, rggbx, \
+-                      rggby, roi_in, roi_out, samples) \
+-  schedule(static)
+-#endif
+-  for(int y = 0; y < roi_out->height; y++)
+-  {
+-    float *outc = out + out_stride * y;
+-
+-    const float fy = (y + roi_out->y) * px_footprint;
+-    int py = (int)fy & ~1;
+-    const float dy = (fy - py) / 2;
+-    py = MIN(((roi_in->height - 6) & ~1u), py) + rggby;
+-
+-    const int maxj = MIN(((roi_in->height - 5) & ~1u) + rggby, py + 2 * samples);
+-
+-    for(int x = 0; x < roi_out->width; x++)
+-    {
+-      __m128 col = _mm_setzero_ps();
+-
+-      const float fx = (x + roi_out->x) * px_footprint;
+-      int px = (int)fx & ~1;
+-      const float dx = (fx - px) / 2;
+-      px = MIN(((roi_in->width - 6) & ~1u), px) + rggbx;
+-
+-      const int maxi = MIN(((roi_in->width - 5) & ~1u) + rggbx, px + 2 * samples);
+-
+-      float p1, p2, p3, p4;
+-      float num = 0;
+-
+-      // upper left 2x2 block of sampling region
+-      p1 = in[px + in_stride * py];
+-      p2 = in[px + 1 + in_stride * py];
+-      p3 = in[px + in_stride * (py + 1)];
+-      p4 = in[px + 1 + in_stride * (py + 1)];
+-      col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps((1 - dx) * (1 - dy)), _mm_set_ps(p4, p3, p2, p1)));
+-
+-      // left 2x2 block border of sampling region
+-      for(int j = py + 2; j <= maxj; j += 2)
+-      {
+-        p1 = in[px + in_stride * j];
+-        p2 = in[px + 1 + in_stride * j];
+-        p3 = in[px + in_stride * (j + 1)];
+-        p4 = in[px + 1 + in_stride * (j + 1)];
+-        col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(1 - dx), _mm_set_ps(p4, p3, p2, p1)));
+-      }
+-
+-      // upper 2x2 block border of sampling region
+-      for(int i = px + 2; i <= maxi; i += 2)
+-      {
+-        p1 = in[i + in_stride * py];
+-        p2 = in[i + 1 + in_stride * py];
+-        p3 = in[i + in_stride * (py + 1)];
+-        p4 = in[i + 1 + in_stride * (py + 1)];
+-        col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(1 - dy), _mm_set_ps(p4, p3, p2, p1)));
+-      }
+-
+-      // 2x2 blocks in the middle of sampling region
+-      for(int j = py + 2; j <= maxj; j += 2)
+-        for(int i = px + 2; i <= maxi; i += 2)
+-        {
+-          p1 = in[i + in_stride * j];
+-          p2 = in[i + 1 + in_stride * j];
+-          p3 = in[i + in_stride * (j + 1)];
+-          p4 = in[i + 1 + in_stride * (j + 1)];
+-          col = _mm_add_ps(col, _mm_set_ps(p4, p3, p2, p1));
+-        }
+-
+-      if(maxi == px + 2 * samples && maxj == py + 2 * samples)
+-      {
+-        // right border
+-        for(int j = py + 2; j <= maxj; j += 2)
+-        {
+-          p1 = in[maxi + 2 + in_stride * j];
+-          p2 = in[maxi + 3 + in_stride * j];
+-          p3 = in[maxi + 2 + in_stride * (j + 1)];
+-          p4 = in[maxi + 3 + in_stride * (j + 1)];
+-          col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx), _mm_set_ps(p4, p3, p2, p1)));
+-        }
+-
+-        // upper right
+-        p1 = in[maxi + 2 + in_stride * py];
+-        p2 = in[maxi + 3 + in_stride * py];
+-        p3 = in[maxi + 2 + in_stride * (py + 1)];
+-        p4 = in[maxi + 3 + in_stride * (py + 1)];
+-        col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx * (1 - dy)), _mm_set_ps(p4, p3, p2, p1)));
+-
+-        // lower border
+-        for(int i = px + 2; i <= maxi; i += 2)
+-        {
+-          p1 = in[i + in_stride * (maxj + 2)];
+-          p2 = in[i + 1 + in_stride * (maxj + 2)];
+-          p3 = in[i + in_stride * (maxj + 3)];
+-          p4 = in[i + 1 + in_stride * (maxj + 3)];
+-          col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dy), _mm_set_ps(p4, p3, p2, p1)));
+-        }
+-
+-        // lower left 2x2 block
+-        p1 = in[px + in_stride * (maxj + 2)];
+-        p2 = in[px + 1 + in_stride * (maxj + 2)];
+-        p3 = in[px + in_stride * (maxj + 3)];
+-        p4 = in[px + 1 + in_stride * (maxj + 3)];
+-        col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps((1 - dx) * dy), _mm_set_ps(p4, p3, p2, p1)));
+-
+-        // lower right 2x2 block
+-        p1 = in[maxi + 2 + in_stride * (maxj + 2)];
+-        p2 = in[maxi + 3 + in_stride * (maxj + 2)];
+-        p3 = in[maxi + 2 + in_stride * (maxj + 3)];
+-        p4 = in[maxi + 3 + in_stride * (maxj + 3)];
+-        col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx * dy), _mm_set_ps(p4, p3, p2, p1)));
+-
+-        num = (samples + 1) * (samples + 1);
+-      }
+-      else if(maxi == px + 2 * samples)
+-      {
+-        // right border
+-        for(int j = py + 2; j <= maxj; j += 2)
+-        {
+-          p1 = in[maxi + 2 + in_stride * j];
+-          p2 = in[maxi + 3 + in_stride * j];
+-          p3 = in[maxi + 2 + in_stride * (j + 1)];
+-          p4 = in[maxi + 3 + in_stride * (j + 1)];
+-          col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx), _mm_set_ps(p4, p3, p2, p1)));
+-        }
+-
+-        // upper right
+-        p1 = in[maxi + 2 + in_stride * py];
+-        p2 = in[maxi + 3 + in_stride * py];
+-        p3 = in[maxi + 2 + in_stride * (py + 1)];
+-        p4 = in[maxi + 3 + in_stride * (py + 1)];
+-        col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx * (1 - dy)), _mm_set_ps(p4, p3, p2, p1)));
+-
+-        num = ((maxj - py) / 2 + 1 - dy) * (samples + 1);
+-      }
+-      else if(maxj == py + 2 * samples)
+-      {
+-        // lower border
+-        for(int i = px + 2; i <= maxi; i += 2)
+-        {
+-          p1 = in[i + in_stride * (maxj + 2)];
+-          p2 = in[i + 1 + in_stride * (maxj + 2)];
+-          p3 = in[i + in_stride * (maxj + 3)];
+-          p4 = in[i + 1 + in_stride * (maxj + 3)];
+-          col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dy), _mm_set_ps(p4, p3, p2, p1)));
+-        }
+-
+-        // lower left 2x2 block
+-        p1 = in[px + in_stride * (maxj + 2)];
+-        p2 = in[px + 1 + in_stride * (maxj + 2)];
+-        p3 = in[px + in_stride * (maxj + 3)];
+-        p4 = in[px + 1 + in_stride * (maxj + 3)];
+-        col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps((1 - dx) * dy), _mm_set_ps(p4, p3, p2, p1)));
+-
+-        num = ((maxi - px) / 2 + 1 - dx) * (samples + 1);
+-      }
+-      else
+-      {
+-        num = ((maxi - px) / 2 + 1 - dx) * ((maxj - py) / 2 + 1 - dy);
+-      }
+-
+-      num = 1.0f / num;
+-      col = _mm_mul_ps(col, _mm_set1_ps(num));
+-
+-      float fcol[4] __attribute__((aligned(64)));
+-      _mm_store_ps(fcol, col);
+-
+-      const int c = (2 * ((y + rggby) % 2) + ((x + rggbx) % 2));
+-      *outc = fcol[c];
++      if(num) *outc = col[c] / num;
+       outc++;
+     }
+   }
+-  _mm_sfence();
+-}
+-#endif
+-
+-void dt_iop_clip_and_zoom_mosaic_half_size_f(float *const out, const float *const in,
+-                                             const dt_iop_roi_t *const roi_out, const dt_iop_roi_t *const roi_in,
+-                                             const int32_t out_stride, const int32_t in_stride,
+-                                             const uint32_t filters)
+-{
+-  if(darktable.codepath.OPENMP_SIMD)
+-    return dt_iop_clip_and_zoom_mosaic_half_size_f_plain(out, in, roi_out, roi_in, out_stride, in_stride, filters);
+-#if defined(__SSE__)
+-  else if(darktable.codepath.SSE2)
+-    return dt_iop_clip_and_zoom_mosaic_half_size_f_sse2(out, in, roi_out, roi_in, out_stride, in_stride, filters);
+-#endif
+-  else
+-    dt_unreachable_codepath();
+ }
+ 
+ /**
+@@ -951,7 +520,7 @@ void dt_iop_clip_and_zoom_mosaic_third_size_xtrans_f(float *const out, const flo
+   }
+ }
+ 
+-void dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f_plain(float *out, const float *const in,
++void dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f(float *out, const float *const in,
+                                                                   const dt_iop_roi_t *const roi_out,
+                                                                   const dt_iop_roi_t *const roi_in,
+                                                                   const int32_t out_stride,
+@@ -1085,7 +654,7 @@ void dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f_plain(float *out, co
+         num = ((maxi - px) / 2 + 1 - dx) * ((maxj - py) / 2 + 1 - dy);
+       }
+ 
+-      const float pix = col / num;
++      const float pix = (num) ? col / num : 0.0f;
+       outc[0] = pix;
+       outc[1] = pix;
+       outc[2] = pix;
+@@ -1095,256 +664,7 @@ void dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f_plain(float *out, co
+   }
+ }
+ 
+-#if defined(__SSE__)
+-void dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f_sse2(float *out, const float *const in,
+-                                                                 const dt_iop_roi_t *const roi_out,
+-                                                                 const dt_iop_roi_t *const roi_in,
+-                                                                 const int32_t out_stride,
+-                                                                 const int32_t in_stride)
+-{
+-  // adjust to pixel region and don't sample more than scale/2 nbs!
+-  // pixel footprint on input buffer, radius:
+-  const float px_footprint = 1.f / roi_out->scale;
+-  // how many pixels can be sampled inside that area
+-  const int samples = round(px_footprint);
+-
+-#ifdef _OPENMP
+-#pragma omp parallel for default(none) \
+-  dt_omp_firstprivate(in, in_stride, out_stride, px_footprint, roi_in, roi_out, samples) \
+-  shared(out) \
+-  schedule(static)
+-#endif
+-  for(int y = 0; y < roi_out->height; y++)
+-  {
+-    float *outc = out + 4 * (out_stride * y);
+-
+-    const float fy = (y + roi_out->y) * px_footprint;
+-    int py = (int)fy;
+-    const float dy = fy - py;
+-    py = MIN(((roi_in->height - 3)), py);
+-
+-    const int maxj = MIN(((roi_in->height - 2)), py + samples);
+-
+-    for(int x = 0; x < roi_out->width; x++)
+-    {
+-      __m128 col = _mm_setzero_ps();
+-
+-      const float fx = (x + roi_out->x) * px_footprint;
+-      int px = (int)fx;
+-      const float dx = fx - px;
+-      px = MIN(((roi_in->width - 3)), px);
+-
+-      const int maxi = MIN(((roi_in->width - 2)), px + samples);
+-
+-      float p;
+-      float num = 0;
+-
+-      // upper left pixel of sampling region
+-      p = in[px + in_stride * py];
+-      col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps((1 - dx) * (1 - dy)), _mm_set_ps(0.0f, p, p, p)));
+-
+-      // left pixel border of sampling region
+-      for(int j = py + 1; j <= maxj; j++)
+-      {
+-        p = in[px + in_stride * j];
+-        col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(1 - dx), _mm_set_ps(0.0f, p, p, p)));
+-      }
+-
+-      // upper pixel border of sampling region
+-      for(int i = px + 1; i <= maxi; i++)
+-      {
+-        p = in[i + in_stride * py];
+-        col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(1 - dy), _mm_set_ps(0.0f, p, p, p)));
+-      }
+-
+-      // pixels in the middle of sampling region
+-      for(int j = py + 1; j <= maxj; j++)
+-        for(int i = px + 1; i <= maxi; i++)
+-        {
+-          p = in[i + in_stride * j];
+-          col = _mm_add_ps(col, _mm_set_ps(0.0f, p, p, p));
+-        }
+-
+-      if(maxi == px + samples && maxj == py + samples)
+-      {
+-        // right border
+-        for(int j = py + 1; j <= maxj; j++)
+-        {
+-          p = in[maxi + 1 + in_stride * j];
+-          col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx), _mm_set_ps(0.0f, p, p, p)));
+-        }
+-
+-        // upper right
+-        p = in[maxi + 1 + in_stride * py];
+-        col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx * (1 - dy)), _mm_set_ps(0.0f, p, p, p)));
+-
+-        // lower border
+-        for(int i = px + 1; i <= maxi; i++)
+-        {
+-          p = in[i + in_stride * (maxj + 1)];
+-          col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dy), _mm_set_ps(0.0f, p, p, p)));
+-        }
+-
+-        // lower left pixel
+-        p = in[px + in_stride * (maxj + 1)];
+-        col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps((1 - dx) * dy), _mm_set_ps(0.0f, p, p, p)));
+-
+-        // lower right pixel
+-        p = in[maxi + 1 + in_stride * (maxj + 1)];
+-        col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx * dy), _mm_set_ps(0.0f, p, p, p)));
+-
+-        num = (samples + 1) * (samples + 1);
+-      }
+-      else if(maxi == px + samples)
+-      {
+-        // right border
+-        for(int j = py + 1; j <= maxj; j++)
+-        {
+-          p = in[maxi + 1 + in_stride * j];
+-          col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx), _mm_set_ps(0.0f, p, p, p)));
+-        }
+-
+-        // upper right
+-        p = in[maxi + 1 + in_stride * py];
+-        col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx * (1 - dy)), _mm_set_ps(0.0f, p, p, p)));
+-
+-        num = ((maxj - py) / 2 + 1 - dy) * (samples + 1);
+-      }
+-      else if(maxj == py + samples)
+-      {
+-        // lower border
+-        for(int i = px + 1; i <= maxi; i++)
+-        {
+-          p = in[i + in_stride * (maxj + 1)];
+-          col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dy), _mm_set_ps(0.0f, p, p, p)));
+-        }
+-
+-        // lower left pixel
+-        p = in[px + in_stride * (maxj + 1)];
+-        col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps((1 - dx) * dy), _mm_set_ps(0.0f, p, p, p)));
+-
+-        num = ((maxi - px) / 2 + 1 - dx) * (samples + 1);
+-      }
+-      else
+-      {
+-        num = ((maxi - px) / 2 + 1 - dx) * ((maxj - py) / 2 + 1 - dy);
+-      }
+-
+-      num = 1.0f / num;
+-      col = _mm_mul_ps(col, _mm_set_ps(0.0f, num, num, num));
+-      _mm_stream_ps(outc, col);
+-      outc += 4;
+-    }
+-  }
+-  _mm_sfence();
+-}
+-#endif
+-
+-void dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f(float *out, const float *const in,
+-                                                            const dt_iop_roi_t *const roi_out,
+-                                                            const dt_iop_roi_t *const roi_in,
+-                                                            const int32_t out_stride, const int32_t in_stride)
+-{
+-  if(darktable.codepath.OPENMP_SIMD)
+-    return dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f_plain(out, in, roi_out, roi_in, out_stride,
+-                                                                        in_stride);
+-#if defined(__SSE__)
+-  else if(darktable.codepath.SSE2)
+-    return dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f_sse2(out, in, roi_out, roi_in, out_stride,
+-                                                                       in_stride);
+-#endif
+-  else
+-    dt_unreachable_codepath();
+-}
+-
+-#if 0 // gets rid of pink artifacts, but doesn't do sub-pixel sampling, so shows some staircasing artifacts.
+-void
+-dt_iop_clip_and_zoom_demosaic_half_size_f(
+-  float *out,
+-  const float *const in,
+-  const dt_iop_roi_t *const roi_out,
+-  const dt_iop_roi_t *const roi_in,
+-  const int32_t out_stride,
+-  const int32_t in_stride,
+-  const uint32_t filters,
+-  const float clip)
+-{
+-  // adjust to pixel region and don't sample more than scale/2 nbs!
+-  // pixel footprint on input buffer, radius:
+-  const float px_footprint = 1.f/roi_out->scale;
+-  // how many 2x2 blocks can be sampled inside that area
+-  const int samples = round(px_footprint/2);
+-
+-  // move p to point to an rggb block:
+-  int trggbx = 0, trggby = 0;
+-  if(FC(trggby, trggbx+1, filters) != 1) trggbx ++;
+-  if(FC(trggby, trggbx,   filters) != 0)
+-  {
+-    trggbx = (trggbx + 1)&1;
+-    trggby ++;
+-  }
+-  const int rggbx = trggbx, rggby = trggby;
+-
+-#ifdef _OPENMP
+-#pragma omp parallel for default(none) shared(out) schedule(static)
+-#endif
+-  for(int y=0; y<roi_out->height; y++)
+-  {
+-    float *outc = out + 4*(out_stride*y);
+-
+-    const float fy = (y + roi_out->y)*px_footprint;
+-    int py = (int)fy & ~1;
+-    py = MIN(((roi_in->height-4) & ~1u), py) + rggby;
+-
+-    int maxj = MIN(((roi_in->height-3)&~1u)+rggby, py+2*samples);
+-
+-    const float fx = roi_out->x*px_footprint;
+-
+-    for(int x=0; x<roi_out->width; x++)
+-    {
+-      __m128 col = _mm_setzero_ps();
+-
+-      fx += px_footprint;
+-      int px = (int)fx & ~1;
+-      px = MIN(((roi_in->width -4) & ~1u), px) + rggbx;
+-
+-      const int maxi = MIN(((roi_in->width -3)&~1u)+rggbx, px+2*samples);
+-
+-      int num = 0;
+-
+-      const int idx = px + in_stride*py;
+-      const float pc = MAX(MAX(in[idx], in[idx+1]), MAX(in[idx + in_stride], in[idx+1 + in_stride]));
+-
+-      // 2x2 blocks in the middle of sampling region
+-      __m128 sum = _mm_setzero_ps();
+-
+-      for(int j=py; j<=maxj; j+=2)
+-        for(int i=px; i<=maxi; i+=2)
+-        {
+-          const float p1 = in[i   + in_stride*j];
+-          const float p2 = in[i+1 + in_stride*j];
+-          const float p3 = in[i   + in_stride*(j + 1)];
+-          const float p4 = in[i+1 + in_stride*(j + 1)];
+-
+-          if (!((pc >= clip) ^ (MAX(MAX(p1,p2),MAX(p3,p4)) >= clip)))
+-          {
+-            sum = _mm_add_ps(sum, _mm_set_ps(0,p4,p3+p2,p1));
+-            num++;
+-          }
+-        }
+-
+-      col = _mm_mul_ps(sum, _mm_div_ps(_mm_set_ps(0.0f,1.0f,0.5f,1.0f),_mm_set1_ps(num)));
+-      _mm_stream_ps(outc, col);
+-      outc += 4;
+-    }
+-  }
+-  _mm_sfence();
+-}
+-
+-#else
+-// very fast and smooth, but doesn't handle highlights:
+-
+-void dt_iop_clip_and_zoom_demosaic_half_size_f_plain(float *out, const float *const in,
++void dt_iop_clip_and_zoom_demosaic_half_size_f(float *out, const float *const in,
+                                                      const dt_iop_roi_t *const roi_out,
+                                                      const dt_iop_roi_t *const roi_in, const int32_t out_stride,
+                                                      const int32_t in_stride, const uint32_t filters)
+@@ -1522,202 +842,6 @@ void dt_iop_clip_and_zoom_demosaic_half_size_f_plain(float *out, const float *co
+   }
+ }
+ 
+-#if defined(__SSE__)
+-void dt_iop_clip_and_zoom_demosaic_half_size_f_sse2(float *out, const float *const in,
+-                                                    const dt_iop_roi_t *const roi_out,
+-                                                    const dt_iop_roi_t *const roi_in, const int32_t out_stride,
+-                                                    const int32_t in_stride, const uint32_t filters)
+-{
+-  // adjust to pixel region and don't sample more than scale/2 nbs!
+-  // pixel footprint on input buffer, radius:
+-  const float px_footprint = 1.f / roi_out->scale;
+-  // how many 2x2 blocks can be sampled inside that area
+-  const int samples = round(px_footprint / 2);
+-
+-  // move p to point to an rggb block:
+-  int trggbx = 0, trggby = 0;
+-  if(FC(trggby, trggbx + 1, filters) != 1) trggbx++;
+-  if(FC(trggby, trggbx, filters) != 0)
+-  {
+-    trggbx = (trggbx + 1) & 1;
+-    trggby++;
+-  }
+-  const int rggbx = trggbx, rggby = trggby;
+-
+-#ifdef _OPENMP
+-#pragma omp parallel for default(none) \
+-  dt_omp_firstprivate(in, in_stride, px_footprint, rggbx, rggby, out_stride, roi_in, roi_out, samples) \
+-  shared(out) \
+-  schedule(static)
+-#endif
+-  for(int y = 0; y < roi_out->height; y++)
+-  {
+-    float *outc = out + 4 * (out_stride * y);
+-
+-    const float fy = (y + roi_out->y) * px_footprint;
+-    int py = (int)fy & ~1;
+-    const float dy = (fy - py) / 2;
+-    py = MIN(((roi_in->height - 6) & ~1u), py) + rggby;
+-
+-    const int maxj = MIN(((roi_in->height - 5) & ~1u) + rggby, py + 2 * samples);
+-
+-    for(int x = 0; x < roi_out->width; x++)
+-    {
+-      __m128 col = _mm_setzero_ps();
+-
+-      const float fx = (x + roi_out->x) * px_footprint;
+-      int px = (int)fx & ~1;
+-      const float dx = (fx - px) / 2;
+-      px = MIN(((roi_in->width - 6) & ~1u), px) + rggbx;
+-
+-      const int maxi = MIN(((roi_in->width - 5) & ~1u) + rggbx, px + 2 * samples);
+-
+-      float p1, p2, p4;
+-      float num = 0;
+-
+-      // upper left 2x2 block of sampling region
+-      p1 = in[px + in_stride * py];
+-      p2 = in[px + 1 + in_stride * py] + in[px + in_stride * (py + 1)];
+-      p4 = in[px + 1 + in_stride * (py + 1)];
+-      col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps((1 - dx) * (1 - dy)), _mm_set_ps(0.0f, p4, p2, p1)));
+-
+-      // left 2x2 block border of sampling region
+-      for(int j = py + 2; j <= maxj; j += 2)
+-      {
+-        p1 = in[px + in_stride * j];
+-        p2 = in[px + 1 + in_stride * j] + in[px + in_stride * (j + 1)];
+-        p4 = in[px + 1 + in_stride * (j + 1)];
+-        col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(1 - dx), _mm_set_ps(0.0f, p4, p2, p1)));
+-      }
+-
+-      // upper 2x2 block border of sampling region
+-      for(int i = px + 2; i <= maxi; i += 2)
+-      {
+-        p1 = in[i + in_stride * py];
+-        p2 = in[i + 1 + in_stride * py] + in[i + in_stride * (py + 1)];
+-        p4 = in[i + 1 + in_stride * (py + 1)];
+-        col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(1 - dy), _mm_set_ps(0.0f, p4, p2, p1)));
+-      }
+-
+-      // 2x2 blocks in the middle of sampling region
+-      for(int j = py + 2; j <= maxj; j += 2)
+-        for(int i = px + 2; i <= maxi; i += 2)
+-        {
+-          p1 = in[i + in_stride * j];
+-          p2 = in[i + 1 + in_stride * j] + in[i + in_stride * (j + 1)];
+-          p4 = in[i + 1 + in_stride * (j + 1)];
+-          col = _mm_add_ps(col, _mm_set_ps(0.0f, p4, p2, p1));
+-        }
+-
+-      if(maxi == px + 2 * samples && maxj == py + 2 * samples)
+-      {
+-        // right border
+-        for(int j = py + 2; j <= maxj; j += 2)
+-        {
+-          p1 = in[maxi + 2 + in_stride * j];
+-          p2 = in[maxi + 3 + in_stride * j] + in[maxi + 2 + in_stride * (j + 1)];
+-          p4 = in[maxi + 3 + in_stride * (j + 1)];
+-          col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx), _mm_set_ps(0.0f, p4, p2, p1)));
+-        }
+-
+-        // upper right
+-        p1 = in[maxi + 2 + in_stride * py];
+-        p2 = in[maxi + 3 + in_stride * py] + in[maxi + 2 + in_stride * (py + 1)];
+-        p4 = in[maxi + 3 + in_stride * (py + 1)];
+-        col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx * (1 - dy)), _mm_set_ps(0.0f, p4, p2, p1)));
+-
+-        // lower border
+-        for(int i = px + 2; i <= maxi; i += 2)
+-        {
+-          p1 = in[i + in_stride * (maxj + 2)];
+-          p2 = in[i + 1 + in_stride * (maxj + 2)] + in[i + in_stride * (maxj + 3)];
+-          p4 = in[i + 1 + in_stride * (maxj + 3)];
+-          col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dy), _mm_set_ps(0.0f, p4, p2, p1)));
+-        }
+-
+-        // lower left 2x2 block
+-        p1 = in[px + in_stride * (maxj + 2)];
+-        p2 = in[px + 1 + in_stride * (maxj + 2)] + in[px + in_stride * (maxj + 3)];
+-        p4 = in[px + 1 + in_stride * (maxj + 3)];
+-        col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps((1 - dx) * dy), _mm_set_ps(0.0f, p4, p2, p1)));
+-
+-        // lower right 2x2 block
+-        p1 = in[maxi + 2 + in_stride * (maxj + 2)];
+-        p2 = in[maxi + 3 + in_stride * (maxj + 2)] + in[maxi + 2 + in_stride * (maxj + 3)];
+-        p4 = in[maxi + 3 + in_stride * (maxj + 3)];
+-        col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx * dy), _mm_set_ps(0.0f, p4, p2, p1)));
+-
+-        num = (samples + 1) * (samples + 1);
+-      }
+-      else if(maxi == px + 2 * samples)
+-      {
+-        // right border
+-        for(int j = py + 2; j <= maxj; j += 2)
+-        {
+-          p1 = in[maxi + 2 + in_stride * j];
+-          p2 = in[maxi + 3 + in_stride * j] + in[maxi + 2 + in_stride * (j + 1)];
+-          p4 = in[maxi + 3 + in_stride * (j + 1)];
+-          col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx), _mm_set_ps(0.0f, p4, p2, p1)));
+-        }
+-
+-        // upper right
+-        p1 = in[maxi + 2 + in_stride * py];
+-        p2 = in[maxi + 3 + in_stride * py] + in[maxi + 2 + in_stride * (py + 1)];
+-        p4 = in[maxi + 3 + in_stride * (py + 1)];
+-        col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx * (1 - dy)), _mm_set_ps(0.0f, p4, p2, p1)));
+-
+-        num = ((maxj - py) / 2 + 1 - dy) * (samples + 1);
+-      }
+-      else if(maxj == py + 2 * samples)
+-      {
+-        // lower border
+-        for(int i = px + 2; i <= maxi; i += 2)
+-        {
+-          p1 = in[i + in_stride * (maxj + 2)];
+-          p2 = in[i + 1 + in_stride * (maxj + 2)] + in[i + in_stride * (maxj + 3)];
+-          p4 = in[i + 1 + in_stride * (maxj + 3)];
+-          col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dy), _mm_set_ps(0.0f, p4, p2, p1)));
+-        }
+-
+-        // lower left 2x2 block
+-        p1 = in[px + in_stride * (maxj + 2)];
+-        p2 = in[px + 1 + in_stride * (maxj + 2)] + in[px + in_stride * (maxj + 3)];
+-        p4 = in[px + 1 + in_stride * (maxj + 3)];
+-        col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps((1 - dx) * dy), _mm_set_ps(0.0f, p4, p2, p1)));
+-
+-        num = ((maxi - px) / 2 + 1 - dx) * (samples + 1);
+-      }
+-      else
+-      {
+-        num = ((maxi - px) / 2 + 1 - dx) * ((maxj - py) / 2 + 1 - dy);
+-      }
+-
+-      num = 1.0f / num;
+-      col = _mm_mul_ps(col, _mm_set_ps(0.0f, num, 0.5f * num, num));
+-      _mm_stream_ps(outc, col);
+-      outc += 4;
+-    }
+-  }
+-  _mm_sfence();
+-}
+-#endif
+-#endif
+-
+-void dt_iop_clip_and_zoom_demosaic_half_size_f(float *out, const float *const in,
+-                                               const dt_iop_roi_t *const roi_out, const dt_iop_roi_t *const roi_in,
+-                                               const int32_t out_stride, const int32_t in_stride,
+-                                               const uint32_t filters)
+-{
+-  if(darktable.codepath.OPENMP_SIMD)
+-    return dt_iop_clip_and_zoom_demosaic_half_size_f_plain(out, in, roi_out, roi_in, out_stride, in_stride,
+-                                                           filters);
+-#if defined(__SSE__)
+-  else if(darktable.codepath.SSE2)
+-    return dt_iop_clip_and_zoom_demosaic_half_size_f_sse2(out, in, roi_out, roi_in, out_stride, in_stride, filters);
+-#endif
+-  else
+-    dt_unreachable_codepath();
+-}
+ 
+ void dt_iop_clip_and_zoom_demosaic_third_size_xtrans_f(float *out, const float *const in,
+                                                        const dt_iop_roi_t *const roi_out,
diff -Nru darktable-3.4.1/debian/patches/series darktable-3.4.1/debian/patches/series
--- darktable-3.4.1/debian/patches/series	2021-05-20 14:07:16.000000000 -0300
+++ darktable-3.4.1/debian/patches/series	2021-06-05 12:41:39.000000000 -0300
@@ -1 +1,2 @@
 0001-add-explicit-dependency-on-generate_conf.patch
+0002-Avoid-div-by-zero-in-dt_iop_clip_and_zoom_mosaic_hal.patch
commit f007e678d47f5662326824725cae2ab9e2455e66
Author: Hanno Schwalm <hanno@schwalm-bremen.de>
Date:   Fri May 14 18:20:37 2021 +0200

    Avoid div by zero in dt_iop_clip_and_zoom_mosaic_half_size (#8954)
    
    * Avoid div by zero in dt_iop_clip_and_zoom_mosaic_half_size_plain
    
    Fixes #8951
    
    Although the file given in the issue is crippled we can avoid the crash.
    In `dt_iop_clip_and_zoom_mosaic_half_size` and the sse friend there is possibly a div/0
    problem that should be checked.
    
    * Fixing same div by zero in dt_iop_clip_and_zoom_mosaic_half_size_f
    
    * Remove sse code for dt_iop_clip_and_zoom_mosaic... after testing performance
    
    checked performance non-sse vs sse specific code
    - with added local timers
    - using gcc 10.2
    - testing -t 1/4/8/16
    - intel (xeon like 9900) with fixed clock rate
    
    in
    - dt_iop_clip_and_zoom_mosaic_half_size
    - dt_iop_clip_and_zoom_mosaic_half_size_f
    - dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f
    - dt_iop_clip_and_zoom_demosaic_half_size_f
    
    with consistent results. For all functions the sse specific code was somewhat slower (~20%)
    than the vectorized compiler code. Number of omp cores didn't matter, just made the results
    more measurable because of low execution times.
    
    So I removed all the sse specific code for less code burden and better performance.
    
    * Fix sse header plus div/0
    
    At least for bayer images we absolutely want to be sure there is no div by zero as there might
    be buggy dng files.

diff --git a/src/develop/imageop_math.c b/src/develop/imageop_math.c
index ef559652d..0066a83c9 100644
--- a/src/develop/imageop_math.c
+++ b/src/develop/imageop_math.c
@@ -18,14 +18,8 @@
 
 #include "develop/imageop_math.h"
 #include <assert.h> // for assert
-#ifdef __SSE__...
-#endif
 #include <glib.h> // for MIN, MAX, CLAMP, inline
 #include <math.h> // for round, floorf, fmaxf
-#ifdef __SSE__...
-#endif
 #include "common/darktable.h"        // for darktable, darktable_t, dt_code...
 #include "common/imageio.h"          // for FILTERS_ARE_4BAYER
 #include "common/interpolation.h"    // for dt_interpolation_new, dt_interp...
@@ -177,7 +171,7 @@ int dt_iop_clip_and_zoom_roi_cl(int devid, cl_mem dev_out, cl_mem dev_in, const
 
 #endif
 
-void dt_iop_clip_and_zoom_mosaic_half_size_plain(uint16_t *const out, const uint16_t *const in,
+void dt_iop_clip_and_zoom_mosaic_half_size(uint16_t *const out, const uint16_t *const in,
                                                  const dt_iop_roi_t *const roi_out,
                                                  const dt_iop_roi_t *const roi_in, const int32_t out_stride,
                                                  const int32_t in_stride, const uint32_t filters)
@@ -244,224 +238,12 @@ void dt_iop_clip_and_zoom_mosaic_half_size_plain(uint16_t *const out, const uint
             num++;
           }
         }
-      *outc = col / num;
-    }
-  }
-}
-
-#if defined(__SSE__)...
-#endif
-
-void dt_iop_clip_and_zoom_mosaic_half_size(uint16_t *const out, const uint16_t *const in,
-                                           const dt_iop_roi_t *const roi_out, const dt_iop_roi_t *const roi_in,
-                                           const int32_t out_stride, const int32_t in_stride,
-                                           const uint32_t filters)
-{
-  if(1)//(darktable.codepath.OPENMP_SIMD)
-    return dt_iop_clip_and_zoom_mosaic_half_size_plain(out, in, roi_out, roi_in, out_stride, in_stride, filters);
-#if defined(__SSE__)
-  else if(darktable.codepath.SSE2)
-    return dt_iop_clip_and_zoom_mosaic_half_size_sse2(out, in, roi_out, roi_in, out_stride, in_stride, filters);
-#endif
-  else
-    dt_unreachable_codepath();
 }
 
-void dt_iop_clip_and_zoom_mosaic_half_size_f_plain(float *const out, const float *const in,
+void dt_iop_clip_and_zoom_mosaic_half_size_f(float *const out, const float *const in,
                                                    const dt_iop_roi_t *const roi_out,
                                                    const dt_iop_roi_t *const roi_in, const int32_t out_stride,
                                                    const int32_t in_stride, const uint32_t filters)
@@ -643,223 +425,10 @@ void dt_iop_clip_and_zoom_mosaic_half_size_f_plain(float *const out, const float
       }
 
       const int c = (2 * ((y + rggby) % 2) + ((x + rggbx) % 2));
-      *outc = col[c] / num;
-      outc++;
-    }
-  }
-}
-
-#if defined(__SSE__)...
-#endif
-
-void dt_iop_clip_and_zoom_mosaic_half_size_f(float *const out, const float *const in,
-                                             const dt_iop_roi_t *const roi_out, const dt_iop_roi_t *const roi_in,
-                                             const int32_t out_stride, const int32_t in_stride,
-                                             const uint32_t filters)
-{
-  if(darktable.codepath.OPENMP_SIMD)
-    return dt_iop_clip_and_zoom_mosaic_half_size_f_plain(out, in, roi_out, roi_in, out_stride, in_stride, filters);
-#if defined(__SSE__)
-  else if(darktable.codepath.SSE2)
-    return dt_iop_clip_and_zoom_mosaic_half_size_f_sse2(out, in, roi_out, roi_in, out_stride, in_stride, filters);
-#endif
-  else
-    dt_unreachable_codepath();
 }
 
 /**
@@ -951,7 +520,7 @@ void dt_iop_clip_and_zoom_mosaic_third_size_xtrans_f(float *const out, const flo
   }
 }
 
-void dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f_plain(float *out, const float *const in,
+void dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f(float *out, const float *const in,
                                                                   const dt_iop_roi_t *const roi_out,
                                                                   const dt_iop_roi_t *const roi_in,
                                                                   const int32_t out_stride,
@@ -1085,7 +654,7 @@ void dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f_plain(float *out, co
         num = ((maxi - px) / 2 + 1 - dx) * ((maxj - py) / 2 + 1 - dy);
       }
 
-      const float pix = col / num;
+      const float pix = (num) ? col / num : 0.0f;
       outc[0] = pix;
       outc[1] = pix;
       outc[2] = pix;
@@ -1095,256 +664,7 @@ void dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f_plain(float *out, co
   }
 }
 
-#if defined(__SSE__)...
-#endif
-
-void dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f(float *out, const float *const in,
-                                                            const dt_iop_roi_t *const roi_out,
-                                                            const dt_iop_roi_t *const roi_in,
-                                                            const int32_t out_stride, const int32_t in_stride)
-{
-  if(darktable.codepath.OPENMP_SIMD)
-    return dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f_plain(out, in, roi_out, roi_in, out_stride,
-                                                                        in_stride);
-#if defined(__SSE__)...
-#endif
-  else
-    dt_unreachable_codepath();
-}
-
-#if 0 // gets rid of pink artifacts, but doesn't do sub-pixel sampling, so shows some staircasing artifacts....
-#else
-// very fast and smooth, but doesn't handle highlights:
-
-void dt_iop_clip_and_zoom_demosaic_half_size_f_plain(float *out, const float *const in,
+void dt_iop_clip_and_zoom_demosaic_half_size_f(float *out, const float *const in,
                                                      const dt_iop_roi_t *const roi_out,
                                                      const dt_iop_roi_t *const roi_in, const int32_t out_stride,
                                                      const int32_t in_stride, const uint32_t filters)
@@ -1522,202 +842,6 @@ void dt_iop_clip_and_zoom_demosaic_half_size_f_plain(float *out, const float *co
   }
 }
 
-#if defined(__SSE__)...
-#endif
-#endif
-
-void dt_iop_clip_and_zoom_demosaic_half_size_f(float *out, const float *const in,
-                                               const dt_iop_roi_t *const roi_out, const dt_iop_roi_t *const roi_in,
-                                               const int32_t out_stride, const int32_t in_stride,
-                                               const uint32_t filters)
-{
-  if(darktable.codepath.OPENMP_SIMD)
-    return dt_iop_clip_and_zoom_demosaic_half_size_f_plain(out, in, roi_out, roi_in, out_stride, in_stride,
-                                                           filters);
-#if defined(__SSE__)...
-#endif
-  else
-    dt_unreachable_codepath();
-}
     
     void dt_iop_clip_and_zoom_demosaic_third_size_xtrans_f(float *out, const float *const in,
                                                            const dt_iop_roi_t *const roi_out,
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
<!-- Created by htmlize-1.55 in css mode. -->
<html>
  <head>
    <title>darktable.diff</title>
    <style type="text/css">
    <!--
      body {
        color: #93a1a1;
        background-color: #002b36;
      }
      .diff-added {
        /* diff-added */
        color: #98fb98;
      }
      .diff-context {
      }
      .diff-file-header {
        /* diff-file-header */
        background-color: #8b7500;
        font-weight: bold;
      }
      .diff-function {
        /* diff-function */
        background-color: #333333;
      }
      .diff-header {
        /* diff-header */
        background-color: #333333;
      }
      .diff-hunk-header {
        /* diff-hunk-header */
        background-color: #333333;
      }
      .diff-indicator-added {
        /* diff-indicator-added */
        color: #22aa22;
      }
      .diff-indicator-removed {
        /* diff-indicator-removed */
        color: #aa2222;
      }
      .diff-refine-added {
        /* diff-refine-added */
        background-color: #22aa22;
      }
      .diff-refine-removed {
        /* diff-refine-removed */
        background-color: #aa2222;
      }
      .diff-removed {
        /* diff-removed */
        color: #cd5555;
      }

      a {
        color: inherit;
        background-color: inherit;
        font: inherit;
        text-decoration: inherit;
      }
      a:hover {
        text-decoration: underline;
      }
    -->
    </style>
  </head>
  <body>
    <pre>
<span class="diff-context">commit f007e678d47f5662326824725cae2ab9e2455e66
Author: Hanno Schwalm <a href="mailto:hanno%40schwalm-bremen.de">&lt;hanno@schwalm-bremen.de&gt;</a>
Date:   Fri May 14 18:20:37 2021 +0200

    Avoid div by zero in dt_iop_clip_and_zoom_mosaic_half_size (#8954)
    
    * Avoid div by zero in dt_iop_clip_and_zoom_mosaic_half_size_plain
    
    Fixes #8951
    
    Although the file given in the issue is crippled we can avoid the crash.
    In `dt_iop_clip_and_zoom_mosaic_half_size` and the sse friend there is possibly a div/0
    problem that should be checked.
    
    * Fixing same div by zero in dt_iop_clip_and_zoom_mosaic_half_size_f
    
    * Remove sse code for dt_iop_clip_and_zoom_mosaic... after testing performance
    
    checked performance non-sse vs sse specific code
    - with added local timers
    - using gcc 10.2
    - testing -t 1/4/8/16
    - intel (xeon like 9900) with fixed clock rate
    
    in
    - dt_iop_clip_and_zoom_mosaic_half_size
    - dt_iop_clip_and_zoom_mosaic_half_size_f
    - dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f
    - dt_iop_clip_and_zoom_demosaic_half_size_f
    
    with consistent results. For all functions the sse specific code was somewhat slower (~20%)
    than the vectorized compiler code. Number of omp cores didn't matter, just made the results
    more measurable because of low execution times.
    
    So I removed all the sse specific code for less code burden and better performance.
    
    * Fix sse header plus div/0
    
    At least for bayer images we absolutely want to be sure there is no div by zero as there might
    be buggy dng files.
</span>
<span class="diff-header">diff --git a/src/develop/imageop_math.c b/src/develop/imageop_math.c
index ef559652d..0066a83c9 100644
--- </span><span class="diff-header"><span class="diff-file-header">a/src/develop/imageop_math.c</span></span><span class="diff-header">
+++ </span><span class="diff-header"><span class="diff-file-header">b/src/develop/imageop_math.c</span></span><span class="diff-header">
</span><span class="diff-hunk-header">@@ -18,14 +18,8 @@</span>
<span class="diff-context"> 
 #include "develop/imageop_math.h"
 #include &lt;assert.h&gt; // for assert
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">#ifdef __SSE__...
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">#endif
</span><span class="diff-context"> #include &lt;glib.h&gt; // for MIN, MAX, CLAMP, inline
 #include &lt;math.h&gt; // for round, floorf, fmaxf
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">#ifdef __SSE__...
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">#endif
</span><span class="diff-context"> #include "common/darktable.h"        // for darktable, darktable_t, dt_code...
 #include "common/imageio.h"          // for FILTERS_ARE_4BAYER
 #include "common/interpolation.h"    // for dt_interpolation_new, dt_interp...
</span><span class="diff-hunk-header">@@ -177,7 +171,7 @@</span><span class="diff-function"> int dt_iop_clip_and_zoom_roi_cl(int devid, cl_mem dev_out, cl_mem dev_in, const</span>
<span class="diff-context"> 
 #endif
 
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">void dt_iop_clip_and_zoom_mosaic_half_size</span><span class="diff-removed"><span class="diff-refine-removed">_plain</span></span><span class="diff-removed">(uint16_t *const out, const uint16_t *const in,
</span><span class="diff-indicator-added">+</span><span class="diff-added">void dt_iop_clip_and_zoom_mosaic_half_size(uint16_t *const out, const uint16_t *const in,
</span><span class="diff-context">                                                  const dt_iop_roi_t *const roi_out,
                                                  const dt_iop_roi_t *const roi_in, const int32_t out_stride,
                                                  const int32_t in_stride, const uint32_t filters)
</span><span class="diff-hunk-header">@@ -244,224 +238,12 @@</span><span class="diff-function"> void dt_iop_clip_and_zoom_mosaic_half_size_plain(uint16_t *const out, const uint</span>
<span class="diff-context">             num++;
           }
         }
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">      </span><span class="diff-removed"><span class="diff-refine-removed">*outc = col / num;
</span></span><span class="diff-indicator-removed"><span class="diff-refine-removed">-</span></span><span class="diff-removed"><span class="diff-refine-removed">    }
</span></span><span class="diff-indicator-removed"><span class="diff-refine-removed">-</span></span><span class="diff-removed"><span class="diff-refine-removed">  }
</span></span><span class="diff-indicator-removed"><span class="diff-refine-removed">-</span></span><span class="diff-removed"><span class="diff-refine-removed">}
</span></span><span class="diff-indicator-removed"><span class="diff-refine-removed">-</span></span><span class="diff-removed"><span class="diff-refine-removed">
</span></span><span class="diff-indicator-removed"><span class="diff-refine-removed">-</span></span><span class="diff-removed"><span class="diff-refine-removed">#if defined(__SSE__)...</span></span><span class="diff-removed">
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">#endif
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">void dt_iop_clip_and_zoom_mosaic_half_size(uint16_t *const out, const uint16_t *const in,
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">                                           const dt_iop_roi_t *const roi_out, const dt_iop_roi_t *const roi_in,
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">                                           const int32_t out_stride, const int32_t in_stride,
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">                                           const uint32_t filters)
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">{
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">  if(1)//(darktable.codepath.OPENMP_SIMD)
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">    return dt_iop_clip_and_zoom_mosaic_half_size_plain(out, in, roi_out, roi_in, out_stride, in_stride, filters);
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">#if defined(__SSE__)
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">  else if(darktable.codepath.SSE2)
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">    return dt_iop_clip_and_zoom_mosaic_half_size_sse2(out, in, roi_out, roi_in, out_stride, in_stride, filters);
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">#endif
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">  else
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">    dt_unreachable_codepath();
</span><span class="diff-context"> }
 
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">void dt_iop_clip_and_zoom_mosaic_half_size_f</span><span class="diff-removed"><span class="diff-refine-removed">_plain</span></span><span class="diff-removed">(float *const out, const float *const in,
</span><span class="diff-indicator-added">+</span><span class="diff-added">void dt_iop_clip_and_zoom_mosaic_half_size_f(float *const out, const float *const in,
</span><span class="diff-context">                                                    const dt_iop_roi_t *const roi_out,
                                                    const dt_iop_roi_t *const roi_in, const int32_t out_stride,
                                                    const int32_t in_stride, const uint32_t filters)
</span><span class="diff-hunk-header">@@ -643,223 +425,10 @@</span><span class="diff-function"> void dt_iop_clip_and_zoom_mosaic_half_size_f_plain(float *const out, const float</span>
<span class="diff-context">       }
 
       const int c = (2 * ((y + rggby) % 2) + ((x + rggbx) % 2));
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">      </span><span class="diff-removed"><span class="diff-refine-removed">*outc = col[c] / num;
</span></span><span class="diff-indicator-removed"><span class="diff-refine-removed">-</span></span><span class="diff-removed"><span class="diff-refine-removed">      outc++;
</span></span><span class="diff-indicator-removed"><span class="diff-refine-removed">-</span></span><span class="diff-removed"><span class="diff-refine-removed">    }
</span></span><span class="diff-indicator-removed"><span class="diff-refine-removed">-</span></span><span class="diff-removed"><span class="diff-refine-removed">  }
</span></span><span class="diff-indicator-removed"><span class="diff-refine-removed">-</span></span><span class="diff-removed"><span class="diff-refine-removed">}
</span></span><span class="diff-indicator-removed"><span class="diff-refine-removed">-</span></span><span class="diff-removed"><span class="diff-refine-removed">
</span></span><span class="diff-indicator-removed"><span class="diff-refine-removed">-</span></span><span class="diff-removed"><span class="diff-refine-removed">#if defined(__SSE__)...</span></span><span class="diff-removed">
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">#endif
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">void dt_iop_clip_and_zoom_mosaic_half_size_f(float *const out, const float *const in,
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">                                             const dt_iop_roi_t *const roi_out, const dt_iop_roi_t *const roi_in,
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">                                             const int32_t out_stride, const int32_t in_stride,
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">                                             const uint32_t filters)
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">{
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">  if(darktable.codepath.OPENMP_SIMD)
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">    return dt_iop_clip_and_zoom_mosaic_half_size_f_plain(out, in, roi_out, roi_in, out_stride, in_stride, filters);
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">#if defined(__SSE__)
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">  else if(darktable.codepath.SSE2)
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">    return dt_iop_clip_and_zoom_mosaic_half_size_f_sse2(out, in, roi_out, roi_in, out_stride, in_stride, filters);
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">#endif
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">  else
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">    dt_unreachable_codepath();
</span><span class="diff-context"> }
 
 /**
</span><span class="diff-hunk-header">@@ -951,7 +520,7 @@</span><span class="diff-function"> void dt_iop_clip_and_zoom_mosaic_third_size_xtrans_f(float *const out, const flo</span>
<span class="diff-context">   }
 }
 
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">void dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f</span><span class="diff-removed"><span class="diff-refine-removed">_plain</span></span><span class="diff-removed">(float *out, const float *const in,
</span><span class="diff-indicator-added">+</span><span class="diff-added">void dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f(float *out, const float *const in,
</span><span class="diff-context">                                                                   const dt_iop_roi_t *const roi_out,
                                                                   const dt_iop_roi_t *const roi_in,
                                                                   const int32_t out_stride,
</span><span class="diff-hunk-header">@@ -1085,7 +654,7 @@</span><span class="diff-function"> void dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f_plain(float *out, co</span>
<span class="diff-context">         num = ((maxi - px) / 2 + 1 - dx) * ((maxj - py) / 2 + 1 - dy);
       }
 
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">      const float pix = col / num;
</span><span class="diff-indicator-added">+</span><span class="diff-added">      const float pix = </span><span class="diff-added"><span class="diff-refine-added">(num) ?</span></span><span class="diff-added"> col / num </span><span class="diff-added"><span class="diff-refine-added">: 0.0f</span></span><span class="diff-added">;
</span><span class="diff-context">       outc[0] = pix;
       outc[1] = pix;
       outc[2] = pix;
</span><span class="diff-hunk-header">@@ -1095,256 +664,7 @@</span><span class="diff-function"> void dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f_plain(float *out, co</span>
<span class="diff-context">   }
 }
 
</span><span class="diff-indicator-removed">-</span><span class="diff-removed"><span class="diff-refine-removed">#if defined(__SSE__)...
</span></span><span class="diff-indicator-removed"><span class="diff-refine-removed">-</span></span><span class="diff-removed"><span class="diff-refine-removed">#endif
</span></span><span class="diff-indicator-removed"><span class="diff-refine-removed">-</span></span><span class="diff-removed"><span class="diff-refine-removed">
</span></span><span class="diff-indicator-removed"><span class="diff-refine-removed">-</span></span><span class="diff-removed">void dt_iop_clip_and_zoom_demosaic_</span><span class="diff-removed"><span class="diff-refine-removed">passthrough_monochrome_f(float *out, const float *const in,
</span></span><span class="diff-indicator-removed"><span class="diff-refine-removed">-</span></span><span class="diff-removed"><span class="diff-refine-removed">                                                            const dt_iop_roi_t *const roi_out,
</span></span><span class="diff-indicator-removed"><span class="diff-refine-removed">-</span></span><span class="diff-removed"><span class="diff-refine-removed">                                                            const dt_iop_roi_t *const roi_in,
</span></span><span class="diff-indicator-removed"><span class="diff-refine-removed">-</span></span><span class="diff-removed"><span class="diff-refine-removed">                                                            const int32_t out_stride, const int32_t in_stride)
</span></span><span class="diff-indicator-removed"><span class="diff-refine-removed">-</span></span><span class="diff-removed"><span class="diff-refine-removed">{
</span></span><span class="diff-indicator-removed"><span class="diff-refine-removed">-</span></span><span class="diff-removed"><span class="diff-refine-removed">  if(darktable.codepath.OPENMP_SIMD)
</span></span><span class="diff-indicator-removed"><span class="diff-refine-removed">-</span></span><span class="diff-removed"><span class="diff-refine-removed">    return dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f_plain(out, in, roi_out, roi_in, out_stride,
</span></span><span class="diff-indicator-removed"><span class="diff-refine-removed">-</span></span><span class="diff-removed"><span class="diff-refine-removed">                                                                        in_stride);
</span></span><span class="diff-indicator-removed"><span class="diff-refine-removed">-</span></span><span class="diff-removed"><span class="diff-refine-removed">#if defined(__SSE__)...
</span></span><span class="diff-indicator-removed"><span class="diff-refine-removed">-</span></span><span class="diff-removed"><span class="diff-refine-removed">#endif
</span></span><span class="diff-indicator-removed"><span class="diff-refine-removed">-</span></span><span class="diff-removed"><span class="diff-refine-removed">  else
</span></span><span class="diff-indicator-removed"><span class="diff-refine-removed">-</span></span><span class="diff-removed"><span class="diff-refine-removed">    dt_unreachable_codepath();
</span></span><span class="diff-indicator-removed"><span class="diff-refine-removed">-</span></span><span class="diff-removed"><span class="diff-refine-removed">}
</span></span><span class="diff-indicator-removed"><span class="diff-refine-removed">-</span></span><span class="diff-removed"><span class="diff-refine-removed">
</span></span><span class="diff-indicator-removed"><span class="diff-refine-removed">-</span></span><span class="diff-removed"><span class="diff-refine-removed">#if 0 // gets rid of pink artifacts, but doesn't do sub-pixel sampling, so shows some staircasing artifacts....
</span></span><span class="diff-indicator-removed"><span class="diff-refine-removed">-</span></span><span class="diff-removed"><span class="diff-refine-removed">#else
</span></span><span class="diff-indicator-removed"><span class="diff-refine-removed">-</span></span><span class="diff-removed"><span class="diff-refine-removed">// very fast and smooth, but doesn't handle highlights:
</span></span><span class="diff-indicator-removed"><span class="diff-refine-removed">-</span></span><span class="diff-removed"><span class="diff-refine-removed">
</span></span><span class="diff-indicator-removed"><span class="diff-refine-removed">-</span></span><span class="diff-removed"><span class="diff-refine-removed">void dt_iop_clip_and_zoom_demosaic_half_size_f_plain</span></span><span class="diff-removed">(float *out, const float *const in,
</span><span class="diff-indicator-added">+</span><span class="diff-added">void dt_iop_clip_and_zoom_demosaic_half_size_f(float *out, const float *const in,
</span><span class="diff-context">                                                      const dt_iop_roi_t *const roi_out,
                                                      const dt_iop_roi_t *const roi_in, const int32_t out_stride,
                                                      const int32_t in_stride, const uint32_t filters)
</span><span class="diff-hunk-header">@@ -1522,202 +842,6 @@</span><span class="diff-function"> void dt_iop_clip_and_zoom_demosaic_half_size_f_plain(float *out, const float *co</span>
<span class="diff-context">   }
 }
 
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">#if defined(__SSE__)...
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">#endif
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">#endif
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">void dt_iop_clip_and_zoom_demosaic_half_size_f(float *out, const float *const in,
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">                                               const dt_iop_roi_t *const roi_out, const dt_iop_roi_t *const roi_in,
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">                                               const int32_t out_stride, const int32_t in_stride,
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">                                               const uint32_t filters)
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">{
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">  if(darktable.codepath.OPENMP_SIMD)
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">    return dt_iop_clip_and_zoom_demosaic_half_size_f_plain(out, in, roi_out, roi_in, out_stride, in_stride,
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">                                                           filters);
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">#if defined(__SSE__)...
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">#endif
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">  else
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">    dt_unreachable_codepath();
</span><span class="diff-indicator-removed">-</span><span class="diff-removed">}
</span><span class="diff-context"> 
 void dt_iop_clip_and_zoom_demosaic_third_size_xtrans_f(float *out, const float *const in,
                                                        const dt_iop_roi_t *const roi_out,
</span></pre>
  </body>
</html>

--- End Message ---
--- Begin Message ---
Unblocked.

--- End Message ---
