
Bug#861248: marked as done (unblock: mlucas/14.1-2)



Your message dated Wed, 26 Apr 2017 16:37:00 +0000
with message-id <c53d6661-2264-404f-059d-0d49a2d0a60e@thykier.net>
and subject line Re: Bug#861248: unblock: mlucas/14.1-2
has caused the Debian Bug report #861248,
regarding unblock: mlucas/14.1-2
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact owner@bugs.debian.org
immediately.)


-- 
861248: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=861248
Debian Bug Tracking System
Contact owner@bugs.debian.org with problems
--- Begin Message ---
Package: release.debian.org
Severity: normal
User: release.debian.org@packages.debian.org
Usertags: unblock

Hello release team,

Please unblock package mlucas.

This upload should fix the RC bug
<http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=860662> by splitting
the big test into smaller ones.
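
In short, instead of one 'exec "$MLUCAS_PATH"mlucas -s m' run covering all
medium exponents at once, the patched scripts/self_test.test loops over the
exponents one at a time, so each mlucas invocation only allocates the memory
and threads needed for a single exponent. A condensed sketch of the change
(the real list in the patch below contains 24 exponents; three shown here):

# Condensed from 0001-Split-big-test-into-smaller-ones.patch.
exponent_ls='20000047 22442237 24878401'

# Run self-test on each exponent individually.
for exponent in $exponent_ls
do
    "$MLUCAS_PATH"mlucas -m "$exponent" -iters 100
done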

The full diff is attached below:

diff -Nru mlucas-14.1/debian/changelog mlucas-14.1/debian/changelog
--- mlucas-14.1/debian/changelog	2015-08-27 22:42:36.000000000 +0800
+++ mlucas-14.1/debian/changelog	2017-04-24 16:16:28.000000000 +0800
@@ -1,3 +1,11 @@
+mlucas (14.1-2) unstable; urgency=medium
+
+  * RC bug fix release (Closes: #860662), split big test into smaller ones
+    to avoid exhausting system resources.
+  * Backport fix for undefined behavior from upstream.
+
+ -- Alex Vong <alexvong1995@gmail.com>  Mon, 24 Apr 2017 16:16:28 +0800
+
 mlucas (14.1-1) unstable; urgency=low
 
   * Initial release (Closes: #786656)
diff -Nru mlucas-14.1/debian/patches/0001-fixes-undefined-behaviour.patch mlucas-14.1/debian/patches/0001-fixes-undefined-behaviour.patch
--- mlucas-14.1/debian/patches/0001-fixes-undefined-behaviour.patch	1970-01-01 08:00:00.000000000 +0800
+++ mlucas-14.1/debian/patches/0001-fixes-undefined-behaviour.patch	2017-04-24 16:16:28.000000000 +0800
@@ -0,0 +1,657 @@
+From f4c2fb2f7f771bf696d277140d267f6f03577f49 Mon Sep 17 00:00:00 2001
+From: Alex Vong <alexvong1995@gmail.com>
+Date: Wed, 27 Jul 2016 19:52:35 +0800
+Subject: [PATCH] Fixes undefined behaviour.
+
+Description: This fixes undefined behaviour (out-of-bounds array access)
+ in the Fermat test code reported by gcc's
+ `-Waggressive-loop-optimizations'.
+Forwarded: yes
+Author: Ernst W. Mayer <ewmayer@aol.com>
+
+* src/radix1008_main_carry_loop.h: Fix undefined behaviour.
+* src/radix1024_main_carry_loop.h: Likewise.
+* src/radix128_main_carry_loop.h: Likewise.
+* src/radix224_main_carry_loop.h: Likewise.
+* src/radix240_main_carry_loop.h: Likewise.
+* src/radix256_main_carry_loop.h: Likewise.
+* src/radix32_main_carry_loop.h: Likewise.
+* src/radix4032_main_carry_loop.h: Likewise.
+* src/radix56_main_carry_loop.h: Likewise.
+* src/radix60_main_carry_loop.h: Likewise.
+* src/radix64_main_carry_loop.h: Likewise.
+* src/radix960_main_carry_loop.h: Likewise.
+---
+ src/radix1008_main_carry_loop.h | 21 ++++++++++-----------
+ src/radix1024_main_carry_loop.h |  6 +++---
+ src/radix128_main_carry_loop.h  | 10 +++++-----
+ src/radix224_main_carry_loop.h  | 17 ++++++++---------
+ src/radix240_main_carry_loop.h  | 19 ++++++++++---------
+ src/radix256_main_carry_loop.h  | 10 +++++-----
+ src/radix32_main_carry_loop.h   |  6 +++---
+ src/radix4032_main_carry_loop.h | 21 ++++++++++-----------
+ src/radix56_main_carry_loop.h   | 10 +++++-----
+ src/radix60_main_carry_loop.h   | 12 ++++++------
+ src/radix64_main_carry_loop.h   |  6 +++---
+ src/radix960_main_carry_loop.h  | 22 ++++++++++++++--------
+ 12 files changed, 82 insertions(+), 78 deletions(-)
+
+diff --git a/src/radix1008_main_carry_loop.h b/src/radix1008_main_carry_loop.h
+index 25cdc2c..525d29c 100644
+--- a/src/radix1008_main_carry_loop.h
++++ b/src/radix1008_main_carry_loop.h
+@@ -389,14 +389,14 @@ for(k=1; k <= khi; k++)	/* Do n/(radix(1)*nwt) outer loop executions...	*/
+ 		// icycle[ic],icycle[ic+1],icycle[ic+2],icycle[ic+3], jcycle[ic],kcycle[ic],lcycle[ic] of the non-looped version with
+ 		// icycle[ic],icycle[jc],icycle[kc],icycle[lc], jcycle[ic],kcycle[ic],lcycle[ic] :
+ 		ic = 0; jc = 1; kc = 2; lc = 3;
+-		while(tm0 < isrt2,two)	// Can't use l for loop index here since need it for byte offset in carry macro call
++		while(tm0 < two)	// Can't use l for loop index here since need it for byte offset in carry macro call
+ 		{																/* vvvvvvvvvvvvvvv [1,2,3]*ODD_RADIX; assumed << l2_sz_vd on input: */
+ 			//See "Sep 2014" note in 32-bit SSE2 version of this code below
+ 			k1 = icycle[ic];	k5 = jcycle[ic];	k6 = kcycle[ic];	k7 = lcycle[ic];
+ 			k2 = icycle[jc];
+ 			k3 = icycle[kc];
+ 			k4 = icycle[lc];
+-			tm2 = a + j1 + pfetch_dist + poff[(int)(tm1-cy_r)];	// poff[] = p0,4,8,...; (tm1-cy_r) acts as a linear loop index running from 0,...,RADIX-1 here.
++			tm2 = (vec_dbl *)(a + j1 + pfetch_dist + poff[(int)(tm1-cy_r)]);	// poff[] = p0,4,8,...; (tm1-cy_r) acts as a linear loop index running from 0,...,RADIX-1 here.
+ 																		/* vvvvvvvvvvvvvvv [1,2,3]*ODD_RADIX; assumed << l2_sz_vd on input: */
+ 			SSE2_fermat_carry_norm_errcheck_X4_hiacc(tm0,tmp,l,tm1,0x1f80, 0x7e0,0xfc0,0x17a0, half_arr,sign_mask,k1,k2,k3,k4,k5,k6,k7, add0,p1,p2,p3);
+ 			tm0 += 8; tm1++; tmp += 8; l -= 0xc0;
+@@ -417,7 +417,7 @@ for(k=1; k <= khi; k++)	/* Do n/(radix(1)*nwt) outer loop executions...	*/
+ 			k2 = icycle[jc];
+ 			k3 = icycle[kc];
+ 			k4 = icycle[lc];
+-			tm2 = a + j1 + pfetch_dist + poff[(int)(tm1-cy_r)];	// poff[] = p0,4,8,...; (tm1-cy_r) acts as a linear loop index running from 0,...,RADIX-1 here.
++			tm2 = (vec_dbl *)(a + j1 + pfetch_dist + poff[(int)(tm1-cy_r)]);	// poff[] = p0,4,8,...; (tm1-cy_r) acts as a linear loop index running from 0,...,RADIX-1 here.
+ 																		/* vvvvvvvvvvvvvvv [1,2,3]*ODD_RADIX; assumed << l2_sz_vd on input: */
+ 			SSE2_fermat_carry_norm_errcheck_X4_loacc(tm0,tmp,tm1,0x1f80, 0x7e0,0xfc0,0x17a0, half_arr,sign_mask,k1,k2,k3,k4,k5,k6,k7, add0,p1,p2,p3);
+ 			tm0 += 8; tm1++;
+@@ -447,15 +447,15 @@ for(k=1; k <= khi; k++)	/* Do n/(radix(1)*nwt) outer loop executions...	*/
+ 		ic = 0; jc = 1;
+ 		tm1 = s1p00; tmp = cy_r;	// <*** Again rely on contiguity of cy_r,i here ***
+ 		l = ODD_RADIX;	// Need to stick this #def into an intvar to work around [error: invalid lvalue in asm input for constraint 'm']
+-		while(tm1 < isrt2) {
++		while((int)(tmp-cy_r) < RADIX) {
+ 			//See "Sep 2014" note in 32-bit SSE2 version of this code below
+ 			k1 = icycle[ic];
+ 			k2 = jcycle[ic];
+ 			k3 = icycle[jc];
+ 			k4 = jcycle[jc];
+ 			// Each SSE2 carry macro call also processes 2 prefetches of main-array data
+-			tm2 = a + j1 + pfetch_dist + poff[(int)(tm1-cy_r)];	// poff[] = p0,4,8,...; (tm1-cy_r) acts as a linear loop index running from 0,...,RADIX-1 here.
+-			tm2 += (-((int)(tm1-cy_r)&0x1)) & p2;	// Base-addr incr by extra p2 on odd-index passes
++			tm2 = (vec_dbl *)(a + j1 + pfetch_dist + poff[(int)(tmp-cy_r)>>2]);	// poff[] = p0,4,8,...; (tm1-cy_r) acts as a linear loop index running from 0,...,RADIX-1 here.
++			tm2 += (-((int)((tmp-cy_r)>>1)&0x1)) & p2;	// Base-addr incr by extra p2 on odd-index passes
+ 			SSE2_fermat_carry_norm_errcheck_X2(tm1,tmp,NRT_BITS,NRTM1,idx_offset,idx_incr,l,half_arr,sign_mask,add1,add2,k1,k2,k3,k4, tm2,p1);
+ 			tm1 += 4; tmp += 2;
+ 			MOD_ADD32(ic, 2, ODD_RADIX, ic);
+@@ -468,16 +468,15 @@ for(k=1; k <= khi; k++)	/* Do n/(radix(1)*nwt) outer loop executions...	*/
+ 		ic = 0;	// ic = idx into [i|j]cycle mini-arrays, gets incremented (mod ODD_RADIX) between macro calls
+ 		tm1 = s1p00; tmp = cy_r;	// <*** Again rely on contiguity of cy_r,i here ***
+ 		l = ODD_RADIX << 4;	// 32-bit version needs preshifted << 4 input value
+-		while(tm1 < isrt2) {
++		while((int)(tmp-cy_r) < RADIX) {
+ 			//Sep 2014: Even with reduced-register version of the 32-bit Fermat-mod carry macro,
+ 			// GCC runs out of registers on this one, without some playing-around-with-alternate code-sequences ...
+ 			// Pulling the array-refs out of the carry-macro call like so solves the problem:
+ 			k1 = icycle[ic];
+ 			k2 = jcycle[ic];
+-			// Each SSE2 carry macro call also processes 2 prefetches of main-array data
+-			tm2 = a + j1 + pfetch_dist + poff[(int)(tm1-cy_r)];	// poff[] = p0,4,8,...; (tm1-cy_r) acts as a linear loop index running from 0,...,RADIX-1 here.
+-			tm2 += (-(l&0x10)) & p2;
+-			tm2 += (-(l&0x01)) & p1;	// Added offset cycles among p0,1,2,3
++			// Each SSE2 carry macro call also processes 1 prefetch of main-array data
++			tm2 = (vec_dbl *)(a + j1 + pfetch_dist + poff[(int)(tmp-cy_r)>>2]);	// poff[] = p0,4,8,...; (tm1-cy_r) acts as a linear loop index running from 0,...,RADIX-1 here.
++			tm2 += p1*((int)(tmp-cy_r)&0x3);	// Added offset cycles among p0,1,2,3
+ 			SSE2_fermat_carry_norm_errcheck(tm1,tmp,NRT_BITS,NRTM1,idx_offset,idx_incr,l,half_arr,sign_mask,add1,add2,k1,k2, tm2);
+ 			tm1 += 2; tmp++;
+ 			MOD_ADD32(ic, 1, ODD_RADIX, ic);
+diff --git a/src/radix1024_main_carry_loop.h b/src/radix1024_main_carry_loop.h
+index 6b2e8ae..d43b47c 100644
+--- a/src/radix1024_main_carry_loop.h
++++ b/src/radix1024_main_carry_loop.h
+@@ -384,7 +384,7 @@ normally be getting dispatched to [radix] separate blocks of the A-array, we nee
+ 	  #if (OS_BITS == 32)
+ 		for(l = 0; l < RADIX; l++) {	// RADIX loop passes
+ 			// Each SSE2 carry macro call also processes 1 prefetch of main-array data
+-			add0 = a + j1 + pfetch_dist + poff[l];	// poff[] = p0,4,8,...
++			add0 = a + j1 + pfetch_dist + poff[l>>2];	// poff[] = p0,4,8,...
+ 			add0 += (-(l&0x10)) & p2;
+ 			add0 += (-(l&0x01)) & p1;
+ 			SSE2_fermat_carry_norm_pow2_errcheck   (tm1,tmp,NRT_BITS,NRTM1,idx_offset,idx_incr,half_arr,sign_mask,add1,add2, add0);
+@@ -393,7 +393,7 @@ normally be getting dispatched to [radix] separate blocks of the A-array, we nee
+ 	  #else	// 64-bit SSE2
+ 		for(l = 0; l < RADIX>>1; l++) {	// RADIX/2 loop passes
+ 			// Each SSE2 carry macro call also processes 2 prefetches of main-array data
+-			add0 = a + j1 + pfetch_dist + poff[l];	// poff[] = p0,4,8,...
++			add0 = a + j1 + pfetch_dist + poff[l>>1];	// poff[] = p0,4,8,...
+ 			add0 += (-(l&0x1)) & p2;	// Base-addr incr by extra p2 on odd-index passes
+ 			SSE2_fermat_carry_norm_pow2_errcheck_X2(tm1,tmp,NRT_BITS,NRTM1,idx_offset,idx_incr,half_arr,sign_mask,add1,add2, add0,p2);
+ 			tm1 += 4; tmp += 2;
+@@ -427,7 +427,7 @@ normally be getting dispatched to [radix] separate blocks of the A-array, we nee
+ 			SSE2_RADIX_64_DIF( FALSE, thr_id,
+ 				4,	// set = trailz(N) - trailz(64)
+ 				// Input pointer; no offsets array in pow2-radix case:
+-				s1p00 + (jt<<1), 0x0,
++				(double *)(s1p00 + (jt<<1)), 0x0,
+ 				// Intermediates-storage pointer:
+ 				vd00,
+ 				// Outputs: Base address plus index offsets:
+diff --git a/src/radix128_main_carry_loop.h b/src/radix128_main_carry_loop.h
+index ff92238..24cb836 100644
+--- a/src/radix128_main_carry_loop.h
++++ b/src/radix128_main_carry_loop.h
+@@ -571,7 +571,7 @@ normally be getting dispatched to [radix] separate blocks of the A-array, we nee
+ 	  #if (OS_BITS == 32)
+ 		for(l = 0; l < RADIX; l++) {	// RADIX loop passes
+ 			// Each SSE2 carry macro call also processes 1 prefetch of main-array data
+-			tm2 = a + j1 + pfetch_dist + poff[l];	// poff[] = p0,4,8,...
+			tm2 = (vec_dbl *)(a + j1 + pfetch_dist + poff[l>>2]);	// poff[] = p0,4,8,...
+ 			tm2 += (-(l&0x10)) & p02;
+ 			tm2 += (-(l&0x01)) & p01;
+ 			SSE2_fermat_carry_norm_pow2_errcheck   (tm1,tmp,NRT_BITS,NRTM1,idx_offset,idx_incr,half_arr,sign_mask,add1,add2, tm2);
+@@ -580,7 +580,7 @@ normally be getting dispatched to [radix] separate blocks of the A-array, we nee
+ 	  #else	// 64-bit SSE2
+ 		for(l = 0; l < RADIX>>1; l++) {	// RADIX/2 loop passes
+ 			// Each SSE2 carry macro call also processes 2 prefetches of main-array data
+-			tm2 = a + j1 + pfetch_dist + poff[l];	// poff[] = p0,4,8,...; (tm1-cy_r) acts as a linear loop index running from 0,...,RADIX-1 here.
++			tm2 = (vec_dbl *)(a + j1 + pfetch_dist + poff[l>>1]);	// poff[] = p0,4,8,...; (tm1-cy_r) acts as a linear loop index running from 0,...,RADIX-1 here.
+ 			tm2 += (-(l&0x1)) & p02;	// Base-addr incr by extra p2 on odd-index passes
+ 			SSE2_fermat_carry_norm_pow2_errcheck_X2(tm1,tmp,NRT_BITS,NRTM1,idx_offset,idx_incr,half_arr,sign_mask,add1,add2, tm2,p01);
+ 			tm1 += 4; tmp += 2;
+@@ -592,7 +592,7 @@ normally be getting dispatched to [radix] separate blocks of the A-array, we nee
+ 		// Can't use l as loop index here, since it gets used in the Fermat-mod carry macro (as are k1,k2);
+ 		ntmp = 0; addr = cy_r; addi = cy_i;
+ 		for(m = 0; m < RADIX>>2; m++) {
+-			jt = j1 + poff[m]; jp = j2 + poff[m];	// poff[] = p04,08,...,60
++			jt = j1 + poff[m]; jp = j2 + poff[m];	// poff[] = p04,08,...
+ 			fermat_carry_norm_pow2_errcheck(a[jt    ],a[jp    ],*addr,*addi,ntmp,NRTM1,NRT_BITS);	ntmp += NDIVR; ++addr; ++addi;
+ 			fermat_carry_norm_pow2_errcheck(a[jt+p01],a[jp+p01],*addr,*addi,ntmp,NRTM1,NRT_BITS);	ntmp += NDIVR; ++addr; ++addi;
+ 			fermat_carry_norm_pow2_errcheck(a[jt+p02],a[jp+p02],*addr,*addi,ntmp,NRTM1,NRT_BITS);	ntmp += NDIVR; ++addr; ++addi;
+@@ -634,8 +634,8 @@ normally be getting dispatched to [radix] separate blocks of the A-array, we nee
+ 		k1 = reverse(l,8)<<1;
+ 		tm2 = s1p00 + k1;
+ 	#if (OS_BITS == 32)
+-								 add1 = (vec_dbl*)tm1+ 2; add2 = (vec_dbl*)tm1+ 4; add3 = (vec_dbl*)tm1+ 6; add4 = (vec_dbl*)tm1+ 8; add5 = (vec_dbl*)tm1+10; add6 = (vec_dbl*)tm1+12; add7 = (vec_dbl*)tm1+14;
+-		add8 = (vec_dbl*)tm1+16; add9 = (vec_dbl*)tm1+18; adda = (vec_dbl*)tm1+20; addb = (vec_dbl*)tm1+22; addc = (vec_dbl*)tm1+24; addd = (vec_dbl*)tm1+26; adde = (vec_dbl*)tm1+28; addf = (vec_dbl*)tm1+30;
++								  add1 = (double*)(tm1+ 2); add2 = (double*)(tm1+ 4); add3 = (double*)(tm1+ 6); add4 = (double*)(tm1+ 8); add5 = (double*)(tm1+10); add6 = (double*)(tm1+12); add7 = (double*)(tm1+14);
++		add8 = (double*)(tm1+16); add9 = (double*)(tm1+18); adda = (double*)(tm1+20); addb = (double*)(tm1+22); addc = (double*)(tm1+24); addd = (double*)(tm1+26); adde = (double*)(tm1+28); addf = (double*)(tm1+30);
+ 		SSE2_RADIX16_DIF_0TWIDDLE  (tm2,OFF1,OFF2,OFF3,OFF4, tmp,two, tm1,add1,add2,add3,add4,add5,add6,add7,add8,add9,adda,addb,addc,addd,adde,addf);
+ 	#else
+ 		SSE2_RADIX16_DIF_0TWIDDLE_B(tm2,OFF1,OFF2,OFF3,OFF4, tmp,two, tm1);
+diff --git a/src/radix224_main_carry_loop.h b/src/radix224_main_carry_loop.h
+index 1ad55e5..ead8a83 100644
+--- a/src/radix224_main_carry_loop.h
++++ b/src/radix224_main_carry_loop.h
+@@ -398,7 +398,7 @@ for(k=1; k <= khi; k++)	/* Do n/(radix(1)*nwt) outer loop executions...	*/
+ 			k3 = icycle[kc];
+ 			k4 = icycle[lc];
+ 			// Each AVX carry macro call also processes 4 prefetches of main-array data
+-			tm2 = a + j1 + pfetch_dist + poff[(int)(tm1-cy_r)];	// poff[] = p0,4,8,...; (tm1-cy_r) acts as a linear loop index running from 0,...,RADIX-1 here.
++			tm2 = (vec_dbl *)(a + j1 + pfetch_dist + poff[(int)(tm1-cy_r)]);	// poff[] = p0,4,8,...; (tm1-cy_r) acts as a linear loop index running from 0,...,RADIX-1 here.
+ 																		/* vvvvvvvvvvvvvvv [1,2,3]*ODD_RADIX; assumed << l2_sz_vd on input: */
+ 			SSE2_fermat_carry_norm_errcheck_X4_hiacc(tm0,tmp,l,tm1,0x700, 0xe0,0x1c0,0x2a0, half_arr,sign_mask,k1,k2,k3,k4,k5,k6,k7, tm2,p1,p2,p3);
+ 			tm0 += 8; tm1++; tmp += 8; l -= 0xc0;
+@@ -420,7 +420,7 @@ for(k=1; k <= khi; k++)	/* Do n/(radix(1)*nwt) outer loop executions...	*/
+ 			k3 = icycle[kc];
+ 			k4 = icycle[lc];
+ 			// Each AVX carry macro call also processes 4 prefetches of main-array data
+-			tm2 = a + j1 + pfetch_dist + poff[(int)(tm1-cy_r)];	// poff[] = p0,4,8,...; (tm1-cy_r) acts as a linear loop index running from 0,...,RADIX-1 here.
++			tm2 = (vec_dbl *)(a + j1 + pfetch_dist + poff[(int)(tm1-cy_r)]);	// poff[] = p0,4,8,...; (tm1-cy_r) acts as a linear loop index running from 0,...,RADIX-1 here.
+ 																		/* vvvvvvvvvvvvvvv [1,2,3]*ODD_RADIX; assumed << l2_sz_vd on input: */
+ 			SSE2_fermat_carry_norm_errcheck_X4_loacc(tm0,tmp,tm1,0x700, 0xe0,0x1c0,0x2a0, half_arr,sign_mask,k1,k2,k3,k4,k5,k6,k7, tm2,p1,p2,p3);
+ 			tm0 += 8; tm1++;
+@@ -448,15 +448,15 @@ for(k=1; k <= khi; k++)	/* Do n/(radix(1)*nwt) outer loop executions...	*/
+ 		ic = 0; jc = 1;
+ 		tm1 = s1p00; tmp = cy_r;	// <*** Again rely on contiguity of cy_r,i here ***
+ 		l = ODD_RADIX;	// Need to stick this #def into an intvar to work around [error: invalid lvalue in asm input for constraint 'm']
+-		while(tm1 < isrt2) {
++		while((int)(tmp-cy_r) < RADIX) {
+ 			//See "Sep 2014" note in 32-bit SSE2 version of this code below
+ 			k1 = icycle[ic];
+ 			k2 = jcycle[ic];
+ 			int k3 = icycle[jc];
+ 			int k4 = jcycle[jc];
+ 			// Each SSE2 carry macro call also processes 2 prefetches of main-array data
+-			tm2 = a + j1 + pfetch_dist + poff[(int)(tm1-cy_r)];	// poff[] = p0,4,8,...; (tm1-cy_r) acts as a linear loop index running from 0,...,RADIX-1 here.
+-			tm2 += (-((int)(tm1-cy_r)&0x1)) & p2;	// Base-addr incr by extra p2 on odd-index passes
++			tm2 = (vec_dbl *)(a + j1 + pfetch_dist + poff[(int)(tmp-cy_r)>>2]);	// poff[] = p0,4,8,...; (tm1-cy_r) acts as a linear loop index running from 0,...,RADIX-1 here.
++			tm2 += (-((int)((tmp-cy_r)>>1)&0x1)) & p2;	// Base-addr incr by extra p2 on odd-index passes
+ 			SSE2_fermat_carry_norm_errcheck_X2(tm1,tmp,NRT_BITS,NRTM1,idx_offset,idx_incr,l,half_arr,sign_mask,add1,add2,k1,k2,k3,k4, tm2,p1);
+ 			tm1 += 4; tmp += 2;
+ 			MOD_ADD32(ic, 2, ODD_RADIX, ic);
+@@ -470,16 +470,15 @@ for(k=1; k <= khi; k++)	/* Do n/(radix(1)*nwt) outer loop executions...	*/
+ 		tm1 = s1p00; tmp = cy_r;	// <*** Again rely on contiguity of cy_r,i here ***
+ 		// Need to stick this #def into an intvar to work around [error: invalid lvalue in asm input for constraint 'm']
+ 		l = ODD_RADIX << 4;	// 32-bit version needs preshifted << 4 input value
+-		while(tm1 < isrt2) {
++		while((int)(tmp-cy_r) < RADIX) {
+ 			//Sep 2014: Even with reduced-register version of the 32-bit Fermat-mod carry macro,
+ 			// GCC runs out of registers on this one, without some playing-around-with-alternate code-sequences ...
+ 			// Pulling the array-refs out of the carry-macro call like so solves the problem:
+ 			k1 = icycle[ic];
+ 			k2 = jcycle[ic];
+ 			// Each SSE2 carry macro call also processes 2 prefetches of main-array data
+-			tm2 = a + j1 + pfetch_dist + poff[(int)(tm1-cy_r)];	// poff[] = p0,4,8,...; (tm1-cy_r) acts as a linear loop index running from 0,...,RADIX-1 here.
+-			tm2 += (-(l&0x10)) & p2;
+-			tm2 += (-(l&0x01)) & p1;	// Added offset cycles among p0,1,2,3
++			tm2 = (vec_dbl *)(a + j1 + pfetch_dist + poff[(int)(tmp-cy_r)>>2]);	// poff[] = p0,4,8,...; (tm1-cy_r) acts as a linear loop index running from 0,...,RADIX-1 here.
++			tm2 += p1*((int)(tmp-cy_r)&0x3);	// Added offset cycles among p0,1,2,3
+ 			SSE2_fermat_carry_norm_errcheck(tm1,tmp,NRT_BITS,NRTM1,idx_offset,idx_incr,l,half_arr,sign_mask,add1,add2,k1,k2, tm2);
+ 			tm1 += 2; tmp++;
+ 			MOD_ADD32(ic, 1, ODD_RADIX, ic);
+diff --git a/src/radix240_main_carry_loop.h b/src/radix240_main_carry_loop.h
+index 2278d29..6f8e0f4 100644
+--- a/src/radix240_main_carry_loop.h
++++ b/src/radix240_main_carry_loop.h
+@@ -608,14 +608,15 @@ for(k=1; k <= khi; k++)	/* Do n/(radix(1)*nwt) outer loop executions...	*/
+ 		// icycle[ic],icycle[jc],icycle[kc],icycle[lc], jcycle[ic],kcycle[ic],lcycle[ic] :
+ 		ic = 0; jc = 1; kc = 2; lc = 3;
+ 		while(tm0 < s1pef)	// Can't use l for loop index here since need it for byte offset in carry macro call
+-		{
++		{	// NB: (int)(tmp-cy_r) < RADIX (as used for SSE2 build) no good here, since just 1 vec_dbl increment
++			// per 4 Re+Im-carries; but (int)(tmp-cy_r) < (RADIX>>1) would work
+ 			//See "Sep 2014" note in 32-bit SSE2 version of this code below
+ 			k1 = icycle[ic];	k5 = jcycle[ic];	k6 = kcycle[ic];	k7 = lcycle[ic];
+ 			k2 = icycle[jc];
+ 			k3 = icycle[kc];
+ 			k4 = icycle[lc];
+ 			// Each AVX carry macro call also processes 4 prefetches of main-array data
+-			tm2 = a + j1 + pfetch_dist + poff[(int)(tm1-cy_r)];	// poff[] = p0,4,8,...; (tm1-cy_r) acts as a linear loop index running from 0,...,RADIX-1 here.
++			tm2 = (vec_dbl *)(a + j1 + pfetch_dist + poff[(int)(tm1-cy_r)]);	// poff[] = p0,4,8,...; (tm1-cy_r) acts as a linear loop index running from 0,...,RADIX-1 here.
+ 																		/* vvvvvvvvvvvvvvv [1,2,3]*ODD_RADIX; assumed << l2_sz_vd on input: */
+ 			SSE2_fermat_carry_norm_errcheck_X4_hiacc(tm0,tmp,l,tm1,0x780, 0x1e0,0x3c0,0x5a0, half_arr,sign_mask,k1,k2,k3,k4,k5,k6,k7, tm2,p1,p2,p3);
+ 			tm0 += 8; tm1++; tmp += 8; l -= 0xc0;
+@@ -691,7 +692,7 @@ for(k=1; k <= khi; k++)	/* Do n/(radix(1)*nwt) outer loop executions...	*/
+ 				k3 = icycle[kc];
+ 				k4 = icycle[lc];
+ 				// Each AVX carry macro call also processes 4 prefetches of main-array data
+-				tm2 = a + j1 + pfetch_dist + poff[(int)(tm1-cy_r)];	// poff[] = p0,4,8,...; (tm1-cy_r) acts as a linear loop index running from 0,...,RADIX-1 here.
++				tm2 = (vec_dbl *)(a + j1 + pfetch_dist + poff[(int)(tm1-cy_r)]);	// poff[] = p0,4,8,...; (tm1-cy_r) acts as a linear loop index running from 0,...,RADIX-1 here.
+ 																			/* vvvvvvvvvvvvvvv [1,2,3]*ODD_RADIX; assumed << l2_sz_vd on input: */
+ 				SSE2_fermat_carry_norm_errcheck_X4_loacc(tm0,tmp,tm1,0x780, 0x1e0,0x3c0,0x5a0, half_arr,sign_mask,k1,k2,k3,k4,k5,k6,k7, tm2,p1,p2,p3);
+ 				tm0 += 8; tm1++;
+@@ -722,15 +723,15 @@ for(k=1; k <= khi; k++)	/* Do n/(radix(1)*nwt) outer loop executions...	*/
+ 		ic = 0; jc = 1;
+ 		tm1 = s1p00; tmp = cy_r;	// <*** Again rely on contiguity of cy_r,i here ***
+ 		l = ODD_RADIX;	// Need to stick this #def into an intvar to work around [error: invalid lvalue in asm input for constraint 'm']
+-		while(tm1 < s1pef) {
++		while((int)(tmp-cy_r) < RADIX) {
+ 			//See "Sep 2014" note in 32-bit SSE2 version of this code below
+ 			k1 = icycle[ic];
+ 			k2 = jcycle[ic];
+ 			int k3 = icycle[jc];
+ 			int k4 = jcycle[jc];
+ 			// Each SSE2 carry macro call also processes 2 prefetches of main-array data
+-			tm2 = a + j1 + pfetch_dist + poff[(int)(tm1-cy_r)];	// poff[] = p0,4,8,...; (tm1-cy_r) acts as a linear loop index running from 0,...,RADIX-1 here.
+-			tm2 += (-((int)(tm1-cy_r)&0x1)) & p2;	// Base-addr incr by extra p2 on odd-index passes
++			tm2 = (vec_dbl *)(a + j1 + pfetch_dist + poff[(int)(tmp-cy_r)>>2]);	// poff[] = p0,4,8,...; (tm1-cy_r) acts as a linear loop index running from 0,...,RADIX-1 here.
++			tm2 += (-((int)((tmp-cy_r)>>1)&0x1)) & p2;	// Base-addr incr by extra p2 on odd-index passes
+ 			SSE2_fermat_carry_norm_errcheck_X2(tm1,tmp,NRT_BITS,NRTM1,idx_offset,idx_incr,l,half_arr,sign_mask,add1,add2,k1,k2,k3,k4, tm2,p1);
+ 			tm1 += 4; tmp += 2;
+ 			MOD_ADD32(ic, 2, ODD_RADIX, ic);
+@@ -744,15 +745,15 @@ for(k=1; k <= khi; k++)	/* Do n/(radix(1)*nwt) outer loop executions...	*/
+ 		tm1 = s1p00; tmp = cy_r;	// <*** Again rely on contiguity of cy_r,i here ***
+ 		// Need to stick this #def into an intvar to work around [error: invalid lvalue in asm input for constraint 'm']
+ 		l = ODD_RADIX << 4;	// 32-bit version needs preshifted << 4 input value
+-		while(tm1 <= s1pef) {
++		while((int)(tmp-cy_r) < RADIX) {
+ 			//Sep 2014: Even with reduced-register version of the 32-bit Fermat-mod carry macro,
+ 			// GCC runs out of registers on this one, without some playing-around-with-alternate code-sequences ...
+ 			// Pulling the array-refs out of the carry-macro call like so solves the problem:
+ 			k1 = icycle[ic];
+ 			k2 = jcycle[ic];
+ 			// Each SSE2 carry macro call also processes 2 prefetches of main-array data
+-			tm2 = a + j1 + pfetch_dist + poff[(int)(tm1-cy_r)];	// poff[] = p0,4,8,...; (tm1-cy_r) acts as a linear loop index running from 0,...,RADIX-1 here.
+-			tm2 += plo[(int)(tm1-cy_r)&0x3];	// Added offset cycles among p0,1,2,3
++			tm2 = (vec_dbl *)(a + j1 + pfetch_dist + poff[(int)(tmp-cy_r)>>2]);	// poff[] = p0,4,8,...; (tm1-cy_r) acts as a linear loop index running from 0,...,RADIX-1 here.
++			tm2 += p1*((int)(tmp-cy_r)&0x3);	// Added offset cycles among p0,1,2,3
+ 			SSE2_fermat_carry_norm_errcheck(tm1,tmp,NRT_BITS,NRTM1,idx_offset,idx_incr,l,half_arr,sign_mask,add1,add2,k1,k2, tm2);
+ 			tm1 += 2; tmp++;
+ 			MOD_ADD32(ic, 1, ODD_RADIX, ic);
+diff --git a/src/radix256_main_carry_loop.h b/src/radix256_main_carry_loop.h
+index d439f24..aff7f38 100644
+--- a/src/radix256_main_carry_loop.h
++++ b/src/radix256_main_carry_loop.h
+@@ -558,7 +558,7 @@ normally be getting dispatched to [radix] separate blocks of the A-array, we nee
+ 	  #if (OS_BITS == 32)
+ 		for(l = 0; l < RADIX; l++) {	// RADIX loop passes
+ 			// Each SSE2 carry macro call also processes 1 prefetch of main-array data
+-			add0 = a + j1 + pfetch_dist + poff[l];	// poff[] = p0,4,8,...
++			add0 = a + j1 + pfetch_dist + poff[l>>2];	// poff[] = p0,4,8,...
+ 			add0 += (-(l&0x10)) & p02;
+ 			add0 += (-(l&0x01)) & p01;
+ 			SSE2_fermat_carry_norm_pow2_errcheck   (tm1,tmp,NRT_BITS,NRTM1,idx_offset,idx_incr,half_arr,sign_mask,add1,add2, add0);
+@@ -567,7 +567,7 @@ normally be getting dispatched to [radix] separate blocks of the A-array, we nee
+ 	  #else	// 64-bit SSE2
+ 		for(l = 0; l < RADIX>>1; l++) {	// RADIX/2 loop passes
+ 			// Each SSE2 carry macro call also processes 2 prefetches of main-array data
+-			add0 = a + j1 + pfetch_dist + poff[l];	// poff[] = p0,4,8,...
++			add0 = a + j1 + pfetch_dist + poff[l>>1];	// poff[] = p0,4,8,...
+ 			add0 += (-(l&0x1)) & p02;	// Base-addr incr by extra p2 on odd-index passes
+ 			SSE2_fermat_carry_norm_pow2_errcheck_X2(tm1,tmp,NRT_BITS,NRTM1,idx_offset,idx_incr,half_arr,sign_mask,add1,add2, add0,p01);
+ 			tm1 += 4; tmp += 2;
+@@ -579,7 +579,7 @@ normally be getting dispatched to [radix] separate blocks of the A-array, we nee
+ 		// Can't use l as loop index here, since it gets used in the Fermat-mod carry macro (as are k1,k2):
+ 		ntmp = 0; addr = cy_r; addi = cy_i;
+ 		for(m = 0; m < RADIX>>2; m++) {
+-			jt = j1 + poff[m]; jp = j2 + poff[m];	// poff[] = p04,08,...,60
++			jt = j1 + poff[m]; jp = j2 + poff[m];	// poff[] = p04,08,...
+ 			fermat_carry_norm_pow2_errcheck(a[jt    ],a[jp    ],*addr,*addi,ntmp,NRTM1,NRT_BITS);	ntmp += NDIVR; ++addr; ++addi;
+ 			fermat_carry_norm_pow2_errcheck(a[jt+p01],a[jp+p01],*addr,*addi,ntmp,NRTM1,NRT_BITS);	ntmp += NDIVR; ++addr; ++addi;
+ 			fermat_carry_norm_pow2_errcheck(a[jt+p02],a[jp+p02],*addr,*addi,ntmp,NRTM1,NRT_BITS);	ntmp += NDIVR; ++addr; ++addi;
+@@ -629,8 +629,8 @@ normally be getting dispatched to [radix] separate blocks of the A-array, we nee
+ 		k1 = reverse(l,16)<<1;
+ 		tm2 = s1p00 + k1;
+ 	#if (OS_BITS == 32)
+-								 add1 = (vec_dbl*)tmp+ 2; add2 = (vec_dbl*)tmp+ 4; add3 = (vec_dbl*)tmp+ 6; add4 = (vec_dbl*)tmp+ 8; add5 = (vec_dbl*)tmp+10; add6 = (vec_dbl*)tmp+12; add7 = (vec_dbl*)tmp+14;
+-		add8 = (vec_dbl*)tmp+16; add9 = (vec_dbl*)tmp+18; adda = (vec_dbl*)tmp+20; addb = (vec_dbl*)tmp+22; addc = (vec_dbl*)tmp+24; addd = (vec_dbl*)tmp+26; adde = (vec_dbl*)tmp+28; addf = (vec_dbl*)tmp+30;
++								  add1 = (double*)(tmp+ 2); add2 = (double*)(tmp+ 4); add3 = (double*)(tmp+ 6); add4 = (double*)(tmp+ 8); add5 = (double*)(tmp+10); add6 = (double*)(tmp+12); add7 = (double*)(tmp+14);
++		add8 = (double*)(tmp+16); add9 = (double*)(tmp+18); adda = (double*)(tmp+20); addb = (double*)(tmp+22); addc = (double*)(tmp+24); addd = (double*)(tmp+26); adde = (double*)(tmp+28); addf = (double*)(tmp+30);
+ 		SSE2_RADIX16_DIF_0TWIDDLE  (tm2,OFF1,OFF2,OFF3,OFF4, isrt2,two, tmp,add1,add2,add3,add4,add5,add6,add7,add8,add9,adda,addb,addc,addd,adde,addf);
+ 	#else
+ 		SSE2_RADIX16_DIF_0TWIDDLE_B(tm2,OFF1,OFF2,OFF3,OFF4, isrt2,two, tmp);
+diff --git a/src/radix32_main_carry_loop.h b/src/radix32_main_carry_loop.h
+index 5337009..3f0d0a0 100644
+--- a/src/radix32_main_carry_loop.h
++++ b/src/radix32_main_carry_loop.h
+@@ -291,7 +291,7 @@ normally be getting dispatched to [radix] separate blocks of the A-array, we nee
+ 	  #if (OS_BITS == 32)
+ 		for(l = 0; l < RADIX; l++) {	// RADIX loop passes
+ 			// Each SSE2 carry macro call also processes 1 prefetch of main-array data
+-			tm2 = a + j1 + pfetch_dist + poff[l];	// poff[] = p0,4,8,...
++			tm2 = (vec_dbl *)(a + j1 + pfetch_dist + poff[l>>2]);	// poff[] = p0,4,8,...
+ 			tm2 += (-(l&0x10)) & p02;
+ 			tm2 += (-(l&0x01)) & p01;
+ 			SSE2_fermat_carry_norm_pow2_errcheck   (tm1,tmp,NRT_BITS,NRTM1,idx_offset,idx_incr,half_arr,sign_mask,add1,add2, tm2);
+@@ -300,7 +300,7 @@ normally be getting dispatched to [radix] separate blocks of the A-array, we nee
+ 	  #else	// 64-bit SSE2
+ 		for(l = 0; l < RADIX>>1; l++) {	// RADIX/2 loop passes
+ 			// Each SSE2 carry macro call also processes 2 prefetches of main-array data
+-			tm2 = a + j1 + pfetch_dist + poff[l];	// poff[] = p0,4,8,...; (tm1-cy_r) acts as a linear loop index running from 0,...,RADIX-1 here.
++			tm2 = (vec_dbl *)(a + j1 + pfetch_dist + poff[l>>1]);	// poff[] = p0,4,8,...; (tm1-cy_r) acts as a linear loop index running from 0,...,RADIX-1 here.
+ 			tm2 += (-(l&0x1)) & p02;	// Base-addr incr by extra p2 on odd-index passes
+ 			SSE2_fermat_carry_norm_pow2_errcheck_X2(tm1,tmp,NRT_BITS,NRTM1,idx_offset,idx_incr,half_arr,sign_mask,add1,add2, tm2,p01);
+ 			tm1 += 4; tmp += 2;
+@@ -312,7 +312,7 @@ normally be getting dispatched to [radix] separate blocks of the A-array, we nee
+ 		// Can't use l as loop index here, since it gets used in the Fermat-mod carry macro (as are k1,k2);
+ 		ntmp = 0; addr = cy_r; addi = cy_i;
+ 		for(m = 0; m < RADIX>>2; m++) {
+-			jt = j1 + poff[m]; jp = j2 + poff[m];	// poff[] = p04,08,...,60
++			jt = j1 + poff[m]; jp = j2 + poff[m];	// poff[] = p04,08,...
+ 			fermat_carry_norm_pow2_errcheck(a[jt    ],a[jp    ],*addr,*addi,ntmp,NRTM1,NRT_BITS);	ntmp += NDIVR; ++addr; ++addi;
+ 			fermat_carry_norm_pow2_errcheck(a[jt+p01],a[jp+p01],*addr,*addi,ntmp,NRTM1,NRT_BITS);	ntmp += NDIVR; ++addr; ++addi;
+ 			fermat_carry_norm_pow2_errcheck(a[jt+p02],a[jp+p02],*addr,*addi,ntmp,NRTM1,NRT_BITS);	ntmp += NDIVR; ++addr; ++addi;
+diff --git a/src/radix4032_main_carry_loop.h b/src/radix4032_main_carry_loop.h
+index 3e68bb2..ac02d50 100644
+--- a/src/radix4032_main_carry_loop.h
++++ b/src/radix4032_main_carry_loop.h
+@@ -371,7 +371,7 @@ for(k=1; k <= khi; k++)	/* Do n/(radix(1)*nwt) outer loop executions...	*/
+ 			k2 = icycle[jc];
+ 			k3 = icycle[kc];
+ 			k4 = icycle[lc];
+-			tm2 = a + j1 + pfetch_dist + poff[(int)(tm1-cy_r)];	// poff[] = p0,4,8,...; (tm1-cy_r) acts as a linear loop index running from 0,...,RADIX-1 here.
++			tm2 = (vec_dbl *)(a + j1 + pfetch_dist + poff[(int)(tm1-cy_r)]);	// poff[] = p0,4,8,...; (tm1-cy_r) acts as a linear loop index running from 0,...,RADIX-1 here.
+ 																		/* vvvvvvvvvvvvvvv [1,2,3]*ODD_RADIX; assumed << l2_sz_vd on input: */
+ 			SSE2_fermat_carry_norm_errcheck_X4_hiacc(tm0,tmp,l,tm1,0x7e00, 0x1f80,0x3f00,0x5e80, half_arr,sign_mask,k1,k2,k3,k4,k5,k6,k7, tm2,p1,p2,p3);
+ 			tm0 += 8; tm1++; tmp += 8; l -= 0xc0;
+@@ -386,14 +386,13 @@ for(k=1; k <= khi; k++)	/* Do n/(radix(1)*nwt) outer loop executions...	*/
+ 		tm0 = s1p00; tmp = base_negacyclic_root;	// tmp *not* incremented between macro calls in loacc version
+ 		tm1 = cy_r; // tm2 = cy_i;	*** replace with literal-byte-offset in macro call to save a reg
+ 		ic = 0; jc = 1; kc = 2; lc = 3;
+-		for(l = 0; l < RADIX>>2; l++)	// RADIX/4 loop passes
+-		{
++		for(l = 0; l < RADIX>>2; l++) {	// RADIX/4 loop passes
+ 			//See "Sep 2014" note in 32-bit SSE2 version of this code below
+ 			k1 = icycle[ic];	k5 = jcycle[ic];	k6 = kcycle[ic];	k7 = lcycle[ic];
+ 			k2 = icycle[jc];
+ 			k3 = icycle[kc];
+ 			k4 = icycle[lc];
+-			tm2 = a + j1 + pfetch_dist + poff[(int)(tm1-cy_r)];	// poff[] = p0,4,8,...; (tm1-cy_r) acts as a linear loop index running from 0,...,RADIX-1 here.
++			tm2 = (vec_dbl *)(a + j1 + pfetch_dist + poff[(int)(tm1-cy_r)]);	// poff[] = p0,4,8,...; (tm1-cy_r) acts as a linear loop index running from 0,...,RADIX-1 here.
+ 																		/* vvvvvvvvvvvvvvv [1,2,3]*ODD_RADIX; assumed << l2_sz_vd on input: */
+ 			SSE2_fermat_carry_norm_errcheck_X4_loacc(tm0,tmp,tm1,0x7e00, 0x1f80,0x3f00,0x5e80, half_arr,sign_mask,k1,k2,k3,k4,k5,k6,k7, tm2,p1,p2,p3);
+ 			tm0 += 8; tm1++;
+@@ -423,15 +422,15 @@ for(k=1; k <= khi; k++)	/* Do n/(radix(1)*nwt) outer loop executions...	*/
+ 		ic = 0; jc = 1;
+ 		tm1 = s1p00; tmp = cy_r;	// <*** Again rely on contiguity of cy_r,i here ***
+ 		l = ODD_RADIX;	// Need to stick this #def into an intvar to work around [error: invalid lvalue in asm input for constraint 'm']
+-		while(tm1 < cy_r) {
++		while((int)(tmp-cy_r) < RADIX) {
+ 			//See "Sep 2014" note in 32-bit SSE2 version of this code below
+ 			k1 = icycle[ic];
+ 			k2 = jcycle[ic];
+ 			int k3 = icycle[jc];
+ 			int k4 = jcycle[jc];
+ 			// Each SSE2 carry macro call also processes 2 prefetches of main-array data
+-			tm2 = a + j1 + pfetch_dist + poff[(int)(tm1-cy_r)];	// poff[] = p0,4,8,...; (tm1-cy_r) acts as a linear loop index running from 0,...,RADIX-1 here.
+-			tm2 += (-((int)(tm1-cy_r)&0x1)) & p2;	// Base-addr incr by extra p2 on odd-index passes
++			tm2 = (vec_dbl *)(a + j1 + pfetch_dist + poff[(int)(tmp-cy_r)>>2]);	// poff[] = p0,4,8,...; (tm1-cy_r) acts as a linear loop index running from 0,...,RADIX-1 here.
++			tm2 += (-((int)((tmp-cy_r)>>1)&0x1)) & p2;	// Base-addr incr by extra p2 on odd-index passes
+ 			SSE2_fermat_carry_norm_errcheck_X2(tm1,tmp,NRT_BITS,NRTM1,idx_offset,idx_incr,l,half_arr,sign_mask,add1,add2,k1,k2,k3,k4, tm2,p1);
+ 			tm1 += 4; tmp += 2;
+ 			MOD_ADD32(ic, 2, ODD_RADIX, ic);
+@@ -444,15 +443,15 @@ for(k=1; k <= khi; k++)	/* Do n/(radix(1)*nwt) outer loop executions...	*/
+ 		ic = 0;	// ic = idx into [i|j]cycle mini-arrays, gets incremented (mod ODD_RADIX) between macro calls
+ 		tm1 = s1p00; tmp = cy_r;	// <*** Again rely on contiguity of cy_r,i here ***
+ 		l = ODD_RADIX << 4;	// 32-bit version needs preshifted << 4 input value
+-		while(tm1 < cy_r) {
++		while((int)(tmp-cy_r) < RADIX) {
+ 			//Sep 2014: Even with reduced-register version of the 32-bit Fermat-mod carry macro,
+ 			// GCC runs out of registers on this one, without some playing-around-with-alternate code-sequences ...
+ 			// Pulling the array-refs out of the carry-macro call like so solves the problem:
+ 			k1 = icycle[ic];
+ 			k2 = jcycle[ic];
+ 			// Each SSE2 carry macro call also processes 1 prefetch of main-array data
+-			tm2 = a + j1 + pfetch_dist + poff[(int)(tm1-cy_r)];	// poff[] = p0,4,8,...; (tm1-cy_r) acts as a linear loop index running from 0,...,RADIX-1 here.
+-			tm2 += plo[(int)(tm1-cy_r)&0x3];	// Added offset cycles among p0,1,2,3
++			tm2 = (vec_dbl *)(a + j1 + pfetch_dist + poff[(int)(tmp-cy_r)>>2]);	// poff[] = p0,4,8,...; (tm1-cy_r) acts as a linear loop index running from 0,...,RADIX-1 here.
++			tm2 += p1*((int)(tmp-cy_r)&0x3);	// Added offset cycles among p0,1,2,3
+ 			SSE2_fermat_carry_norm_errcheck(tm1,tmp,NRT_BITS,NRTM1,idx_offset,idx_incr,l,half_arr,sign_mask,add1,add2,k1,k2, tm2);
+ 			tm1 += 2; tmp++;
+ 			MOD_ADD32(ic, 1, ODD_RADIX, ic);
+@@ -531,7 +530,7 @@ for(k=1; k <= khi; k++)	/* Do n/(radix(1)*nwt) outer loop executions...	*/
+ 			// the leading pow2-shift arg = trailz(N) - trailz(64) = 0:
+ 			SSE2_RADIX_64_DIF( FALSE, thr_id,
+ 				0,
+-				tmp,t_offsets,
++				(double *)tmp,t_offsets,
+ 				s1p00,	// tmp-storage
+ 				a+jt,io_offsets
+ 			); tmp += 2;
+diff --git a/src/radix56_main_carry_loop.h b/src/radix56_main_carry_loop.h
+index 7e6ba9f..6e395fa 100644
+--- a/src/radix56_main_carry_loop.h
++++ b/src/radix56_main_carry_loop.h
+@@ -434,7 +434,7 @@ for(k=1; k <= khi; k++)	/* Do n/(radix(1)*nwt) outer loop executions...	*/
+ 			k3 = icycle[kc];
+ 			k4 = icycle[lc];
+ 			// Each AVX carry macro call also processes 4 prefetches of main-array data
+-			tm2 = a + j1 + pfetch_dist + poff[(int)(tm1-cy_r)];	// poff[] = p0,4,8,...; (tm1-cy_r) acts as a linear loop index running from 0,...,RADIX-1 here.
++			tm2 = (vec_dbl *)(a + j1 + pfetch_dist + poff[(int)(tm1-cy_r)]);	// poff[] = p0,4,8,...; (tm1-cy_r) acts as a linear loop index running from 0,...,RADIX-1 here.
+ 																		/* vvvvvvvvvvvvvvv [1,2,3]*ODD_RADIX; assumed << l2_sz_vd on input: */
+ 			SSE2_fermat_carry_norm_errcheck_X4_loacc(tm0,tmp,tm1,0x1c0, 0xe0,0x1c0,0x2a0, half_arr,sign_mask,k1,k2,k3,k4,k5,k6,k7, tm2,p01,p02,p03);
+ 			tm0 += 8; tm1++;
+@@ -469,8 +469,8 @@ for(k=1; k <= khi; k++)	/* Do n/(radix(1)*nwt) outer loop executions...	*/
+ 			int k3 = icycle[jc];
+ 			int k4 = jcycle[jc];
+ 			// Each SSE2 carry macro call also processes 2 prefetches of main-array data
+-			tm2 = a + j1 + pfetch_dist + poff[(int)(tm1-cy_r)];	// poff[] = p0,4,8,...; (tm1-cy_r) acts as a linear loop index running from 0,...,RADIX-1 here.
+-			tm2 += (-((int)(tm1-cy_r)&0x1)) & p02;	// Base-addr incr by extra p2 on odd-index passes
++			tm2 = (vec_dbl *)(a + j1 + pfetch_dist + poff[(int)(tmp-cy_r)>>2]);	// poff[] = p0,4,8,...; (tm1-cy_r) acts as a linear loop index running from 0,...,RADIX-1 here.
++			tm2 += (-((int)((tmp-cy_r)>>1)&0x1)) & p02;	// Base-addr incr by extra p2 on odd-index passes
+ 			SSE2_fermat_carry_norm_errcheck_X2(tm1,tmp,NRT_BITS,NRTM1,idx_offset,idx_incr,l,half_arr,sign_mask,add1,add2,k1,k2,k3,k4, tm2,p01);
+ 			tm1 += 4; tmp += 2;
+ 			MOD_ADD32(ic, 2, ODD_RADIX, ic);
+@@ -491,8 +491,8 @@ for(k=1; k <= khi; k++)	/* Do n/(radix(1)*nwt) outer loop executions...	*/
+ 			k1 = icycle[ic];
+ 			k2 = jcycle[ic];
+ 			// Each SSE2 carry macro call also processes 2 prefetches of main-array data
+-			tm2 = a + j1 + pfetch_dist + poff[(int)(tm1-cy_r)];	// poff[] = p0,4,8,...; (tm1-cy_r) acts as a linear loop index running from 0,...,RADIX-1 here.
+-			tm2 += p01*((int)(tm1-cy_r)&0x3);	// Added offset cycles among p0,1,2,3
++			tm2 = (vec_dbl *)(a + j1 + pfetch_dist + poff[(int)(tmp-cy_r)>>2]);	// poff[] = p0,4,8,...; (tm1-cy_r) acts as a linear loop index running from 0,...,RADIX-1 here.
++			tm2 += p01*((int)(tmp-cy_r)&0x3);	// Added offset cycles among p0,1,2,3
+ 			SSE2_fermat_carry_norm_errcheck(tm1,tmp,NRT_BITS,NRTM1,idx_offset,idx_incr,l,half_arr,sign_mask,add1,add2,k1,k2, tm2);
+ 			tm1 += 2; tmp++;
+ 			MOD_ADD32(ic, 1, ODD_RADIX, ic);
+diff --git a/src/radix60_main_carry_loop.h b/src/radix60_main_carry_loop.h
+index 187ec3f..d4ad69b 100644
+--- a/src/radix60_main_carry_loop.h
++++ b/src/radix60_main_carry_loop.h
+@@ -424,7 +424,7 @@ for(k=1; k <= khi; k++)	/* Do n/(radix(1)*nwt) outer loop executions...	*/
+ 			k3 = icycle[kc];
+ 			k4 = icycle[lc];
+ 			// Each AVX carry macro call also processes 4 prefetches of main-array data
+-			tm2 = a + j1 + pfetch_dist + poff[(int)(tm1-cy_r)];	// poff[] = p0,4,8,...; (tm1-cy_r) acts as a linear loop index running from 0,...,RADIX-1 here.
++			tm2 = (vec_dbl *)(a + j1 + pfetch_dist + poff[(int)(tm1-cy_r)]);	// poff[] = p0,4,8,...; (tm1-cy_r) acts as a linear loop index running from 0,...,RADIX-1 here.
+ 																		/* vvvvvvvvvvvvvvv [1,2,3]*ODD_RADIX; assumed << l2_sz_vd on input: */
+ 			SSE2_fermat_carry_norm_errcheck_X4_hiacc(tm0,tmp,l,tm1,0x1e0, 0x1e0,0x3c0,0x5a0, half_arr,sign_mask,k1,k2,k3,k4,k5,k6,k7, tm2,p01,p02,p03);
+ 			tm0 += 8; tm1++; tmp += 8; l -= 0xc0;
+@@ -446,7 +446,7 @@ for(k=1; k <= khi; k++)	/* Do n/(radix(1)*nwt) outer loop executions...	*/
+ 			k3 = icycle[kc];
+ 			k4 = icycle[lc];
+ 			// Each AVX carry macro call also processes 4 prefetches of main-array data
+-			tm2 = a + j1 + pfetch_dist + poff[(int)(tm1-cy_r)];	// poff[] = p0,4,8,...; (tm1-cy_r) acts as a linear loop index running from 0,...,RADIX-1 here.
++			tm2 = (vec_dbl *)(a + j1 + pfetch_dist + poff[(int)(tm1-cy_r)]);	// poff[] = p0,4,8,...; (tm1-cy_r) acts as a linear loop index running from 0,...,RADIX-1 here.
+ 																		/* vvvvvvvvvvvvvvv [1,2,3]*ODD_RADIX; assumed << l2_sz_vd on input: */
+ 			SSE2_fermat_carry_norm_errcheck_X4_loacc(tm0,tmp,tm1,0x1e0, 0x1e0,0x3c0,0x5a0, half_arr,sign_mask,k1,k2,k3,k4,k5,k6,k7, tm2,p01,p02,p03);
+ 			tm0 += 8; tm1++;
+@@ -483,8 +483,8 @@ for(k=1; k <= khi; k++)	/* Do n/(radix(1)*nwt) outer loop executions...	*/
+ 			int k3 = icycle[jc];
+ 			int k4 = jcycle[jc];
+ 			// Each SSE2 carry macro call also processes 2 prefetches of main-array data
+-			tm2 = a + j1 + pfetch_dist + poff[(int)(tm1-cy_r)];	// poff[] = p0,4,8,...; (tm1-cy_r) acts as a linear loop index running from 0,...,RADIX-1 here.
+-			tm2 += (-((int)(tm1-cy_r)&0x1)) & p02;	// Base-addr incr by extra p2 on odd-index passes
++			tm2 = (vec_dbl *)(a + j1 + pfetch_dist + poff[(int)(tmp-cy_r)>>2]);	// poff[] = p0,4,8,...; (tm1-cy_r) acts as a linear loop index running from 0,...,RADIX-1 here.
++			tm2 += (-((int)((tmp-cy_r)>>1)&0x1)) & p02;	// Base-addr incr by extra p2 on odd-index passes
+ 			SSE2_fermat_carry_norm_errcheck_X2(tm1,tmp,NRT_BITS,NRTM1,idx_offset,idx_incr,l,half_arr,sign_mask,add1,add2,k1,k2,k3,k4, tm2,p01);
+ 			tm1 += 4; tmp += 2;
+ 			MOD_ADD32(ic, 2, ODD_RADIX, ic);
+@@ -505,8 +505,8 @@ for(k=1; k <= khi; k++)	/* Do n/(radix(1)*nwt) outer loop executions...	*/
+ 			k1 = icycle[ic];
+ 			k2 = jcycle[ic];
+ 			// Each SSE2 carry macro call also processes 2 prefetches of main-array data
+-			tm2 = a + j1 + pfetch_dist + poff[(int)(tm1-cy_r)];	// poff[] = p0,4,8,...; (tm1-cy_r) acts as a linear loop index running from 0,...,RADIX-1 here.
+-			tm2 += p01*((int)(tm1-cy_r)&0x3);	// Added offset cycles among p0,1,2,3
++			tm2 = (vec_dbl *)(a + j1 + pfetch_dist + poff[(int)(tmp-cy_r)>>2]);	// poff[] = p0,4,8,...; (tm1-cy_r) acts as a linear loop index running from 0,...,RADIX-1 here.
++			tm2 += p01*((int)(tmp-cy_r)&0x3);	// Added offset cycles among p0,1,2,3
+ 			SSE2_fermat_carry_norm_errcheck(tm1,tmp,NRT_BITS,NRTM1,idx_offset,idx_incr,l,half_arr,sign_mask,add1,add2,k1,k2, tm2);
+ 			tm1 += 2; tmp++;
+ 			MOD_ADD32(ic, 1, ODD_RADIX, ic);
+diff --git a/src/radix64_main_carry_loop.h b/src/radix64_main_carry_loop.h
+index ce3e4af..57bea3d 100644
+--- a/src/radix64_main_carry_loop.h
++++ b/src/radix64_main_carry_loop.h
+@@ -464,7 +464,7 @@ normally be getting dispatched to [radix] separate blocks of the A-array, we nee
+ 	  #if (OS_BITS == 32)
+ 		for(l = 0; l < RADIX; l++) {	// RADIX loop passes
+ 			// Each SSE2 carry macro call also processes 1 prefetch of main-array data
+-			tm2 = a + j1 + pfetch_dist + poff[l];	// poff[] = p0,4,8,...
++			tm2 = a + j1 + pfetch_dist + poff[l>>2];	// poff[] = p0,4,8,...
+ 			tm2 += (-(l&0x10)) & p02;
+ 			tm2 += (-(l&0x01)) & p01;
+ 			SSE2_fermat_carry_norm_pow2_errcheck   (tm1,tmp,NRT_BITS,NRTM1,idx_offset,idx_incr,half_arr,sign_mask,add1,add2, tm2);
+@@ -473,7 +473,7 @@ normally be getting dispatched to [radix] separate blocks of the A-array, we nee
+ 	  #else	// 64-bit SSE2
+ 		for(l = 0; l < RADIX>>1; l++) {	// RADIX/2 loop passes
+ 			// Each SSE2 carry macro call also processes 2 prefetches of main-array data
+-			tm2 = a + j1 + pfetch_dist + poff[l];	// poff[] = p0,4,8,...; (tm1-cy_r) acts as a linear loop index running from 0,...,RADIX-1 here.
++			tm2 = a + j1 + pfetch_dist + poff[l>>1];	// poff[] = p0,4,8,...; (tm1-cy_r) acts as a linear loop index running from 0,...,RADIX-1 here.
+ 			tm2 += (-(l&0x1)) & p02;	// Base-addr incr by extra p2 on odd-index passes
+ 			SSE2_fermat_carry_norm_pow2_errcheck_X2(tm1,tmp,NRT_BITS,NRTM1,idx_offset,idx_incr,half_arr,sign_mask,add1,add2, tm2,p01);
+ 			tm1 += 4; tmp += 2;
+@@ -485,7 +485,7 @@ normally be getting dispatched to [radix] separate blocks of the A-array, we nee
+ 		// Can't use l as loop index here, since it gets used in the Fermat-mod carry macro (as are k1,k2);
+ 		ntmp = 0; addr = cy_r; addi = cy_i;
+ 		for(m = 0; m < RADIX>>2; m++) {
+-			jt = j1 + poff[m]; jp = j2 + poff[m];	// poff[] = p04,08,...,60
++			jt = j1 + poff[m]; jp = j2 + poff[m];	// poff[] = p04,08,...
+ 			fermat_carry_norm_pow2_errcheck(a[jt    ],a[jp    ],*addr,*addi,ntmp,NRTM1,NRT_BITS);	ntmp += NDIVR; ++addr; ++addi;
+ 			fermat_carry_norm_pow2_errcheck(a[jt+p01],a[jp+p01],*addr,*addi,ntmp,NRTM1,NRT_BITS);	ntmp += NDIVR; ++addr; ++addi;
+ 			fermat_carry_norm_pow2_errcheck(a[jt+p02],a[jp+p02],*addr,*addi,ntmp,NRTM1,NRT_BITS);	ntmp += NDIVR; ++addr; ++addi;
+diff --git a/src/radix960_main_carry_loop.h b/src/radix960_main_carry_loop.h
+index cb4cc15..f900a77 100644
+--- a/src/radix960_main_carry_loop.h
++++ b/src/radix960_main_carry_loop.h
+@@ -589,16 +589,22 @@ for(k=1; k <= khi; k++)	/* Do n/(radix(1)*nwt) outer loop executions...	*/
+ 		// Oct 2014: Try getting most of the LOACC speedup with better accuracy by breaking the complex-roots-of-(-1)
+ 		// chaining into 2 or more equal-sized subchains, each starting with 'fresh' (unchained) complex roots:
+ 		#if (LOACC == 0)
++			#warning LOACC = 0
+ 			#define NFOLD (const int)0
+ 		#elif (LOACC == 1)
++			#warning LOACC = 1
+ 			#define NFOLD (const int)1
+ 		#elif (LOACC == 2)
++			#warning LOACC = 2
+ 			#define NFOLD (const int)2
+ 		#elif (LOACC == 3)
++			#warning LOACC = 3
+ 			#define NFOLD (const int)3
+ 		#elif (LOACC == 4)
++			#warning LOACC = 4
+ 			#define NFOLD (const int)4
+ 		#elif (LOACC == 5)
++			#warning LOACC = 5
+ 			#define NFOLD (const int)5
+ 		#else
+ 			#error If LOACC defined for build of radix960_ditN_cy_dif1.c, must be given value 0,1,2,3,4 or 5!
+@@ -650,7 +656,7 @@ for(k=1; k <= khi; k++)	/* Do n/(radix(1)*nwt) outer loop executions...	*/
+ 				k3 = icycle[kc];
+ 				k4 = icycle[lc];
+ 				// Each AVX carry macro call also processes 4 prefetches of main-array data
+-				tm2 = a + j1 + pfetch_dist + poff[(int)(tm1-cy_r)];	// poff[] = p0,4,8,...; (tm1-cy_r) acts as a linear loop index running from 0,...,RADIX-1 here.
++				tm2 = (vec_dbl *)(a + j1 + pfetch_dist + poff[(int)(tm1-cy_r)]);	// poff[] = p0,4,8,...; (tm1-cy_r) acts as a linear loop index running from 0,...,RADIX-1 here.
+ 																			/* vvvvvvvvvvvvvvv [1,2,3]*ODD_RADIX; assumed << l2_sz_vd on input: */
+ 				SSE2_fermat_carry_norm_errcheck_X4_loacc(tm0,tmp,tm1,0x1e00, 0x1e0,0x3c0,0x5a0, half_arr,sign_mask,k1,k2,k3,k4,k5,k6,k7, tm2,p1,p2,p3);
+ 				tm0 += 8; tm1++;
+@@ -681,15 +687,15 @@ for(k=1; k <= khi; k++)	/* Do n/(radix(1)*nwt) outer loop executions...	*/
+ 		ic = 0; jc = 1;
+ 		tm1 = s1p00; tmp = cy_r;	// <*** Again rely on contiguity of cy_r,i here ***
+ 		l = ODD_RADIX;	// Need to stick this #def into an intvar to work around [error: invalid lvalue in asm input for constraint 'm']
+-		while(tm1 < x00) {
++		while((int)(tmp-cy_r) < RADIX) {
+ 			//See "Sep 2014" note in 32-bit SSE2 version of this code below
+ 			k1 = icycle[ic];
+ 			k2 = jcycle[ic];
+ 			k3 = icycle[jc];
+ 			k4 = jcycle[jc];
+ 			// Each SSE2 carry macro call also processes 2 prefetches of main-array data
+-			tm2 = a + j1 + pfetch_dist + poff[(int)(tm1-cy_r)];	// poff[] = p0,4,8,...; (tm1-cy_r) acts as a linear loop index running from 0,...,RADIX-1 here.
+-			tm2 += (-((int)(tm1-cy_r)&0x1)) & p2;	// Base-addr incr by extra p2 on odd-index passes
++			tm2 = (vec_dbl *)(a + j1 + pfetch_dist + poff[(int)(tmp-cy_r)>>2]);	// poff[] = p0,4,8,...; (tm1-cy_r) acts as a linear loop index running from 0,...,RADIX-1 here.
++			tm2 += (-((int)((tmp-cy_r)>>1)&0x1)) & p2;	// Base-addr incr by extra p2 on odd-index passes
+ 			SSE2_fermat_carry_norm_errcheck_X2(tm1,tmp,NRT_BITS,NRTM1,idx_offset,idx_incr,l,half_arr,sign_mask,add1,add2,k1,k2,k3,k4, tm2,p1);
+ 			tm1 += 4; tmp += 2;
+ 			MOD_ADD32(ic, 2, ODD_RADIX, ic);
+@@ -703,15 +709,15 @@ for(k=1; k <= khi; k++)	/* Do n/(radix(1)*nwt) outer loop executions...	*/
+ 		tm1 = s1p00; tmp = cy_r;	// <*** Again rely on contiguity of cy_r,i here ***
+ 		// Need to stick this #def into an intvar to work around [error: invalid lvalue in asm input for constraint 'm']
+ 		l = ODD_RADIX << 4;	// 32-bit version needs preshifted << 4 input value
+-		while(tm1 < x00) {
++		while((int)(tmp-cy_r) < RADIX) {
+ 			//Sep 2014: Even with reduced-register version of the 32-bit Fermat-mod carry macro,
+ 			// GCC runs out of registers on this one, without some playing-around-with-alternate code-sequences ...
+ 			// Pulling the array-refs out of the carry-macro call like so solves the problem:
+ 			k1 = icycle[ic];
+ 			k2 = jcycle[ic];
+ 			// Each SSE2 carry macro call also processes 2 prefetches of main-array data
+-			tm2 = a + j1 + pfetch_dist + poff[(int)(tm1-cy_r)];	// poff[] = p0,4,8,...; (tm1-cy_r) acts as a linear loop index running from 0,...,RADIX-1 here.
+-			tm2 += plo[(int)(tm1-cy_r)&0x3];	// Added offset cycles among p0,1,2,3
++			tm2 = (vec_dbl *)(a + j1 + pfetch_dist + poff[(int)(tmp-cy_r)>>2]);	// poff[] = p0,4,8,...; (tm1-cy_r) acts as a linear loop index running from 0,...,RADIX-1 here.
++			tm2 += p1*((int)(tmp-cy_r)&0x3);	// Added offset cycles among p0,1,2,3
+ 			SSE2_fermat_carry_norm_errcheck(tm1,tmp,NRT_BITS,NRTM1,idx_offset,idx_incr,l,half_arr,sign_mask,add1,add2,k1,k2, tm2);
+ 			tm1 += 2; tmp++;
+ 			MOD_ADD32(ic, 1, ODD_RADIX, ic);
+@@ -982,7 +988,7 @@ for(k=1; k <= khi; k++)	/* Do n/(radix(1)*nwt) outer loop executions...	*/
+ 		// the leading pow2-shift arg = trailz(N) - trailz(64) = 0:
+ 			SSE2_RADIX_64_DIF( FALSE, thr_id,
+ 				0,
+-				tmp,t_offsets,
++				(double *)tmp,t_offsets,
+ 				s1p00,	// tmp-storage
+ 				a+jt,dif_o_offsets
+ 			); tmp += 2;
+-- 
+2.12.2
+
diff -Nru mlucas-14.1/debian/patches/0001-Split-big-test-into-smaller-ones.patch mlucas-14.1/debian/patches/0001-Split-big-test-into-smaller-ones.patch
--- mlucas-14.1/debian/patches/0001-Split-big-test-into-smaller-ones.patch	1970-01-01 08:00:00.000000000 +0800
+++ mlucas-14.1/debian/patches/0001-Split-big-test-into-smaller-ones.patch	2017-04-24 16:16:28.000000000 +0800
@@ -0,0 +1,33 @@
+From 35e426b2718af92558df61718f405c69e03bf10d Mon Sep 17 00:00:00 2001
+From: Alex Vong <alexvong1995@gmail.com>
+Date: Mon, 24 Apr 2017 14:09:01 +0800
+Subject: [PATCH] Split big test into smaller ones.
+
+Description: Split big test into smaller ones to avoid exhausting
+ system resources. This fix is inspired by that of
+ https://bugs.debian.org/860664.
+Bug-Debian: https://bugs.debian.org/860662
+Forwarded: yes
+Author: Alex Vong <alexvong1995@gmail.com>
+
+* scripts/self_test.test: Split big test.
+---
+ scripts/self_test.test | 10 ++++++++--
+ 1 file changed, 8 insertions(+), 2 deletions(-)
+
+--- a/scripts/self_test.test
++++ b/scripts/self_test.test
+@@ -29,5 +29,11 @@
+ # Export MLUCAS_PATH so that mlucas.cfg stays in the build directory
+ export MLUCAS_PATH
+ 
+-# Do self-test
+-exec "$MLUCAS_PATH"mlucas -s m
++# List of `medium' exponents
++exponent_ls='20000047 22442237 24878401 27309229 29735137 32156581 34573867 36987271 39397201 44207087 49005071 53792327 58569809 63338459 68098843 72851621 77597293 87068977 96517019 105943723 115351063 124740697 134113933 143472073'
++
++# Run self-test on `medium' exponents
++for exponent in $exponent_ls
++do
++    "$MLUCAS_PATH"mlucas -m "$exponent" -iters 100
++done
diff -Nru mlucas-14.1/debian/patches/series mlucas-14.1/debian/patches/series
--- mlucas-14.1/debian/patches/series	2015-08-28 03:58:09.000000000 +0800
+++ mlucas-14.1/debian/patches/series	2017-04-24 16:16:28.000000000 +0800
@@ -1 +1,3 @@
 0001-Add-copyright-info-of-generated-files.patch
+0001-Split-big-test-into-smaller-ones.patch
+0001-fixes-undefined-behaviour.patch
diff -Nru mlucas-14.1/debian/README.Debian mlucas-14.1/debian/README.Debian
--- mlucas-14.1/debian/README.Debian	2015-08-27 22:53:38.000000000 +0800
+++ mlucas-14.1/debian/README.Debian	2017-04-24 16:16:28.000000000 +0800
@@ -13,6 +13,14 @@
 flag. However, the parser will not reject unsupported arguments. Using
 unsupported arguments for -iters flag may trigger strange behaviour.
 
+On systems with limited resources, the self-test for medium exponents
+'mlucas -s m' may fail with 'pthread_create:: Cannot allocate memory'. See
+<https://bugs.debian.org/860662> for details. The current fix is to run the
+self-test on each exponent one by one instead. However, this is unsatisfactory
+since it does not prevent the user from running the self-test for medium
+exponents and getting an error.
+
 See BUGS section in mlucas(1) for details.
 
+ -- Alex Vong <alexvong1995@gmail.com>  Thu, 27 Aug 2017 22:04:58 +0800
  -- Alex Vong <alexvong1995@gmail.com>  Thu, 27 Aug 2015 22:04:58 +0800
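
As background on the undefined-behaviour patch: the recurring change from
poff[l] to poff[l>>2] addresses loops that indexed past the end of the
RADIX/4-entry poff[] array, which gcc reports via
-Waggressive-loop-optimizations. A minimal standalone illustration of the
bug class (RADIX value and array name mirror the patch; this is not mlucas
code):

#include <stdio.h>

#define RADIX 64

int main(void)
{
	static int poff[RADIX >> 2];	/* one offset per group of 4 indices */
	int sum = 0, l;

	/* Buggy shape, flagged by gcc's -Waggressive-loop-optimizations:
	 *
	 *     for (l = 0; l < RADIX; l++) sum += poff[l];
	 *
	 * reads poff[16..63], past the end of the array; the optimizer may
	 * assume the out-of-bounds access cannot happen and transform the
	 * loop unpredictably. */

	/* Fixed shape, as in the backported patch: scale the index so it
	 * stays within the RADIX/4 entries. */
	for (l = 0; l < RADIX; l++)
		sum += poff[l >> 2];

	printf("sum = %d\n", sum);
	return 0;
}

The patch likewise replaces loop guards such as 'while(tm1 < isrt2)', which
compare pointers into unrelated objects (also undefined behaviour in C), with
explicit index comparisons like 'while((int)(tmp-cy_r) < RADIX)'.
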
Feel free to ask for more details.

Cheers,
Alex

unblock mlucas/14.1-2

-- System Information:
Debian Release: 9.0
  APT prefers testing
  APT policy: (500, 'testing')
Architecture: amd64 (x86_64)

Kernel: Linux 4.9.0-2-amd64 (SMP w/2 CPU cores)
Locale: LANG=zh_TW.UTF-8, LC_CTYPE=zh_TW.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Attachment: signature.asc
Description: PGP signature


--- End Message ---
--- Begin Message ---
Alex Vong:
> Package: release.debian.org
> Severity: normal
> User: release.debian.org@packages.debian.org
> Usertags: unblock
> 
> Hello release team,
> 
> Please unblock package mlucas.
> 
> This upload should fix the RC bug
> <http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=860662> by splitting
> the big test into smaller ones.
> 
> The full diff is attached below:
> 
> 
> 
> 
> Feel free to ask for more details.
> 
> Cheers,
> Alex
> 
> [...]

Unblocked, thanks.

~Niels

--- End Message ---
