[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: Help requested for htslib build failure on ppc64el





În lun., 31 dec. 2018 la 06:42, Gustavo Romero <gromero@linux.vnet.ibm.com> a scris:
Hi Michael, Steffen,

Hey Gustavo, 

 
On 12/30/2018 05:35 PM, Steffen Möller wrote:
> I admit this is a bit over my head. It may be worthwhile to peek a boo at the latest gcc snapshots https://packages.debian.org/de/sid/ppc64/gcc-snapshot/download that may have a fix for that issue already.

I don't think it's a compiler issue... I can't explain it yet but it's probably
a float / double truncation issue exposed by some gcc-8 optimization. But trying
the snapshots can add more information to explain what's going on...

> As a workaround to quickly proceed with what you are really after, and following the investigation by Gustavo, you may decide to have -O3 changed to -O2 if the arch is ppc64 and the compiler version is that faulty one.

Sorry, I meant -O0 instead of -O2, i.e. disabling any optimization, so it can't
be used as a workaround since it will destroy performance, specially for RISC.

That said, I dug a bit further and it looks like that sam_parse1() is actually
generating a different value for the floats in question, so when they are
loaded back for the comparison they are already screwed up, e.g.:

-- O3 --
0xc0490fcf
-3.14158988

-- O0 --
0xc0490fd0
-3.14159012 (expected)

because strtod() is used in float_to_le() and so also in u32_to_le(), which are
inlined and float_to_le() takes a float and not a double as the first argument
the use strtof() instead of strtod() in there looks the best, avoiding a
truncation from double to float, which might cause precision issues, specially
between different archs (like casts) plus optimization. So the following
change fixed the issue for me.

Michael, could you confirm that the following change fixes the issue at your
side  please?

diff --git a/sam.c b/sam.c
index aa94776..23233a0 100644
--- a/sam.c
+++ b/sam.c
@@ -1408,7 +1408,7 @@ int sam_parse1(kstring_t *s, bam_hdr_t *h, bam1_t *b)
              else if (type == 'S') while (q + 1 < p) { u16_to_le(strtoul(q + 1, &q, 0), (uint8_t *) str.s + str.l); str.l += 2; _skip_to_comma(q, p); }
              else if (type == 'i') while (q + 1 < p) { i32_to_le(strtol(q + 1, &q, 0), (uint8_t *) str.s + str.l); str.l += 4; _skip_to_comma(q, p); }
              else if (type == 'I') while (q + 1 < p) { u32_to_le(strtoul(q + 1, &q, 0), (uint8_t *) str.s + str.l); str.l += 4; _skip_to_comma(q, p); }
-            else if (type == 'f') while (q + 1 < p) { float_to_le(strtod(q + 1, &q), (uint8_t *) str.s + str.l); str.l += 4; _skip_to_comma(q, p); }
+            else if (type == 'f') while (q + 1 < p) { float_to_le(strtof(q + 1, &q), (uint8_t *) str.s + str.l); str.l += 4; _skip_to_comma(q, p); }
              else _parse_err_param(1, "unrecognized type B:%c", type);

  #undef _skip_to_comma

Applying this patch and compiling under a qemu-based sid-ppc64el builder gets us to only a single test failure (yay!)

===
test_vcf_various:                                                                                                                                      
        /build/htslib-1.9/htsfile -c /build/htslib-1.9/test/formatcols.vcf                                                                             
                                                                                                                                                       
        The outputs differ:                                                                                                                            
                /build/htslib-1.9/test/formatcols.vcf                                                                                                  
                /build/htslib-1.9/test/formatcols.vcf.new                                                                                              
.. failed ...   
=== 

In the meantime, to unclog a chain of packages that have been held back from migrating to testing, I've uploaded a version of the package that uses -O0 for ppc64el only; which I agree is not ideal.

If you think this is a qemu-only test failure then I can upload a build to experimental so that real hardware is used (I don't have porter box access)

 

Cheers,
Gustavo

> Cheers,
>
> Steffen
>
> On 27.12.18 15:41, Michael Crusoe wrote:
>> https://buildd.debian.org/status/fetch.php?pkg=htslib&arch=ppc64el&ver=1.9-7&stamp=1545236716&raw=0
>>
>> Can I get some assistance here? Rebuilding using Qemu and the earlier source packages produces the same error, so maybe this is a regression in the compiler?
>>
>> --
>> Michael R. Crusoe
>> Co-founder & Lead, Common Workflow Language project <http://www.commonwl.org/>
>> Direktorius, VšĮ "Darbo eigos", Vilnius, Lithuania
>> https://orcid.org/0000-0002-2961-9670 <https://impactstory.org/u/0000-0002-2961-9670>
>> mrc@commonwl.org <mailto:mrc@commonwl.org>
>> +1 480 627 9108 / +370 653 11125
>



--
Michael R. Crusoe
Co-founder & Lead, Common Workflow Language project
Direktorius, VšĮ "Darbo eigos", Vilnius, Lithuania

Reply to: