Re: Trying to crack the Firefox crashing issue
On May 9, 2025 2:02:49 PM GMT+07:00, Damien Stewart <hypexed@yahoo.com.au> wrote:
>Hi guys.
>
>So this is really a follow-up to the "What are the current available browser options for debian-ppc64?" thread, where there was a technical discussion on why Firefox was crashing which ended up being rather anticlimactic. But I wanted to check myself since I'm aware that for the last few years all Firefox does is crash on load. When PPC was last officially supported, on Jessie, Firefox was already becoming unstable. It loaded and worked but would easily crash and exit. Now it's much worse.
>
>So unlike most of the PPC people out there I don't have a quad G5 power horse. I do have a rather rare X1000 with a PASemi PA6T. Only dual but 64 bit and does the job. I soon found out running Firefox under GDB needs over 6GB RAM and I only had 4GB with HDD swap space. I rarely need swap on PPC, unlike my laptop. But I had some spare backup RAM and decided to max it out. After wrestling with DDR2 RAM slots I managed to get it working. A 64 bit PowerPC machine with 8GB RAM and Debian 64 installed on SSD. Okay I've broken the 32 bit barrier and now I'm talking. :-D
>
>My results to summarise it are that it is the same crash. Different day, same code. That streqci() function again. This time in firefox_138.0.1. But here's some info I picked up that may help to close in on it. With a running commentary. :-)
>
>damien@ubuntu:~$ gdb firefox.real
>GNU gdb (Debian 16.3-1) 16.3
>
>...
>Reading symbols from firefox.real...
>Reading symbols from /usr/lib/debug/.build-id/fd/6adabdb8b6655f970f65deffcea09f8d7dac41.debug...
>
>(gdb) run
>Starting program: /usr/bin/firefox.real
>[Thread debugging using libthread_db enabled]
>Using host libthread_db library "/lib/powerpc64-linux-gnu/libthread_db.so.1".
>
>... A minute or two filling up 6GB of RAM...
>
>Thread 1 "firefox.real" received signal SIGSEGV, Segmentation fault.
>w2c_rlbox_streqci (var_p0=var_p0@entry=262000, var_p1=2016478208,
> instance=<optimized out>) at rlbox.wasm.c:55615
>warning: 55615 rlbox.wasm.c: No such file or directory
>
>As you can see different day, same code. Same function but without that i32_load8_u. I don't like the look of that instance. Why is instance optimized out? The frame is omitted.
>
>Back trace...
>
>(gdb) bt
>#0 w2c_rlbox_streqci (var_p0=var_p0@entry=262000, var_p1=2016478208,
> instance=<optimized out>) at rlbox.wasm.c:55615
>#1 0x00003fffe8e1e268 in w2c_rlbox_getEncodingIndex (
> instance=<optimized out>, var_p0=<optimized out>) at rlbox.wasm.c:55548
>#2 w2c_rlbox_getEncodingIndex (instance=0x3fffda90f000, var_p0=262000)
> at rlbox.wasm.c:55531
>#3 w2c_rlbox_MOZ_XmlInitEncodingNS_0 (instance=0x3fffda90f000, var_p0=325428,
> var_p1=325424, var_p2=262000) at rlbox.wasm.c:57164
>#4 0x00003fffe8e4ce1c in w2c_rlbox_initializeEncoding (
> instance=instance@entry=0x3fffda90f000, var_p0=var_p0@entry=325280)
> at rlbox.wasm.c:37816
>
>The hit...
>
>(gdb) disas
>Dump of assembler code for function w2c_rlbox_streqci:
> 0x00003fffe8e1e150 <+0>: ld r3,0(r3)
> 0x00003fffe8e1e154 <+4>: subf r4,r5,r4
> 0x00003fffe8e1e158 <+8>: nop
> 0x00003fffe8e1e15c <+12>: nop
>=> 0x00003fffe8e1e160 <+16>: lbzx r9,r3,r5
> 0x00003fffe8e1e164 <+20>: add r10,r4,r5
> 0x00003fffe8e1e168 <+24>: clrlwi r9,r9,24
> 0x00003fffe8e1e16c <+28>: clrldi r10,r10,32
> 0x00003fffe8e1e170 <+32>: lbzx r10,r3,r10
>
>Why is there nop? Does it mean ori? PPC doesn't have nop. Why doesn't gdb list the machine code as standard? Supposed to be a debugger. This code looks sus.
>
>Registers...
>(gdb) info r
>r0 0x3fffe8e4ce1c 70368356519452
>r1 0x3fffffffbe90 70368744160912
>r2 0x3ffff413c500 70368544146688
>r3 0x3ffb00000000 70347269341184
>r4 0xffffffff87d2fb70 18446744071693335408
>r5 0x78310400 2016478208
>
>Ok so it doesn't like r9 = [r3 + r5]. What's wrong with 3FFB78310400? Apart from r5 being a large 32 bit integer.
>
>I had apt sourced the source but gdb couldn't see it, so I needed to do some digging...
>
>damien@ubuntu:~/Applications/firefox-debug/firefox-138.0.1$ grep -ir "streqci" .
>./parser/expat/expat/lib/xmltok.c:streqci(const char *s1, const char *s2) {
>./parser/expat/expat/lib/xmltok.c: /* The following line will never get executed. streqci() is
>./parser/expat/expat/lib/xmltok.c: if (streqci(name, encodingNames[i]))
>./parser/expat/expat/lib/xmltok_ns.c: if (streqci(buf, KW_UTF_16) && enc->minBytesPerChar == 2)
>
>The source:
>static int FASTCALL
>streqci(const char *s1, const char *s2) {
> for (;;) {
> char c1 = *s1++;
> char c2 = *s2++;
> if (ASCII_a <= c1 && c1 <= ASCII_z)
> c1 += ASCII_A - ASCII_a;
> if (ASCII_a <= c2 && c2 <= ASCII_z)
> /* The following line will never get executed. streqci() is
> * only called from two places, both of which guarantee to put
> * upper-case strings into s2.
> */
> c2 += ASCII_A - ASCII_a; /* LCOV_EXCL_LINE */
> if (c1 != c2)
> return 0;
> if (! c1)
> break;
> }
> return 1;
>}
>
>This code appears to be poor quality. It doesn't validate the input strings nor check for null bytes. Not to mention that icky for ever. That if test is in a strange order making some sort of coding palindrome. Funny. :-)
>
>Perhaps for an internal API, not checking input when input must be given is acceptable, but these are the reasons C lib str*() functions are criticised nowadays. This streqci() is uncommon in my search and particular to XML parsing. Ok, so what is going wrong with it? Given it's embedded into this rlbox.wasm.c, where is that generated from? The build itself? I don't see that exact file in the source.
>
>From what I can tell it wants to use r5 as an index with lbzx but instead does something funky with r5 instead of zeroing it. Before doing nothing twice. It would have been better off using lbzu! So what kind of contraption caused the C compiler to generate asm code like that? The C code looks straightforward enough for a C compiler to understand but the binary code is corrupted. Is this a result of the build process wrecking it? Or does PPC GCC have some rare bug causing a side effect of broken code? I know this is old news by now but I just don't know how it ended up generating broken code that is only broken on PPC. :-?
>
>
I don't think you are quite on the right track.
w2c_rlbox_streqci takes three arguments, not two like the source you quote, which makes it really hard to match the two up.
The nops are almost certainly there to align the loop start with the icache line size (making it start at +16).
The strange subf on r4 and r5 is probably there so the loop needs only a single increment variable, but I expect it will be really hard to understand without finding the exact matching code and what that third argument is about.
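To make that concrete, here is a rough C sketch of what the disassembly appears to be doing. It assumes rlbox.wasm.c is wasm2c-style output (the w2c_ prefix suggests so), in which C pointers become 32-bit offsets into the sandbox's linear memory and every function receives an instance pointer. The struct layout and the names sandbox_instance and sandboxed_streqci are hypothetical, made up for illustration; the real definitions are not shown in this thread. The register comments follow the PPC64 convention of passing the first arguments in r3, r4 and r5.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the sandbox instance. The real layout in
 * rlbox.wasm.c is not shown in this thread; this is an assumption. */
typedef struct {
    uint8_t *memory;  /* base address of the sandbox's linear memory */
} sandbox_instance;

/* Sketch of a sandboxed streqci: the two strings arrive as 32-bit
 * offsets (p0, p1) into linear memory rather than as char pointers. */
int sandboxed_streqci(sandbox_instance *instance, uint32_t p0, uint32_t p1)
{
    const uint8_t *mem = instance->memory;  /* ld   r3,0(r3)   */
    uint32_t diff = p0 - p1;                /* subf r4,r5,r4   */

    for (;;) {
        uint8_t c2 = mem[p1];                     /* lbzx r9,r3,r5 (the faulting load) */
        uint8_t c1 = mem[(uint32_t)(diff + p1)];  /* add, clrldi, lbzx r10,r3,r10      */

        if (c1 >= 'a' && c1 <= 'z')
            c1 -= 'a' - 'A';
        if (c2 >= 'a' && c2 <= 'z')
            c2 -= 'a' - 'A';
        if (c1 != c2)
            return 0;
        if (!c1)
            break;
        p1++;  /* the single induction variable; p0 is recovered as diff + p1 */
    }
    return 1;
}
```

Read this way, the subf is just precomputing p0 - p1 so that only p1 needs incrementing, and the load that faults is the one through p1, i.e. the second string.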
Beyond that, the bug could be very, very far away from the crash location, so I am not sure you are anywhere close to the issue (though I don't have a better suggestion).
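One thing that might be worth checking, purely as a guess: the bad offset in r5 looks like a small value with its bytes reversed. The sketch below uses only numbers from the register dump and backtrace; byteswap32 and effective_address are illustrative helper names, not anything from Firefox or RLBox.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit value. */
uint32_t byteswap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Effective address of an indexed load such as lbzx: base plus index. */
uint64_t effective_address(uint64_t base, uint64_t index)
{
    return base + index;
}

/* With the values from the crash:
 *   effective_address(0x3ffb00000000, 0x78310400) is 0x3ffb78310400
 *   byteswap32(0x78310400) is 0x00043178, which is 274808 in decimal
 */
```

274808 sits in the same range as the other offsets in the backtrace (262000, 325280, 325428), whereas 2016478208 is nowhere near them. That may be a coincidence, but it would fit a 32-bit offset being stored in one byte order and read back in the other somewhere before streqci is called, rather than a problem in the streqci code itself.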