Trying to crack the Firefox crashing issue
Hi guys.
So this is really a follow-up to the "What are the current available
browser options for debian-ppc64?" thread, where there was a technical
discussion on why Firefox was crashing which ended up being rather
anticlimactic. But I wanted to check for myself, since I'm aware that
for the last few years all Firefox does is crash on load. Back when PPC
was last officially supported, on Jessie, Firefox was already becoming
unstable. It loaded and worked but would easily crash and exit. Now
it's much worse.
So unlike most of the PPC people out there I don't have a quad G5
powerhouse. I do have a rather rare X1000 with a PASemi PA6T. Only dual
core, but 64 bit, and it does the job. I soon found out that running
Firefox under GDB needs over 6GB RAM and I only had 4GB with HDD swap
space. I rarely need swap on PPC, unlike my laptop. But I had some spare
backup RAM and decided to max it out. After wrestling with DDR2 RAM
slots I managed to get it working. A 64 bit PowerPC machine with 8GB RAM
and 64 bit Debian installed on SSD. Okay, I've broken the 32 bit barrier
and now I'm talking. :-D
To summarise my results: it is the same crash. Different day, same
code. That streqci() function again, this time in firefox_138.0.1. But
here's some info I picked up that may help to close in on it, with a
running commentary. :-)
damien@ubuntu:~$ gdb firefox.real
GNU gdb (Debian 16.3-1) 16.3
...
Reading symbols from firefox.real...
Reading symbols from
/usr/lib/debug/.build-id/fd/6adabdb8b6655f970f65deffcea09f8d7dac41.debug...
(gdb) run
Starting program: /usr/bin/firefox.real
[Thread debugging using libthread_db enabled]
Using host libthread_db library
"/lib/powerpc64-linux-gnu/libthread_db.so.1".
... A minute or two filling up 6GB of RAM...
Thread 1 "firefox.real" received signal SIGSEGV, Segmentation fault.
w2c_rlbox_streqci (var_p0=var_p0@entry=262000, var_p1=2016478208,
instance=<optimized out>) at rlbox.wasm.c:55615
warning: 55615 rlbox.wasm.c: No such file or directory
As you can see different day, same code. Same function but without that
i32_load8_u. I don't like the look of that instance. Why is instance
optimized out? The frame is omitted.
Back trace...
(gdb) bt
#0 w2c_rlbox_streqci (var_p0=var_p0@entry=262000, var_p1=2016478208,
   instance=<optimized out>) at rlbox.wasm.c:55615
#1 0x00003fffe8e1e268 in w2c_rlbox_getEncodingIndex (
   instance=<optimized out>, var_p0=<optimized out>) at rlbox.wasm.c:55548
#2 w2c_rlbox_getEncodingIndex (instance=0x3fffda90f000, var_p0=262000)
   at rlbox.wasm.c:55531
#3 w2c_rlbox_MOZ_XmlInitEncodingNS_0 (instance=0x3fffda90f000,
   var_p0=325428, var_p1=325424, var_p2=262000) at rlbox.wasm.c:57164
#4 0x00003fffe8e4ce1c in w2c_rlbox_initializeEncoding (
   instance=instance@entry=0x3fffda90f000, var_p0=var_p0@entry=325280)
   at rlbox.wasm.c:37816
The hit...
(gdb) disas
Dump of assembler code for function w2c_rlbox_streqci:
0x00003fffe8e1e150 <+0>: ld r3,0(r3)
0x00003fffe8e1e154 <+4>: subf r4,r5,r4
0x00003fffe8e1e158 <+8>: nop
0x00003fffe8e1e15c <+12>: nop
=> 0x00003fffe8e1e160 <+16>: lbzx r9,r3,r5
0x00003fffe8e1e164 <+20>: add r10,r4,r5
0x00003fffe8e1e168 <+24>: clrlwi r9,r9,24
0x00003fffe8e1e16c <+28>: clrldi r10,r10,32
0x00003fffe8e1e170 <+32>: lbzx r10,r3,r10
Why is there a nop? Does it mean ori? PPC doesn't have a dedicated nop
instruction; as far as I know nop is just shorthand for ori 0,0,0. Why
doesn't gdb list the machine code as standard? It's supposed to be a
debugger. This code looks sus.
Registers...
(gdb) info r
r0 0x3fffe8e4ce1c 70368356519452
r1 0x3fffffffbe90 70368744160912
r2 0x3ffff413c500 70368544146688
r3 0x3ffb00000000 70347269341184
r4 0xffffffff87d2fb70 18446744071693335408
r5 0x78310400 2016478208
Ok so it doesn't like r9 = [r3 + r5]. What's wrong with 3FFB78310400?
Apart from r5 being a large 32 bit integer.
I had apt sourced the source but gdb couldn't see it, so I needed to do
some digging...
damien@ubuntu:~/Applications/firefox-debug/firefox-138.0.1$ grep -ir "streqci" .
./parser/expat/expat/lib/xmltok.c:streqci(const char *s1, const char *s2) {
./parser/expat/expat/lib/xmltok.c: /* The following line will never get executed. streqci() is
./parser/expat/expat/lib/xmltok.c: if (streqci(name, encodingNames[i]))
./parser/expat/expat/lib/xmltok_ns.c: if (streqci(buf, KW_UTF_16) && enc->minBytesPerChar == 2)
The source:
static int FASTCALL
streqci(const char *s1, const char *s2) {
  for (;;) {
    char c1 = *s1++;
    char c2 = *s2++;
    if (ASCII_a <= c1 && c1 <= ASCII_z)
      c1 += ASCII_A - ASCII_a;
    if (ASCII_a <= c2 && c2 <= ASCII_z)
      /* The following line will never get executed. streqci() is
       * only called from two places, both of which guarantee to put
       * upper-case strings into s2.
       */
      c2 += ASCII_A - ASCII_a; /* LCOV_EXCL_LINE */
    if (c1 != c2)
      return 0;
    if (! c1)
      break;
  }
  return 1;
}
This code appears to be poor quality. It doesn't validate the input
strings or check for null pointers, though it does stop at the
terminating null byte. Not to mention that icky for (;;) loop. The if
tests are in a strange order, making some sort of coding palindrome.
Funny. :-)
Perhaps for an internal API, not checking input when input must be given
is acceptable, but these are the reasons the C library str*() functions
are criticised nowadays. This streqci() is uncommon in my search and
seems particular to XML parsing. Ok, so what is going wrong with it?
Given it's embedded into this rlbox.wasm.c, where is that generated
from? The build itself? I don't see that exact file in the source.
From what I can tell it wants to use r5 as an index with lbzx, but it
does something funky with r5 first instead of zeroing it, and then does
nothing twice. It would have been better off using lbzu! So what kind of
contraption caused the C compiler to generate asm code like that? The C
code looks straightforward enough for a C compiler to understand, but
the binary code looks corrupted. Is this a result of the build process
wrecking it? Or does PPC GCC have some rare bug that produces broken
code as a side effect? I know this is old news by now, but I just don't
know how it ended up generating code that is only broken on PPC. :-?