
Trying to crack the Firefox crashing issue



Hi guys.

So this is really a follow-up to the "What are the current available browser options for debian-ppc64?" thread, where there was a technical discussion on why Firefox was crashing which ended up being rather anticlimactic. But I wanted to check for myself, since I'm aware that for the last few years all Firefox does is crash on load. Back when PPC was last officially supported, on Jessie, Firefox was already becoming unstable. It loaded and worked but would easily crash and exit. Now it's much worse.

So unlike most of the PPC people out there I don't have a quad G5 powerhouse. I do have a rather rare X1000 with a PASemi PA6T. Only dual core, but 64-bit, and it does the job. I soon found out that running Firefox under GDB needs over 6GB RAM and I only had 4GB plus HDD swap space. I rarely need swap on PPC, unlike on my laptop. But I had some spare backup RAM and decided to max it out. After wrestling with the DDR2 RAM slots I managed to get it working. A 64-bit PowerPC machine with 8GB RAM and Debian 64 installed on SSD. Okay, I've broken the 32-bit barrier and now I'm talking. :-D

To summarise my results: it is the same crash. Different day, same code. That streqci() function again. This time in firefox_138.0.1. But here's some info I picked up that may help to close in on it. With a running commentary. :-)

damien@ubuntu:~$ gdb firefox.real
GNU gdb (Debian 16.3-1) 16.3

...
Reading symbols from firefox.real...
Reading symbols from /usr/lib/debug/.build-id/fd/6adabdb8b6655f970f65deffcea09f8d7dac41.debug...

(gdb) run
Starting program: /usr/bin/firefox.real
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/powerpc64-linux-gnu/libthread_db.so.1".

... A minute or two filling up 6GB of RAM...

Thread 1 "firefox.real" received signal SIGSEGV, Segmentation fault.
w2c_rlbox_streqci (var_p0=var_p0@entry=262000, var_p1=2016478208,
    instance=<optimized out>) at rlbox.wasm.c:55615
warning: 55615    rlbox.wasm.c: No such file or directory

As you can see: different day, same code. Same function, but without that i32_load8_u. I don't like the look of that instance. Why is instance optimized out? The frame is omitted.

Back trace...

(gdb) bt
#0  w2c_rlbox_streqci (var_p0=var_p0@entry=262000, var_p1=2016478208,
    instance=<optimized out>) at rlbox.wasm.c:55615
#1  0x00003fffe8e1e268 in w2c_rlbox_getEncodingIndex (
    instance=<optimized out>, var_p0=<optimized out>) at rlbox.wasm.c:55548
#2  w2c_rlbox_getEncodingIndex (instance=0x3fffda90f000, var_p0=262000)
    at rlbox.wasm.c:55531
#3  w2c_rlbox_MOZ_XmlInitEncodingNS_0 (instance=0x3fffda90f000, var_p0=325428,
    var_p1=325424, var_p2=262000) at rlbox.wasm.c:57164
#4  0x00003fffe8e4ce1c in w2c_rlbox_initializeEncoding (
    instance=instance@entry=0x3fffda90f000, var_p0=var_p0@entry=325280)
    at rlbox.wasm.c:37816

The hit...

(gdb) disas
Dump of assembler code for function w2c_rlbox_streqci:
   0x00003fffe8e1e150 <+0>:    ld      r3,0(r3)
   0x00003fffe8e1e154 <+4>:    subf    r4,r5,r4
   0x00003fffe8e1e158 <+8>:    nop
   0x00003fffe8e1e15c <+12>:    nop
=> 0x00003fffe8e1e160 <+16>:    lbzx    r9,r3,r5
   0x00003fffe8e1e164 <+20>:    add     r10,r4,r5
   0x00003fffe8e1e168 <+24>:    clrlwi  r9,r9,24
   0x00003fffe8e1e16c <+28>:    clrldi  r10,r10,32
   0x00003fffe8e1e170 <+32>:    lbzx    r10,r3,r10

Why is there a nop? PPC has no dedicated nop instruction, so presumably it means ori 0,0,0. Why doesn't gdb list the machine code as standard? It's supposed to be a debugger. This code looks sus.
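On the nop question, here is a small sketch of how the instruction word for ori is put together, which shows why ori 0,0,0 comes out as 0x60000000, the word that disassemblers print as nop. The function names are made up for illustration; only the D-form field layout (primary opcode 24, then rS, rA and a 16-bit immediate) is taken from the PowerPC instruction set.

```c
#include <assert.h>
#include <stdint.h>

/* Encode a PowerPC D-form "ori rA,rS,UI" instruction word.
 * Layout, most significant bits first:
 *   6 bits primary opcode (24 for ori), 5 bits rS, 5 bits rA, 16 bits UI.
 * encode_ori is a hypothetical helper name, not part of any toolchain. */
uint32_t encode_ori(uint32_t ra, uint32_t rs, uint32_t ui)
{
    return (24u << 26) | ((rs & 31u) << 21) | ((ra & 31u) << 16) | (ui & 0xFFFFu);
}

/* Self-check: "ori 0,0,0" is the word shown as "nop". */
void check_nop_encoding(void)
{
    assert(encode_ori(0, 0, 0) == 0x60000000u);
}
```

To see the raw words next to the mnemonics in gdb itself, the disassemble command takes a /r modifier (disas /r), so the two nop lines can be checked directly against that value.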

Registers...
(gdb) info r
r0             0x3fffe8e4ce1c      70368356519452
r1             0x3fffffffbe90      70368744160912
r2             0x3ffff413c500      70368544146688
r3             0x3ffb00000000      70347269341184
r4             0xffffffff87d2fb70  18446744071693335408
r5             0x78310400          2016478208

Ok so it doesn't like r9 = [r3 + r5]. What's wrong with 3FFB78310400? Apart from r5 being a large 32 bit integer.
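As a sanity check on the register dump, the arithmetic can be replayed in plain C. This is only a sketch using the numbers printed above; treating r3 (after the ld) as the base of the wasm linear memory is an assumption on my part, and the macro and function names are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the backtrace and "info r" output above. */
#define MEM_BASE 0x3ffb00000000ULL /* r3 after "ld r3,0(r3)"; ASSUMED to be the memory base */
#define VAR_P0   262000u           /* first argument of w2c_rlbox_streqci */
#define VAR_P1   2016478208u       /* second argument, still sitting in r5 */

/* subf r4,r5,r4  ->  r4 = r4 - r5, i.e. p0 - p1 with 64-bit wraparound */
uint64_t delta_p0_p1(void)
{
    return (uint64_t)VAR_P0 - (uint64_t)VAR_P1;
}

/* lbzx r9,r3,r5  ->  byte load from r3 + r5 (the faulting access) */
uint64_t fault_address(void)
{
    return MEM_BASE + (uint64_t)VAR_P1;
}

/* add r10,r4,r5 ; clrldi r10,r10,32 ; lbzx r10,r3,r10
 * -> (p0 - p1) + p1, truncated to 32 bits, then added to the base */
uint64_t other_address(void)
{
    uint64_t off = (delta_p0_p1() + (uint64_t)VAR_P1) & 0xFFFFFFFFULL;
    return MEM_BASE + off;
}

void check_against_register_dump(void)
{
    assert(delta_p0_p1() == 0xffffffff87d2fb70ULL); /* matches r4 in the dump */
    assert(fault_address() == 0x3ffb78310400ULL);   /* the address asked about above */
    assert(other_address() == 0x3ffb0003ff70ULL);   /* base + 262000 */
}
```

Read this way, r4 holds p0 - p1 so that adding r5 back gives the other string's offset, and the faulting load is simply base + var_p1. That would put the suspicion on the value 2016478208 arriving as the second argument rather than on the instructions themselves, but that is a reading of the numbers, not a confirmed diagnosis.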

I had apt sourced the source but gdb couldn't see it, so I needed to do some digging...

damien@ubuntu:~/Applications/firefox-debug/firefox-138.0.1$ grep -ir "streqci" .
./parser/expat/expat/lib/xmltok.c:streqci(const char *s1, const char *s2) {
./parser/expat/expat/lib/xmltok.c:      /* The following line will never get executed.  streqci() is
./parser/expat/expat/lib/xmltok.c:    if (streqci(name, encodingNames[i]))
./parser/expat/expat/lib/xmltok_ns.c:  if (streqci(buf, KW_UTF_16) && enc->minBytesPerChar == 2)

The source:
static int FASTCALL
streqci(const char *s1, const char *s2) {
  for (;;) {
    char c1 = *s1++;
    char c2 = *s2++;
    if (ASCII_a <= c1 && c1 <= ASCII_z)
      c1 += ASCII_A - ASCII_a;
    if (ASCII_a <= c2 && c2 <= ASCII_z)
      /* The following line will never get executed.  streqci() is
       * only called from two places, both of which guarantee to put
       * upper-case strings into s2.
       */
      c2 += ASCII_A - ASCII_a; /* LCOV_EXCL_LINE */
    if (c1 != c2)
      return 0;
    if (! c1)
      break;
  }
  return 1;
}

This code appears to be poor quality. It doesn't validate the input pointers, and the only thing that stops it is the NUL test on c1 at the bottom of the loop. Not to mention that icky for ever. That if test is in a strange order, making some sort of coding palindrome. Funny. :-)
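To poke at the loop outside of Firefox, here is a standalone copy that compiles on its own. It is a sketch, not the expat build: FASTCALL is dropped, the ASCII_* macros are defined locally as stand-ins for the ones expat gets from its own headers, and the name streqci_demo is made up so it can't be confused with the real function.

```c
#include <assert.h>

/* Local stand-ins for expat's ASCII_* macros (assumed plain ASCII values). */
#define ASCII_A 0x41
#define ASCII_a 0x61
#define ASCII_z 0x7A

/* Case-insensitive string equality, same loop structure as streqci() above.
 * Returns 1 if the strings match ignoring ASCII case, 0 otherwise. */
int streqci_demo(const char *s1, const char *s2)
{
    for (;;) {
        char c1 = *s1++;
        char c2 = *s2++;
        if (ASCII_a <= c1 && c1 <= ASCII_z)
            c1 += ASCII_A - ASCII_a; /* fold s1 to upper case */
        if (ASCII_a <= c2 && c2 <= ASCII_z)
            c2 += ASCII_A - ASCII_a; /* fold s2 to upper case */
        if (c1 != c2)
            return 0;                /* also taken when only one string has ended */
        if (!c1)
            break;                   /* both strings ended together */
    }
    return 1;
}

void check_streqci_demo(void)
{
    assert(streqci_demo("utf-8", "UTF-8") == 1);
    assert(streqci_demo("utf-8", "UTF-16") == 0);
    assert(streqci_demo("UTF", "UTF-8") == 0);
    assert(streqci_demo("", "") == 1);
}
```

With valid NUL-terminated strings the loop can't run off the end: if one string finishes first, its NUL mismatches the other string's character and the function returns 0 straight away. So the logic only misbehaves when it is handed a bad pointer to begin with.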

Perhaps for an internal API, not checking input when input must be given is acceptable, but these are the reasons the C library str*() functions are criticised nowadays. This streqci() is uncommon in my search and particular to XML parsing. OK, so what is going wrong with it? Given it's embedded in this rlbox.wasm.c, where is that generated from? The build itself? I don't see that exact file in the source.

From what I can tell it wants to use r5 as an index with lbzx, but first it subtracts r5 from r4 instead of zeroing it, and then does nothing twice. It would have been better off using lbzu! So what kind of contraption caused the C compiler to generate asm code like that? The C code looks straightforward enough for a C compiler to understand, but the binary code looks corrupted. Is this a result of the build process wrecking it? Or does PPC GCC have some rare bug with broken code as a side effect? I know this is old news by now, but I just don't know how it ended up generating code that is only broken on PPC. :-?


