[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Bug#621420: Misbuilds busybox on ARM



Package: gcc-4.5
Version: 4.5.2-8
Severity: normal

        Hi

 See Debian #621137; gcc-4.5 4.5.2-8 misbuilds busybox: r5 is clobbered
 in tryexec() as gcc believes it doesn't return, but it does.  This
 doesn't happen with Linaro GCC 4.5 or GCC 4.6, but it does happen with
 plain FSF 4.5.2.

 I've asked on the Linaro Toolchain ML which patch fixes this:
http://lists.linaro.org/pipermail/linaro-toolchain/2011-April/001047.html

 But I didn't open an upstream ticket as there might be one already.

 See IRC log below for details.

00:52 < lool> michaelh1: if I remove references to applet_no from the function 
              and call sites (it's unused anyway) the issue goes away
00:52 < lool> michaelh1: it's flagged __unused_param__
00:53 < lool> michaelh1: So I have ash.o and am looking at a disassembled dump, 
              and I can't figure what to look for
00:53 < michaelh1> lool: do you have a preprocessed version of it?  What 
                   compiler and build options does the fault occur at?
00:53 < michaelh1> lool: I'm suspicious that you're going from a all-register 
                   version to a also-something-on-the-stack by adding the extra 
                   parameter.
00:54 < lool> michaelh1: Right now I'm doing this with Debian's gcc-4.5
00:54 < lool> no Linaro patchset
00:55 < lool> I was using it earlier, but the busybox versions I was using were 
              not xbuild friendly, or did not reproduce
00:55 < lool> Now that I know that this Debian patch triggers it, I might be 
              able to reproduce with Linaro toolchain
00:55 < michaelh1> lool: yip.  Can you do a -save-temps run and send me ash.i 
                   and ash.s?
00:56 < lool> michaelh1: BTW I can feel how it would have been useful to have 
              "fsf" toolchains in Ubuntu or Debian and Linaro toolchains in 
              Debian  :-)
00:56 < michaelh1> lool: yeah, I want to have a plain linaro PPA in the future. 
                   Add a plain fsf as well using the same infrastructure.  
                   Helps in tracking down what introduced a problem...
00:57 -!- NishanthMenon [~nmenon@nat/ti/x-iiyvawwlphubdqsc] has quit [Quit: 
          Ex-Chat]
00:57 < lool> michaelh1: I'll look at saving the temps; objdump -D isn't useful 
              I guess?
00:58 < lool> http://paste.ubuntu.com/590485/ has one
00:58 < lool> it's awfully big
00:58 < michaelh1> lool: it's easiest to compare outputs from different 
                   compilers by running over the .i file.
00:58  * michaelh1 looks
00:59 < michaelh1> lool: is this the broken one?
01:00 < lool> michaelh1: http://people.linaro.org/~lool/ash.s
01:00 < lool> michaelh1: That one is broken yes
01:01 < lool> michaelh1: http://people.linaro.org/~lool/ash.i
01:03 < michaelh1> lool: what options did you pass to GCC? -Os? -O2?
01:05 < lool> michaelh1: Hmm oddly it seems to be -Os
01:05 < lool> gcc -save-temps -Wp,-MD,shell/.ash.o.d   -std=gnu99 -Iinclude 
              -Ilibbb  -include include/autoconf.h -D_GNU_SOURCE -DNDEBUG 
              -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 
              -D"BB_VER=KBUILD_STR(1.18.4)" -D"BB_BT=AUTOCONF_TIMESTAMP"  -Wall 
              -Wshadow -Wwrite-strings -Wundef -Wstrict-prototypes -Wunused 
              -Wunused-parameter -Wunused-function -Wunused-value 
              -Wmissing-prototypes -Wmissing-declarations 
              -Wdeclaration-after-statement ...
01:05 < lool> ... -Wold-style-definition -fno-builtin-strlen -finline-limit=0 
              -fomit-frame-pointer -ffunction-sections -fdata-sections 
              -fno-guess-branch-probability -funsigned-char -static-libgcc 
              -falign-functions=1 -falign-jumps=1 -falign-labels=1 
              -falign-loops=1 -g -Os     -D"KBUILD_STR(s)=#s" 
              -D"KBUILD_BASENAME=KBUILD_STR(ash)"  
              -D"KBUILD_MODNAME=KBUILD_STR(ash)" -c -o shell/ash.o shell/ash.c
01:05 < lool> But it should have been -O0  :-(
01:05 < lool> michaelh1: I tried cross-building with Ubuntu's cross and 
              Debian's patch applied, but I can't reproduce with that toolchain
01:05 -!- robher [~robher@173.226.190.126] has quit [Ping timeout: 246 seconds]
01:05 < lool> it might also be other differences between my tree and the Debian 
              one
01:06 < michaelh1> lool: but you can reproduce the problem?
01:06 -!- marciom [~marciom@209.119.62.66] has joined #linaro
01:07 < lool> michaelh1: I can in a Debian gcc-4.5 context
01:07 < michaelh1> lool: and is this for armv4t or armv7-a
01:07 < lool> michaelh1: Debian is armv4t, when cross-building I passed 
              CC="arm-linux-gnueabi-gcc -marm -march=armv4t -mtune=arm9tdmi" 
              and verified that this was used
01:07 < lool> but the cross-build didn't expose the issue
01:08 < lool> I'm trying another cross-build with the exact Debian tree rather 
              than busybox git + Debian patch
01:08 < michaelh1> lool: OK.  so tryexec.clone.0 seems to be tryexec with the 
                   first argument removed.
01:09 -!- arun_ [~arun@unaffiliated/sindian] has joined #linaro
01:09 < lool> michaelh1: BTW "successes" don't mean anything above as the issue 
              is slightly random
01:10 < lool> michaelh1: probably memory initialization or stack state
01:11 < lool> So I tried the exact Debian tree, and couldn't reproduce with a 
              cross-build using Ubuntu's cross toolchain
01:11 -!- rcn-ee [~voodoo@thief-pool8-147.mncable.net] has quit [Remote host 
          closed the connection]
01:12 < lool> michaelh1: Ah I failed to mention that we had a similar issue in 
              Ubuntu at some point, and it went away by changing build options  
              :-/
01:12 < lool> michaelh1: 
              https://bugs.launchpad.net/ubuntu/+source/klibc/+bug/683683
01:12 < lool> actually maybe I need to pass -marm
01:13 < lool> ah no I did
01:13 -!- rcn-ee [~voodoo@thief-pool8-147.mncable.net] has joined #linaro
01:14 < michaelh1> lool: OK.  The assembly up until just after xmalloc is 
                   identical with Linaro GCC.  Let's see if the code following 
                   is correct...
01:19 -!- rcn-ee [~voodoo@thief-pool8-147.mncable.net] has quit [Ping timeout: 
          276 seconds]
01:19 -!- mpoirier [~quassel@S0106002369de4dac.cg.shawcable.net] has quit [Ping 
          timeout: 260 seconds]
01:23 < lool> michaelh1: BTW I tride running in valgrind, but got a fury of 
              warnings; also, valgrind isnt happy when the underlying program 
              calls exec(0
01:23 < lool> exec()
01:24 < michaelh1> lool: on first pass, the code looks OK.  A strange thing is 
                   that Debian GCC thinks the function never returns...
01:24 < lool> michaelh1: I think it's because the parent function doesn't return
01:24 < lool> michaelh1: and it's static
01:24 < lool> michaelh1: I wondered the same
01:25 < michaelh1> lool: Linaro GCC thinks that it does return...
01:25 -!- jadamcze [~jadamcze@ppp118-208-199-129.lns20.hba1.internode.on.net] 
          has quit [Read error: Operation timed out]
01:25 < lool> michaelh1: that is a good lead
01:26 < lool> michaelh1: gcc-4.5 os 4.5.2 in Debian
01:26 < michaelh1> lool: yip.  I see the same as your ash.s when compiling with 
                   plain 4.5.2
01:27 -!- jadamcze [~jadamcze@ppp118-208-206-25.lns20.hba1.internode.on.net] 
          has joined #linaro
01:27 < michaelh1> lool: so: Linaro GCC 4.5 says the function returns, FSF 4.6 
                   says it returns, FSF 4.5.2 says it doesn't
01:28 < lool> michaelh1: Ah, I wanted to try with gcc-4.6, but didn't have easy 
              access to it
01:28 < lool> It would have been a good test
01:29 < lool> michaelh1: Is this something you can spot easily in 
              Changelog.linaro?
01:29 -!- marciom [~marciom@209.119.62.66] has quit [Quit: This computer has 
          gone to sleep]
01:29 < lool> michaelh1: I'm some days behind on Debian gcc-4.5, I see Matthias 
              uploaded a bunch of things, trying this now
01:29 < michaelh1> lool: I remember there being problems with these .clone. 
                   functions in Qt.
01:30 -!- rcn-ee [~voodoo@thief-pool8-147.mncable.net] has joined #linaro
01:32 < michaelh1> lool: here's another hint: with applet_no removed, gcc 4.5.2 
                   now thinks that the function returns.
01:32 < lool> michaelh1: haha awesome
01:33 < lool> michaelh1: So probably a combination with __unused__ param
01:35 < lool> Hmm I can't put __attribute__ ((__noreturn__)) in my tentative 
              test case
01:35 < lool> michaelh1: I'm trying to convert this issue in a test case
01:35 < lool> but gcc doesn't like my //static void foo(void) __attribute__ 
              ((__noreturn__))
01:36 < lool> ah I need to declare it standalone
01:39 < lool> michaelh1: http://paste.ubuntu.com/590492/ doesn't give the same 
              detailed output in the .s
01:39 < michaelh1> lool: so tryexec is at least clobbering the callers r5
01:40 < lool> michaelh1: Hmm which gcc wouldn't see as a problem for a noreturn 
              function I guess
01:42 -!- rcn-ee [~voodoo@thief-pool8-147.mncable.net] has quit [Remote host 
          closed the connection]
01:45 < lool> michaelh1: Still an issue with latest Debian gcc-4.5
01:45 < michaelh1> lool: hmm.  applet_no seems to be uninitialised in 
                   shellexec()
01:46 < lool> Yup; the initial code was setting it from a function call
01:46 < lool> michaelh1: trying a build setting that to 0
01:47 < michaelh1> lool: removing the 'static' from tryexec turns it back into 
                   a normal call, drops the clone, and now 4.5.2 thinks the 
                   function returns...
01:48 < lool> michaelh1: BTW these are my first builds on pandas, and gosh is 
              that nice
01:49 < michaelh1> lool: 2.3 x faster on a GCC build...
01:49 < lool> plus I've setup ccache and am building the package with -j2, it's 
              quite impressive
01:50 < lool> michaelh1: I still see the issue after initializing applet_no to 0
01:51 -!- jadamcze [~jadamcze@ppp118-208-206-25.lns20.hba1.internode.on.net] 
          has quit [Ping timeout: 246 seconds]
01:51 < michaelh1> lool: here's a minimal version of tryexec: 
                   http://pastebin.ubuntu.com/590498/
01:52 < michaelh1> lool: delete the variable and GCC changes it's mind from 
                   'doesn't return' to 'does return'
01:53 -!- jadamcze [~jadamcze@ppp118-208-249-88.lns20.hba2.internode.on.net] 
          has joined #linaro
01:53 < lool> michaelh1: Now I feel a bit bad because it's not affecting 
              gcc-linaro, but it does affect FSF 4.5 and Debian
01:54 < lool> michaelh1: Do you think there is a chance that it's just hidden 
              upstream and could bite use again in the future/
01:54 < lool> ?
01:54 < michaelh1> lool: all bugs are worth looking into as toolchain bugs are 
                   rare...
01:54 < michaelh1> lool: I'm pretty sure we've seen and fixed a similar bug but 
                   I can't track it down.  Send an email to linaro-toolchain?
01:55 < lool> michaelh1: How do you trigger the output of "return" in the gcc 
              output?
01:55 < lool> arm-linux-gnueabi-gcc  -save-temps -std=gnu99 -D_GNU_SOURCE -o 
              bar.o -c bar.c doesn't show it with your tryexec snippet
01:55 < michaelh1> lool: oh, and GCC 4.5.2 gets it wrong at -O2 as well...
01:56 < michaelh1> lool: I don't understand.
01:56 < lool> michaelh1: I confirm that removing the static works around the 
              issue as well, busybox works again
01:56 < michaelh1> lool: I shrunk tryexec in ash.i down to that snippet and 
                   kept the rest of the file.
01:56 < lool> michaelh1: I am trying to get the same interesting "no return" 
              output
01:57 < lool> Oh you're building the whole .i, I was trying to come down to a 
              .c file + build opts
01:57 < lool> but I guess it's just simpler to keep using the .i file
01:57 < michaelh1> I'd be happy with a minimal version of the .i file 
                   containing, say, tryexec() and shellexec().  Both are 
                   already small...

  Thanks,
-- 
Loïc Minier



Reply to: