
Re: Proposing a debugging method for this bug



Jérôme Marant wrote:
On Monday 06 November 2006 14:51, Attilio Fiandrotti wrote:


How does it need to be fixed? Won't using different signals fix the issue?


Using different signals should fix this specific crash, but based on this BR [1] by Colin Watson, cdebconf's signal handling mechanism is not very robust and needs to be fixed. If you have some time, could you please boot in textual mode and send cdebconf the SIGUSR1 and SIGUSR2 signals, to see if it crashes?


It crashes on SIGUSR2. It is OK with SIGUSR1.

Looking at debconf.c from the cdebconf package:

main() {
        ...
        signal(SIGINT, sighandler);
        signal(SIGTERM, sighandler);
        signal(SIGUSR1, sighandler);
        ...
}

void sighandler(int sig)
{
        int status = 1;
        if (sig == SIGCHLD)
        {
                ...
        }
        save();
        /*
         * SIGUSR1 used to reconfigure the language. Now it
         * only saves the database.
         */
        if (sig == SIGUSR1)
                return;
        cleanup();
        exit(status);
}

As you can see, of the two user signals only SIGUSR1 is caught; SIGUSR2 is not, so I guess
cdebconf was terminated when you sent it SIGUSR2, termination being the default action for uncaught signals. Still, I don't understand why SIGUSR1 is caught when the handler does nothing but save the database (it's some kind of dummy handler).
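
The guess above can be checked in isolation. Below is a minimal sketch, assuming a POSIX system; child_fate() and on_usr1() are hypothetical names used for illustration only and are not cdebconf code. A child process catches SIGUSR1 and leaves SIGUSR2 at its default action, and the parent reports what happened to the child after sending it one signal.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>     /* only needed for the checks in the usage note */
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t got_usr1 = 0;

/* Hypothetical stand-in for a handler that returns without exiting. */
static void on_usr1(int sig)
{
    (void)sig;
    got_usr1 = 1;
}

/*
 * Fork a child that catches SIGUSR1 only, send it `sig`, and report its fate:
 * the number of the signal that killed it, 0 if it exited normally,
 * -1 on error. Only meant to be called with SIGUSR1 or SIGUSR2.
 */
int child_fate(int sig)
{
    int ready[2];
    int status;
    char byte;
    pid_t pid;

    if (pipe(ready) == -1)
        return -1;
    pid = fork();
    if (pid == -1) {
        close(ready[0]);
        close(ready[1]);
        return -1;
    }

    if (pid == 0) {
        sigset_t only_usr1, none;

        close(ready[0]);
        /* Keep SIGUSR1 blocked so it is delivered only inside sigsuspend(). */
        sigemptyset(&only_usr1);
        sigaddset(&only_usr1, SIGUSR1);
        sigprocmask(SIG_SETMASK, &only_usr1, NULL);
        sigemptyset(&none);

        signal(SIGUSR1, on_usr1);  /* caught */
        signal(SIGUSR2, SIG_DFL);  /* not caught: the default action applies */

        if (write(ready[1], "x", 1) != 1)  /* tell the parent we are ready */
            _exit(2);
        while (!got_usr1)
            sigsuspend(&none);
        _exit(0);
    }

    close(ready[1]);
    if (read(ready[0], &byte, 1) != 1) {
        /* The child never became ready: clean it up and give up. */
        close(ready[0]);
        kill(pid, SIGKILL);
        waitpid(pid, &status, 0);
        return -1;
    }
    close(ready[0]);

    if (kill(pid, sig) == -1 || waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

With this sketch, child_fate(SIGUSR1) should return 0 (the child survives the signal and exits on its own), while child_fate(SIGUSR2) should return SIGUSR2 (the child is terminated by the default action), which matches what was observed with the textual frontend.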

I made some experiments with gdb on i386 with pure DFB applications: DFB internally uses SIGUSR1 (inter-thread signaling) to switch from a DFB VT to another VT, and SIGUSR2 to switch back to the DFB VT.

The fact that cdebconf does not get killed when SIGUSR2 is issued internally by DFB makes me think cdebconf does not receive the signals DFB sends to itself (inter-thread signaling done in DFB).

When cdebconf is run with the GTKDFB frontend, both DFB and cdebconf compete to catch SIGUSR1, while SIGUSR2 is caught by DFB only.

Because DFB is started after cdebconf, its signal handler overwrites cdebconf's: placing a breakpoint on sighandler() and sending SIGUSR1 to cdebconf showed that the breakpoint was never reached, and DFB only complained about a SIGUSR1 received outside of a VT switching operation.
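
The overwriting itself is easy to reproduce outside of d-i. Below is a minimal sketch in plain ISO C; app_handler() and lib_handler() are hypothetical stand-ins for cdebconf's sighandler() and for DFB's handler, not the real code. Whoever calls signal() last for a given signal owns it, and the earlier handler is never invoked.

```c
#include <assert.h>     /* only needed for the check in the usage note */
#include <signal.h>

static volatile sig_atomic_t app_calls = 0;
static volatile sig_atomic_t lib_calls = 0;

/* Hypothetical stand-in for the application's handler (installed first). */
static void app_handler(int sig)
{
    (void)sig;
    app_calls++;
}

/* Hypothetical stand-in for the library's handler (installed later). */
static void lib_handler(int sig)
{
    (void)sig;
    lib_calls++;
}

/*
 * Install two handlers for SIGUSR1 one after the other, deliver the signal
 * once, and return 1 if only the handler installed last ran, 0 otherwise,
 * -1 on error.
 */
int later_handler_wins(void)
{
    /* signal() returns the handler that was in place before the call. */
    void (*original)(int) = signal(SIGUSR1, app_handler);
    void (*replaced)(int) = signal(SIGUSR1, lib_handler);

    if (original == SIG_ERR || replaced == SIG_ERR)
        return -1;

    app_calls = 0;
    lib_calls = 0;
    raise(SIGUSR1);             /* the handler runs before raise() returns */

    signal(SIGUSR1, original);  /* put back whatever was installed before */

    return replaced == app_handler && app_calls == 0 && lib_calls == 1;
}
```

In this sketch later_handler_wins() should return 1. Note that the second signal() call hands back app_handler, so a library that wanted to coexist with the application could in principle keep that pointer and chain to it; nothing in the gdb session suggests DFB does so.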

Summarizing, this is what happens when SIGUSR1/2 is sent to the cdebconf process from the outside (on i386):

* If cdebconf is run with a frontend other than GTK on DFB (signals caught by cdebconf only)

Sending SIGUSR1 causes no crash, as the signal is handled by the dummy handler.
Sending SIGUSR2 makes cdebconf quit, which is the default behaviour for uncaught signals.

* If cdebconf is run with the GTK on DFB frontend (signals caught by DFB instead of cdebconf)

Sending SIGUSR1 causes DFB to freeze; sending SIGUSR2 afterwards makes it resume correctly.
Sending SIGUSR1 twice in a row makes DFB crash.
Sending SIGUSR2 a single time makes DFB crash.

So, on i386 at least, DFB's signal handling mechanism does not seem to clash with cdebconf's, which simply no longer receives signals from the outside world.

Note that if some d-i application, for some reason, sent cdebconf a SIGUSR1 signal while cdebconf is running with the GTKDFB frontend, DFB would crash. We should make sure that no application ever sends cdebconf SIGUSR1, unless we patch DFB to catch signals other than SIGUSR1/2.

Moving DFB's VT switching mechanism to other signals should prevent this potential specific crash anyway.

Still, I don't understand why DFB crashes when VT switching is performed. Maybe this crash is not related to VT switching? Maybe signal handling on AMD64 works totally differently than on i386? ATM I can only think of replacing SIGUSR1/2 in DFB with other signals and seeing if this crash gets fixed: I'll try to produce the patch.

cheers

Attilio



