
Re: Segfaults in seemingly unrelated programs -- SOLVED



ldd lists the shared libraries on which a program depends. After
booting in single user mode, I chose a simple program (login) among the
ones that were segfaulting and then methodically began reinstalling its
libraries one by one.
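
For anyone repeating this, here is a minimal sketch of that first step.
The helper name ldd_paths is made up for illustration, and the sample
line in the comment only shows the usual shape of ldd output, which may
differ between systems.

```shell
#!/bin/sh
# Hypothetical helper: read ldd-style output on stdin and print only the
# resolved library paths, one per line.
# A typical input line looks like:
#     libpam.so.0 => /lib/libpam.so.0 (0x40021000)
ldd_paths() {
    # Keep lines of the form "name => /absolute/path ..."; this skips
    # entries without a "=>" and entries reported as "not found".
    awk '$2 == "=>" && $3 ~ /^\// { print $3 }'
}

# Usage (not run here):
#     ldd /bin/login | ldd_paths
```

Each path printed this way is a candidate file to trace back to its
package.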

To find out which package contained each file that ldd listed among the
program's dependencies, I used the package content search feature on
the Debian site.

Once the package name was identified, I just had to run "apt-get install
--reinstall packagename" in order to reinstall it.
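
As an alternative to the web search, and assuming the local dpkg
database is intact and dpkg itself still runs, "dpkg -S" can map a file
to its package without network access. The sketch below is not what I
actually did. The helper name owner_packages is hypothetical, and it
only handles the simple "package: /path" form of dpkg -S output.

```shell
#!/bin/sh
# Hypothetical helper: read `dpkg -S` output on stdin and print the
# owning package names, one per line, without duplicates.
# A simple input line looks like:
#     libpam0g: /lib/libpam.so.0
owner_packages() {
    # Keep the text before ": /"; lines that do not match this simple
    # form (diversions, comma-separated package lists) are skipped.
    sed -n 's|^\([^ ]*\): /.*|\1|p' | sort -u
}

# Usage (not run here; needs root for the reinstall):
#     pkg=$(dpkg -S /lib/libpam.so.0 | owner_packages)
#     apt-get install --reinstall "$pkg"
```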

On the second try, I found the culprit : pam. The library had been
corrupted in the disk crash, and reinstalling the package solved the
problem.
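
In hindsight, comparing file sizes (as I did for libcrypt in the
original post below) cannot rule out corruption, because a damaged file
can keep its size. A checksum comparison against a copy from a healthy
machine is a stronger test. This is only a sketch: the function names
and the reference path in the usage comment are hypothetical, and cksum
is used simply because it is available on any POSIX system.

```shell
#!/bin/sh
# Print the CRC checksum and byte count of a file, without its name.
file_sum() {
    cksum < "$1"
}

# Succeed only if both files have the same checksum and size.
same_content() {
    [ "$(file_sum "$1")" = "$(file_sum "$2")" ]
}

# Usage (not run here; /mnt/reference is a hypothetical mount point
# holding a copy of the file from a healthy machine):
#     same_content /lib/libpam.so.0 /mnt/reference/lib/libpam.so.0 \
#         || echo "libpam differs from the reference copy"
```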

It feels great when everything is working again. I'm glad I solved this
one without a massive reinstallation. That's something I love in Debian :
it's almost always possible to fix a problem by really solving it, not
by reinstalling everything from scratch, as is almost always the case
on some other well-known OS.

Thanks to the unknown guy on the Debian IRC channel who introduced me
to the use of ldd.

For reference and indexing, here is the text of my original post.

On Sun, 2002-02-03 at 00:16, jim@leary.csoft.net wrote:
> Due to a faulty fan, one CPU overheated and brought the system down. On
> restart, fsck indicated that some filesystem corruption had occurred.
> 
> On startup, gdm would not start. After entering my username in the
> console, the login prompt came back without giving me the opportunity to
> enter my password. The logical next step was to boot in single user mode.
> 
> In single user mode, it quickly became apparent that a few programs
> segfaulted. Among them : su, apache, gdm, smbd, nmbd, cron, pppd, and
> login. Almost everything else superficially seemed to work, with a few
> exceptions.
> 
> So I tried to find out what these programs could have in common apart
> from creating tasks under a different user than the one under which they
> are run. I suspected that they all depended on a library whose file the
> crash corrupted. So off I went with ldd. Apart from the omnipresent
> libc6 (without which not much does anything at all), the prime suspect
> was libcrypt. It seems that anything that uses libcrypt crashes the
> moment it calls it. I only say "it seems" because I was unable to be
> more conclusive after observation of strace output. But it may be
> because I am not familiar with strace.
> 
> I observed one exception : makepasswd. Strace shows it calling something
> from libcrypt, but it does its job with no problem. I compared
> /lib/libcrypt.so.1 between the broken server and another machine with
> the same OS, and the file sizes were identical. So I have no proof that
> libcrypt is guilty and my feelings toward this hypothesis may be
> completely wrong.
> 
> Here is an example of strace output. The program studied is "login"
> (the one that generates the console login prompt).
> 
> It begins with calls in 
> /lib/libcrypt.so.1
> /lib/libpam.so.0
> /lib/libpam_misc.so.0
> /lib/libdl.so.2
> 
> Then, on the sane system it goes like the following. It's the same on
> the broken system, except that the memory addresses are not the same.
> 
> open("/lib/libc.so.6", O_RDONLY)        = 3
> read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\230\327"...,
> 1024) = 1
> fstat64(3, {st_mode=S_IFREG|0755, st_size=1170492, ...}) = 0
> old_mmap(NULL, 1187296, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) =
> 0x4005c000
> mprotect(0x40174000, 40416, PROT_NONE)  = 0
> old_mmap(0x40174000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED,
> 3, 0x1
> old_mmap(0x4017a000, 15840, PROT_READ|PROT_WRITE,
> MAP_PRIVATE|MAP_FIXED|MAP_ANO
> close(3)                                = 0
> munmap(0x40016000, 40843)               = 0
> 
> Here, login on the broken machine segfaults :
> --- SIGSEGV (Segmentation fault) ---
> +++ killed by SIGSEGV +++
> 
> The last line before the segfault also differs on the broken machine.
> The address is the same, but the length is not :
> munmap(0x40016000, 35897)               = 0
> 
> I don't know if that detail is relevant, but since some (but not all) of
> the segfaulting programs end the same way, I thought it might be.
> 
> On the sane system, here is the beginning of what follows in the strace
> after the point where it has segfaulted on the broken system.
> 
> brk(0)                                  = 0x80546dc
> brk(0x8054704)                          = 0x8054704
> brk(0x8055000)                          = 0x8055000
> getuid32()                              = 0
> ioctl(0, SNDCTL_TMR_TIMEBASE, {B38400 opost isig icanon echo ...}) = 0
> ioctl(0, SNDCTL_TMR_TIMEBASE, {B38400 opost isig icanon echo ...}) = 0
> brk(0x8057000)                          = 0x8057000
> readlink("/proc/self/fd/0", "/dev/pts/2", 4095) = 10
> socket(PF_UNIX, SOCK_STREAM, 0)         = 3
> connect(3, {sin_family=AF_UNIX, path="/var/run/.nscd_socket"}, 110) = -1
> ENOENT
> close(3)                                = 0
> open("/etc/nsswitch.conf", O_RDONLY)    = 3
> 
> I thought it might give some elements of context.
> 
> If anyone has read this far, thank you. At this point, I am somewhat out
> of my depth, to say the least. Any hint that can help me pin down the
> cause of my misery is more than welcome.
> 
> And yes, I do have backups of my data, but not of the operating system.

