
Bug#136707: libdb2 and libdb3 symbol conflicts and ldso library loading



package: glibc
severity: serious
Justification: wrong version of db2 and db3 may be used; possible segfaults etc.
[Ben, I realize from our IRC conversation that you are aware of this
issue.  I couldn't find a bug open on it though; if I was sloppy, please
merge this bug as appropriate.  I'm opening a bug because that's how
we track issues and I want to see this fixed for woody.  I'm copying
debian-devel because this issue has been discussed there in the past,
because it is complex, and because I hope we all end up on the same
page eventually.]


Ben and I had an IRC conversation about shared libraries in which the
following facts were revealed.  I believe many of these facts were
previously known to Ben.

0) There have been long-standing issues dealing with conflicts between
libdb2 and libdb3.  It is common for both libraries to be loaded
into the same address space.  For example, libnss-db may use db2
while LDAP uses libdb3.  It is critical that the right version of
db be used by the right code.  This is made more difficult by the
fact that db2 and db3 share common symbols.  Many of the common
symbols are internal; they are not exposed to the application, but
if the wrong symbol is used the db2 library may call into internal
parts of the db3 library, resulting in chaos.  Other symbols are
common to the public ABI of the two versions of db.  Thus it is
important that both internal symbols and external symbols be
resolved to the correct library.  For more background please see the
"DB3 symbol collision solved" thread from November of 2000 and the
"db3" thread from around March of 2001.  Please read these threads
before making suggestions along the lines of "this problem is not
important", "we don't need to solve it", or "we should just bump the
sonames of all the libraries involved".  All these suggestions are
wrong and have been discussed to death in the past.

1) Ben said that in order to solve the libdb2 and libdb3 problems he
linked the libraries with the -Bsymbolic flag.  According to the ld
manpage on GNU/Linux, the -Bsymbolic flag has the following
effects:


  When creating a shared library, bind references to global symbols
  to the definition within the shared library, if any.  Normally, it
  is possible for a program linked against a shared library to
  override the definition within the shared library.  This option is
  only meaningful on ELF platforms which support shared libraries.

From this description, it seems that -Bsymbolic should solve the
internal symbol collision problem.  That is, if db2's functions call
an internal symbol shared between db2 and db3, they will get the db2
symbol because they have been tightly bound to that symbol at link
time.  Similarly, db3 will get the db3 internal symbols.
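To make the reasoning in item 1 concrete, here is a toy model of symbol
lookup, not the real ld.so algorithm.  It assumes a load order of
application, libdb3, libdb2, and uses a made-up symbol name
(__db_internal) standing in for any internal symbol that both libraries
define.  The "symbolic" argument names the objects assumed to have been
linked with -Bsymbolic.

```python
# Toy model of ELF symbol lookup; a sketch, not the real ld.so algorithm.
# Each object maps to the set of symbols it defines.
# "__db_internal" is a made-up name for a symbol both libraries define.
DEFINES = {
    "app": set(),
    "libdb3": {"__db_internal"},
    "libdb2": {"__db_internal"},
}

# Assumed global lookup scope: executable first, then libraries in load order.
GLOBAL_SCOPE = ["app", "libdb3", "libdb2"]


def resolve(referrer, symbol, symbolic=frozenset()):
    """Return the object whose definition of `symbol` the referrer binds to."""
    # An object linked with -Bsymbolic prefers its own definition.
    if referrer in symbolic and symbol in DEFINES[referrer]:
        return referrer
    # Otherwise the first definition found in the global scope wins.
    for obj in GLOBAL_SCOPE:
        if symbol in DEFINES[obj]:
            return obj
    return None


# db2 code referring to an internal symbol, without and with -Bsymbolic.
without_flag = resolve("libdb2", "__db_internal")
with_flag = resolve("libdb2", "__db_internal", symbolic={"libdb2", "libdb3"})
print(without_flag)  # libdb3: db2 code would call into db3 internals
print(with_flag)     # libdb2
```

In this model the flag only changes references made from inside the
library that defines the symbol, which is why it addresses the internal
symbols and says nothing yet about callers such as libsasl.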

2) Ben claims that when the ELF dynamic linker is looking for a symbol it
searches the direct dependencies of a library before searching the
application namespace.  That is, if libsasl7 links directly against libdb2
and calls some external symbol of the db2 API, it will get the symbol
from db2 even if the application later links against libdb3.  Similarly,
if an application links against libdb3 and libsasl and calls a symbol
common to both the db2 and db3 ABI, it will get the db3 version.

3) I presented test cases showing that the claim made in item 2 is not
currently true.  The specific test case was simpler than the
db2/db3/libsasl7 test case, involving test symbols and a very simple
application.  I'll be happy to make it available to anyone who wants
to look; it is probably still at
http://www.mit.edu/afs/sipb/user/hartmans/elf-test.tar.gz.  Ben at
least implicitly agreed that the test case demonstrated that the
libdb2 libdb3 solution was not working.
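The difference between the claim in item 2 and what the test case in
item 3 suggests can be sketched with a toy model.  This is not the test
case from the tarball and not the real ld.so code; the symbol name
db_shared is made up, and the dependency graph (app needs libdb3 and
libsasl, libsasl needs libdb2) is an assumption chosen to match the
scenario described above.  The model compares a dependencies-first
policy with a single global scope built breadth-first from the
executable.

```python
# Toy comparison of two lookup policies; hypothetical names throughout.
DEFINES = {
    "app": set(),
    "libdb3": {"db_shared"},  # "db_shared": made-up symbol in both public ABIs
    "libsasl": set(),
    "libdb2": {"db_shared"},
}
# Assumed DT_NEEDED entries for each object.
NEEDED = {
    "app": ["libdb3", "libsasl"],
    "libsasl": ["libdb2"],
    "libdb3": [],
    "libdb2": [],
}


def global_scope(root):
    """Breadth-first walk of the NEEDED graph starting at the executable."""
    order, queue = [], [root]
    while queue:
        obj = queue.pop(0)
        if obj not in order:
            order.append(obj)
            queue.extend(NEEDED[obj])
    return order


def resolve_global(referrer, symbol, root="app"):
    """First definition in the global scope wins, whoever refers to it."""
    return next((o for o in global_scope(root) if symbol in DEFINES[o]), None)


def resolve_deps_first(referrer, symbol, root="app"):
    """Policy described in item 2: the referrer's direct dependencies first."""
    for obj in NEEDED[referrer]:
        if symbol in DEFINES[obj]:
            return obj
    return resolve_global(referrer, symbol, root)


claimed = resolve_deps_first("libsasl", "db_shared")
under_global = resolve_global("libsasl", "db_shared")
print(global_scope("app"))    # ['app', 'libdb3', 'libsasl', 'libdb2']
print(claimed, under_global)  # libdb2 libdb3
```

Under the global-scope policy libsasl ends up with the db3 definition.
Linking libdb2 with -Bsymbolic would not change that in this model,
because the reference comes from libsasl rather than from libdb2 itself.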

4) Ben indicated he had been seeing other signs that something had
recently broken libdb2/libdb3 conflict resolution.

5) Ben suggested that the problem likely was in ldso (thus I'm
reporting against glibc) because the packages in question didn't seem
to have been relinked lately.  I tend to agree that if such a problem
exists it is in ldso; objdump on the libdb2 and libdb3 libraries suggests
they are correctly using -Bsymbolic and correctly specify the
DT_NEEDED entries.


I've been looking over some of my testing machines with older libcs
and I can still reproduce the problem.  I've begun to consider an
alternate hypothesis: the problem never was really fixed.  I believe
this hypothesis is possible because it turns out that there really are
fairly few common external symbols shared between the public db2 ABI
and the db3 ABI.  If the internal symbol resolution problem is not
solved, the results are fairly obvious and profound: near-instant
segfaults.  However, with the internal symbol problem solved and the
external symbol problem still existing, the breakage might not be
noticed for a long time.  For example, if slapd always tended to use db3,
you might only notice issues if you used SASL authentication with
cram-md5 or some other entry in the SASL password database.  That's a
fairly uncommon setup for an LDAP server.

So, if Ben is able to find the ldso issue that's great.  But if not,
we should eventually consider the possibility that we never solved
this correctly and look for other solutions.  I do have ideas in this
area based on some things said last March, but don't really want to
muddy the waters with that right now; if it's just an ldso bug, let's
fix that and get on with woody.

Finally, I'd like to propose a minimum standard of proof that we've
fixed this issue.  I think this standard of proof is appropriate as a
minimum because we've been dealing with db3 problems for the last
year, and I'd like to see them dealt with one final time so that they
stop biting us.  Therefore I propose that any claimed proof of a fix
for this bug should include:

* A specific version of some application that directly links to libdb3
* A specific version of libc (ldso), binutils, libdb2 and libdb3
* A library linked directly by the application that links to libdb2
  (I'll call it libsasl but it may be something else)
* A symbol in the public ABI of libdb2 called by libsasl and in the
  public ABI of libdb3 called by the application
* An explanation of how to configure/use the application in a manner
  that will get the symbol called in the db3 library by the
  application and in the db2 library by libsasl.
* An explanation of some observable incorrect behavior if the wrong
  version of the symbol is called.
* Logs, transcripts or other proof showing that when the application
  is configured as specified and used as specified the correct rather
  than incorrect behavior happens.
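One possible way to produce part of the logs asked for above is to run
the application with glibc's LD_DEBUG=bindings set and check which
library each reference was bound to.  The sketch below assumes the
binding lines look like "binding file A [0] to B [0]: normal symbol
`sym'"; that format may differ between glibc versions.  The sample log
is written by hand for illustration, and the paths, sonames, and the
symbol name db_shared are all hypothetical.  Such a check would
complement, not replace, the observable-behavior evidence in the list.

```python
import re

# Assumed shape of a glibc LD_DEBUG=bindings line; may vary by version.
BINDING = re.compile(
    r"binding file (?P<src>\S+) \[\d+\] to (?P<dst>\S+) \[\d+\]: "
    r"normal symbol `(?P<sym>[^']+)'"
)


def bindings(log_text):
    """Yield (referrer, provider, symbol) for each binding line in the log."""
    for line in log_text.splitlines():
        match = BINDING.search(line)
        if match:
            yield match.group("src"), match.group("dst"), match.group("sym")


def wrong_bindings(log_text, expected):
    """Return bindings whose provider differs from expected[(referrer, symbol)]."""
    return [
        (src, dst, sym)
        for src, dst, sym in bindings(log_text)
        if (src, sym) in expected and expected[(src, sym)] != dst
    ]


# Hypothetical log excerpt, written by hand for illustration only.
SAMPLE = """\
  1234:\tbinding file /usr/lib/libsasl.so.7 [0] to /usr/lib/libdb3.so.3 [0]: normal symbol `db_shared'
  1234:\tbinding file ./app [0] to /usr/lib/libdb3.so.3 [0]: normal symbol `db_shared'
"""

# Hypothetical expectation: libsasl should get db2, the application db3.
EXPECTED = {
    ("/usr/lib/libsasl.so.7", "db_shared"): "/usr/lib/libdb2.so.2",
    ("./app", "db_shared"): "/usr/lib/libdb3.so.3",
}

bad = wrong_bindings(SAMPLE, EXPECTED)
print(bad)  # one entry: libsasl bound to the db3 library
```

An empty list from wrong_bindings on a real log would be consistent with
a fix, though only for the references listed in EXPECTED.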

I realize that's a lot of documentation, far more than is usually
associated with a Debian bug.  However, based on the complexity of the
issue and how often we have gotten this wrong in the past, I think
that level of documentation is necessary.

I'm certainly willing to help test fixes to this problem, develop
test frameworks to help fix the problem, or work on solutions if
-Bsymbolic turns out not to be good enough.  I don't consider myself
qualified to go digging around in the guts of ld.so to fix or find the
problem, so I don't think involving me in that phase of the work would
be useful.




