
Re: Symbols/shlibs files for Java




On 2011-05-28 14:30, Raphael Hertzog wrote:
> Hi Niels,
> 

Hey

(Added d-java to CC)

> On Wed, 25 May 2011, Niels Thykier wrote:
>> First of all I was hoping that you might have some "Do" or "Don't"
>> pointers from when dpkg added support for these things.
> 
> Do not underestimate the task. Apart from that, I'm sorry I'm not sure
> what kind of advice I can give you. :-)
> 

:)

>> Secondly, there might be some code or infrastructure that could be
>> shared.
> 
> I would love to generalize the principle of auto-generated dependencies
> to cover more than just C libraries but we're far from that, i.e. there's
> no infrastructure in place for this and all the code in dpkg-gensymbols
> and dpkg-shlibdeps is highly specific to the case of C libraries/binaries.
> 

Could we begin refactoring this towards something similar to the
$NS::Source::Package setup (e.g. $NS::SymbolsFile::$LANG)? Or would you
rather see a different approach to this code-wise?

>>   Particularly I am interested in how you handle mapping
>> filenames/SONAMES to a package (especially in cases like libc6, where
>> there is more than one lib in the package).
> 
> There's nothing magical here. Once we have a SONAME, we find the library
> on the system (using the same path that ld.so would use). Once we have
> the complete filename, dpkg -S /the/file returns the package name. And
> with the package name we're checking the content of
> /var/lib/dpkg/info/<pkg>.shlibs (but you have to use dpkg-query
> --control-path <pkg> shlibs to find that path).
> 

Aw, I was hoping for dragons and magic. :P  But yeah, I should have
thought of dpkg -S; I have used it before.
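
The lookup chain described above (file name, then dpkg -S, then the
package's shlibs file) can be sketched roughly as follows.  This is a
minimal Python sketch, not how dpkg-shlibdeps is implemented: the
function names are made up for illustration, "type:"-prefixed shlibs
lines are simply skipped, and an empty reply from dpkg-query is assumed
to mean the package ships no shlibs file.

```python
import subprocess


def parse_dpkg_search(output):
    """Extract owning package names from `dpkg -S` output.

    Lines look like 'libc6:amd64: /lib/x86_64-linux-gnu/libc.so.6';
    several owners are separated by ', '.
    """
    packages = []
    for line in output.splitlines():
        if line.startswith("diversion by ") or ": " not in line:
            continue
        owners, _, _path = line.partition(": ")
        packages.extend(owner.strip() for owner in owners.split(","))
    return packages


def parse_shlibs(text):
    """Map (library name, SONAME version) to the dependency string."""
    entries = {}
    for line in text.splitlines():
        fields = line.split(None, 2)
        # Skip blanks, comments and (for simplicity) "type:"-prefixed lines.
        if len(fields) < 3 or fields[0].startswith("#") or fields[0].endswith(":"):
            continue
        name, version, dependency = fields
        entries[(name, version)] = dependency.strip()
    return entries


def shlibs_for_file(path):
    """Look up the shlibs entries of the package(s) owning `path` (needs dpkg)."""
    found = subprocess.run(["dpkg", "-S", path], capture_output=True,
                           text=True, check=True).stdout
    result = {}
    for package in parse_dpkg_search(found):
        control = subprocess.run(
            ["dpkg-query", "--control-path", package, "shlibs"],
            capture_output=True, text=True, check=True).stdout.strip()
        if control:  # assumed to be empty when there is no shlibs file
            with open(control) as handle:
                result[package] = parse_shlibs(handle.read())
    return result
```

For a Java variant, the same chain would presumably start from a jar
file name instead of a SONAME.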

>>   We also have cases where two packages provide the same library and it
>> would be optimal for us to end up with libX-java | libY-java in the
>> depends, but I have a feeling that is not entirely trivial to support
>> (in a sane way).
> 
> Well, both packages need to provide this dependency. There's no way the
> system can know that there is some other libraries that could fulfill the
> same role and that it needs to put an alternative in the dependency.
> 

I had a feeling you might say that.

>> I intend to have all the tools to support this in the javahelper
>> package.  I am not too sure that we can recycle the existing formats
>> (maybe the shlibs format with s/SONAME/filename/) as we have to check
>> for things like classes, return-types, inheritance and method
>> overloading as well.  But feel free to correct me if symbols files
>> already have support for this.
> 
> Sorry, I have too few java knowledge to answer this.
> 
> Cheers,

So I have been looking at this a bit more; the shlibs format actually
looks fully recyclable, assuming we can somehow tell a "C"-shlibs file
from a "Java"-shlibs file and map "SONAME" to filename accordingly.

... and if I discard my desire to record all access qualifiers and such,
I think the symbols file is mostly re-usable if we encode things right.

But first, a quick Java lecture so we are all more or less on the same
page.  I will (where possible) map Java terms to C++.  As I understood
it from Jonathan, C++ mangles all constructors and methods into flat
function names and builds a C library out of that.
  A Java library consists of 0 or more class files stored in a jar (zip)
file.  The metadata (such as dependencies) is stored in the manifest
file (a plain text file).  We can extract almost everything we need
(method signatures etc.) from said class files and the manifest file.

Java does a well-defined mangling of method names in the class files[1].
The mangled method name could trivially be prefixed by its class name
(either in binary or source format[2]).

So the parseInt example from [1] could be stored in the symbols file as:

  java.lang.Integer.parseInt(Ljava/lang/String;I)I

- or -

  java/lang/Integer.parseInt(Ljava/lang/String;I)I

Which would tell us that the class java.lang.Integer has a method with
the signature "int parseInt(java.lang.String,int)".  Personally I would
prefer the second option of those two since it is easier to map to a
file name in the jar file (plus it consistently uses the binary name
instead of mixing source and binary name).
  (For the rest of this email I will use the latter of the two formats
in the example above.)
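
To illustrate that such a symbol carries enough information to recover
the readable signature, here is a small Python sketch that decodes it.
The function names are hypothetical, and it only handles the plain
descriptor syntax from [1] (primitives, class types and arrays), with no
error handling for malformed input.

```python
# Single-letter codes for primitive types in JVM descriptors.
PRIMITIVES = {"B": "byte", "C": "char", "D": "double", "F": "float",
              "I": "int", "J": "long", "S": "short", "Z": "boolean",
              "V": "void"}


def read_type(desc, pos):
    """Decode one type starting at desc[pos]; return (java_name, next_pos)."""
    dims = 0
    while desc[pos] == "[":          # each '[' adds one array dimension
        dims += 1
        pos += 1
    if desc[pos] == "L":             # class type: L<binary name>;
        end = desc.index(";", pos)
        name = desc[pos + 1:end].replace("/", ".")
        pos = end + 1
    else:                            # primitive type: one letter
        name = PRIMITIVES[desc[pos]]
        pos += 1
    return name + "[]" * dims, pos


def describe(symbol):
    """Split 'pkg/Class.method(args)ret' into (class, readable signature)."""
    # Binary class names use '/', so the first '.' separates class and method.
    owner, _, rest = symbol.partition(".")
    name, _, desc = rest.partition("(")
    pos, args = 0, []
    while desc[pos] != ")":
        arg, pos = read_type(desc, pos)
        args.append(arg)
    ret, _ = read_type(desc, pos + 1)
    return owner.replace("/", "."), "%s %s(%s)" % (ret, name, ",".join(args))


print(describe("java/lang/Integer.parseInt(Ljava/lang/String;I)I"))
# ('java.lang.Integer', 'int parseInt(java.lang.String,int)')
```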

In that case it is trivial to recycle the current symbols format (modulo
using possibly forbidden characters in the symbol names), for example:

  java/lang/Integer.parseInt(Ljava/lang/String;I)I@Base 1.1

This obviously assumes we can tell a C-symbols file from a Java-symbols
file and map the "SONAME" part accordingly.  Since Java symbols cannot
be versioned the way C symbols can, I believe that the @Base part would
be redundant for Java.


The only thing missing is how to handle a regular field / constant; here
I see two "easy" options.  Either use <encoded-type><field-name> or
<field-name><delimiter><encoded-type>.  Assuming ":" is the delimiter
for the second option, it would look like:

  Imy/fictional/Code.length@Base 1.1
  Ljava/io/PrintStream;java/lang/System.out@Base 1.1

- or -

  my/fictional/Code.length:I@Base 1.1
  java/lang/System.out:Ljava/io/PrintStream;@Base 1.1

This encodes the two example fields (or constants) "int length;" and
"java.io.PrintStream out;" in the classes my.fictional.Code and
java.lang.System (respectively).  But I welcome alternatives.
Note that for enums, the type of an enum constant would be the same as
the class it is in.
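
As a rough comparison of the two field encodings, the Python sketch
below builds both forms and splits the second one back into its parts.
The helper names are invented for this example, and ":" as the
delimiter is only the assumption made above.

```python
def field_symbol_prefixed(owner, field, descriptor):
    """First option: <encoded-type><class>.<field>."""
    return "%s%s.%s" % (descriptor, owner, field)


def field_symbol_suffixed(owner, field, descriptor, delimiter=":"):
    """Second option: <class>.<field><delimiter><encoded-type>."""
    return "%s.%s%s%s" % (owner, field, delimiter, descriptor)


def split_suffixed(symbol, delimiter=":"):
    """Recover (class, field, encoded type) from the second option."""
    member, _, descriptor = symbol.partition(delimiter)
    owner, _, field = member.rpartition(".")
    return owner, field, descriptor


print(field_symbol_prefixed("java/lang/System", "out", "Ljava/io/PrintStream;"))
# Ljava/io/PrintStream;java/lang/System.out
print(field_symbol_suffixed("java/lang/System", "out", "Ljava/io/PrintStream;"))
# java/lang/System.out:Ljava/io/PrintStream;
```

One practical difference: the second form can be split with plain
string operations, whereas the first requires knowing where the encoded
type ends (one letter for primitives, up to the ";" for class types)
before the class name can be found.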

From there we can extend it to having "class-sections", e.g. something
like:

  class java/lang/Integer
    parseInt(Ljava/lang/String;I)I@Base 1.1
    # other symbols in java/lang/Integer
  class java/lang/String
    # symbols in java/lang/String

This would reduce the size of the symbols file.  From there on we could
always extend the format to include more information and check for
compatibility breakage beyond symbols.
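
To show that the class-section layout loses nothing compared to the
flat one, here is a Python sketch that expands it back into fully
prefixed symbols.  The syntax it accepts (a "class <binary-name>" line,
"#" comments, and "<symbol> <version>" lines) is only the suggestion
from this mail, not an existing dpkg format, and the function name is
made up.

```python
def expand_class_sections(text):
    """Expand class sections into (full symbol, minimal version) pairs."""
    symbols = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                      # blank line or comment
        if line.startswith("class "):
            current = line[len("class "):].strip()
            continue
        parts = line.rsplit(None, 1)      # "<symbol> <version>"
        if current is None or len(parts) != 2:
            raise ValueError("unexpected line: %r" % raw)
        symbol, version = parts
        symbols.append((current + "." + symbol, version))
    return symbols


example = """\
class java/lang/Integer
    parseInt(Ljava/lang/String;I)I@Base 1.1
    # other symbols in java/lang/Integer
class java/lang/String
    # symbols in java/lang/String
"""
print(expand_class_sections(example))
# [('java/lang/Integer.parseInt(Ljava/lang/String;I)I@Base', '1.1')]
```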

The Java examples for symbols files in this mail are mere suggestions,
so if anyone has a better format to encode it in, please argue for it
and its advantages.

~Niels


[1] An example:

  int parseInt(String str, int radix);

is mangled to (in byte-code format):

  parseInt(Ljava/lang/String;I)I

Where "java/lang/String" is the binary name of the String class ("L" and
";" are its start and end markers) and "I" is the encoding of "int".
The "I" after the closing bracket denotes the return type; the types
inside the brackets are the argument types (in order).

Reference:
http://java.sun.com/docs/books/jvms/second_edition/html/ClassFile.doc.html#1169

http://java.sun.com/docs/books/jvms/second_edition/html/ClassFile.doc.html#84645

[2] The (fully qualified) source format of (e.g.) "String" is
java.lang.String; the binary format is java/lang/String - which is just
a ".class" short of being the filename.
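
Because a jar is a zip file, the mapping between binary names and file
names can be sketched with nothing but a zip reader.  This is a minimal
Python sketch with hypothetical helper names; it lists entry names only
and does not parse the class files themselves.

```python
import io
import zipfile


def class_entry(binary_name):
    """File name inside the jar for a binary class name."""
    return binary_name + ".class"


def classes_in_jar(jar):
    """Binary names of all classes in a jar (path or file-like object)."""
    with zipfile.ZipFile(jar) as archive:
        return sorted(name[:-len(".class")]
                      for name in archive.namelist()
                      if name.endswith(".class"))


# Build a tiny fake jar in memory to try it out.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\n")
    archive.writestr(class_entry("java/lang/String"), b"")
buffer.seek(0)
print(classes_in_jar(buffer))
# ['java/lang/String']
```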


