
Re: Symbols/shlibs files for Java


On 2011-05-28 14:30, Raphael Hertzog wrote:
> Hi Niels,


(Added d-java to CC)

> On Wed, 25 May 2011, Niels Thykier wrote:
>> First of all I was hoping that you might have some "Do" or "Don't"
>> pointers from when dpkg added support for these things.
> Do not underestimate the task. Apart from that, I'm sorry I'm not sure
> what kind of advice I can give you. :-)


>> Secondly, there might be some code or infrastructure that could be
>> shared.
> I would love to generalize the principle of auto-generated dependencies
> to cover more than just C libraries but we're far from that, i.e. there's
> no infrastructure in place for this and all the code in dpkg-gensymbols
> and dpkg-shlibdeps is highly specific to the case of C libraries/binaries.

Could we begin refactoring this towards something similar to the
$NS::Source::Package setup (e.g. $NS::SymbolsFile::$LANG)? Or would you
rather see a different approach to this code-wise?

>>   Particularly I am interested in how you handle mapping
>> filenames/SONAMES to a package (especially in cases like libc6, where
>> there is more than one lib in the package).
> There's nothing magical here. Once we have a SONAME, we find the library
> on the system (using the same path that ld.so would use). Once we have
> the complete filename, dpkg -S /the/file returns the package name. And
> with the package name we're checking the content of
> /var/lib/dpkg/info/<pkg>.shlibs (but you have to use dpkg-query
> --control-path <pkg> shlibs to find that path).

Aw, I was hoping for dragons and magic. :P  But yeah, I should have seen
the dpkg -S; I have been using it before.
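
For reference, the lookup described above can be sketched roughly as
follows.  This is only an illustration in Python, not how
dpkg-shlibdeps is implemented; all function names are made up.  It
assumes the usual "package: /path" output of dpkg -S and the
"library version dependency" layout of shlibs lines, and it assumes
that dpkg-query --control-path prints nothing when the package ships no
shlibs file.

```python
import subprocess


def parse_dpkg_search(output):
    """Return the package names found in `dpkg -S` output ("pkg[, pkg2]: /path")."""
    packages = []
    for line in output.splitlines():
        # Skip diversion notices and anything that is not "owners: path".
        if line.startswith("diversion by") or ": " not in line:
            continue
        owners, _path = line.split(": ", 1)
        packages.extend(owners.split(", "))
    return packages


def parse_shlibs_line(line):
    """Split one shlibs line into (library, version, dependency), or None."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    fields = line.split(None, 2)
    # An optional "type:" prefix (such as "udeb:") may come first.
    if fields[0].endswith(":"):
        fields = line.split(None, 3)[1:]
    if len(fields) < 3:
        return None
    return tuple(fields)


def find_shlibs_dependency(library_path, library, version):
    """Look up the dependency the owning package declares for a library."""
    search = subprocess.run(["dpkg", "-S", library_path],
                            capture_output=True, text=True, check=True)
    for package in parse_dpkg_search(search.stdout):
        control = subprocess.run(
            ["dpkg-query", "--control-path", package, "shlibs"],
            capture_output=True, text=True, check=True)
        shlibs_path = control.stdout.strip()
        if not shlibs_path:
            continue  # assumed: empty output means no shlibs file
        with open(shlibs_path) as handle:
            for line in handle:
                entry = parse_shlibs_line(line)
                if entry and entry[:2] == (library, version):
                    return entry[2]
    return None
```

Only find_shlibs_dependency touches the system; the two parsing helpers
work on plain strings, so a Java variant could reuse them with a
filename in place of the SONAME.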

>>   We also have cases where two packages provide the same library and it
>> would be optimal for us to end up with libX-java | libY-java in the
>> depends, but I have a feeling that is not entirely trivial to support
>> (in a sane way).
> Well, both packages need to provide this dependency. There's no way the
> system can know that there are some other libraries that could fulfill the
> same role and that it needs to put an alternative in the dependency.

I had a feeling you might say that.

>> I intend to have all the tools to support this in the javahelper
>> package.  I am not too sure that we can recycle the existing formats
>> (maybe the shlibs format with s/SONAME/filename/) as we have to check
>> for things like classes, return-types, inheritance and method
>> overloading as well.  But feel free to correct me if symbols files
>> already have support for this.
> Sorry, I have too little Java knowledge to answer this.
> Cheers,

So I have been looking at this a bit more; the shlibs format actually
looks fully recyclable, assuming we can somehow tell a "C"-shlibs file
from a "Java"-shlibs file and map "SONAME" to filename accordingly.

... and if I discard my desire to record all access qualifiers and such,
I think the symbols file is mostly re-usable if we encode things right.

But first, a quick Java lecture so we are all more or less on the same
page.  I will (where possible) map Java terms to C++.  As I understood
Jonathan, C++ maps/mangles all constructors and methods into a flat
function name and builds a C library out of that.
  A Java library consists of 0 or more class files stored in a jar (zip)
file.  The metadata (such as dependencies) is stored in the manifest
file (a plain text file).  We can extract almost everything we need from
said class files (method signatures etc.) and the manifest file.
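
As a rough illustration of the jar layout just described, the sketch
below lists the class files and reads the manifest using Python's
zipfile module.  The function name is made up for this example; pulling
method signatures out of the class files themselves would need a
class-file parser or a tool such as javap and is not shown.

```python
import zipfile


def list_jar_contents(jar):
    """Return (binary class names, manifest text) for a jar path or file object."""
    with zipfile.ZipFile(jar) as archive:
        # Every "*.class" entry is one class; dropping the suffix gives
        # its binary name (e.g. "java/lang/Integer").
        classes = sorted(
            name[: -len(".class")]
            for name in archive.namelist()
            if name.endswith(".class")
        )
        try:
            manifest = archive.read("META-INF/MANIFEST.MF").decode("utf-8")
        except KeyError:
            manifest = None  # a jar is not required to carry a manifest
    return classes, manifest
```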

Java does a well-defined mangling of method names in the class files[1].
 The mangled method could trivially be prefixed by its class name
(either in binary or source format[2]).

So the parseInt example from [1] could be stored in the symbols file as:


- or -


Which would tell us that the class java.lang.Integer has a method with
the signature "int parseInt(java.lang.String,int)".  Personally I would
prefer the second option of those two since it is easier to map to a
file name in the jar file (plus it consistently uses the binary name
instead of mixing source and binary name).
  ((For the rest of the email I will be using the format resembling the
latter of the two in the example above.))

In that case it is trivial to recycle the current symbols format (modulo
possibly forbidden characters in the symbol names), for example:

  java/lang/Integer.parseInt(Ljava/lang/String;I)I@Base 1.1

This obviously assumes we can tell a C-symbols file from a Java-symbols
file and map the "SONAME" part accordingly.  Since symbols cannot be
versioned like in C, I believe that the @Base part would be redundant
for Java.
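
To make the proposed line format concrete, here is a small sketch that
builds and takes apart such a line.  The helper names are invented for
illustration, and the split rules (last space before the version, last
"@" before the tag, first "(" before the descriptor) are my assumptions
about how a reader of this format could work, not something dpkg does
today.

```python
def format_symbol(class_name, method, descriptor, version, tag="Base"):
    """Build e.g. 'java/lang/Integer.parseInt(Ljava/lang/String;I)I@Base 1.1'."""
    return f"{class_name}.{method}{descriptor}@{tag} {version}"


def parse_symbol(line):
    """Split a method symbol line into (class, method, descriptor, tag, version)."""
    symbol, version = line.strip().rsplit(" ", 1)
    name, _, tag = symbol.rpartition("@")
    # The descriptor starts at the first "("; binary class names never
    # contain one.
    member, paren, rest = name.partition("(")
    class_name, _, method = member.rpartition(".")
    return class_name, method, paren + rest, tag, version
```

Because binary class names use "/" rather than ".", the last "." before
the "(" cleanly separates the class from the method name.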

The only thing missing is how to handle a regular field / constant; here
I see two "easy" options.  Either use <encoded-type><field-name> or
<field-name><delimiter><encoded-type>.  Assuming ":" is the delimiter
for the second option, it would look like:

  Imy/fictional/Code.length@Base 1.1
  Ljava/io/PrintStream;java/lang/System.out@Base 1.1

- or -

  my/fictional/Code.length:I@Base 1.1
  java/lang/System.out:Ljava/io/PrintStream;@Base 1.1

These encode the two fictional fields (or constants) "int length;" and
"java.io.PrintStream out;" in the classes my.fictional.Code and
java.lang.System (respectively).  But I welcome alternatives.
Particularly, for enums the type of the enum constant would be the
same as the class it is in.
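
To compare the two field encodings, here is a sketch of how each could
be split back into class, field and type.  Both functions are
hypothetical and take the symbol without the "@Base 1.1" part; the
example fields (java.lang.Integer.MAX_VALUE and java.lang.System.out)
are only there to have something to feed them.

```python
def parse_suffixed_field(symbol):
    """Parse '<class>.<field>:<type>', e.g. 'java/lang/System.out:Ljava/io/PrintStream;'."""
    member, _, type_descriptor = symbol.partition(":")
    class_name, _, field = member.rpartition(".")
    return class_name, field, type_descriptor


def parse_prefixed_field(symbol):
    """Parse '<type><class>.<field>', e.g. 'Ijava/lang/Integer.MAX_VALUE'."""
    pos = 0
    while symbol[pos] == "[":        # leading "[" marks array dimensions
        pos += 1
    if symbol[pos] == "L":           # object type: runs up to the ";"
        end = symbol.index(";", pos) + 1
    else:                            # primitive type: a single letter
        end = pos + 1
    type_descriptor, member = symbol[:end], symbol[end:]
    class_name, _, field = member.rpartition(".")
    return class_name, field, type_descriptor
```

The delimiter variant needs no knowledge of the type encoding to find
the field name, which is one argument in its favour.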

From there we can extend it to having "class-sections", e.g. something like:

  class java/lang/Integer
    parseInt(Ljava/lang/String;I)I@Base 1.1
    # other symbols in java/lang/Integer
  class java/lang/String
    # symbols in java/lang/String

This would reduce the size of the symbols file.  From there on we could
always extend the format to include more information and check for
compatibility breakage beyond symbols.
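
A reader for the sectioned layout could expand it back into flat symbol
lines along these lines.  This is a sketch under my own assumptions
(sections start with the keyword "class", "#" starts a comment, and
indentation carries no meaning); the function name is invented.

```python
def expand_class_sections(text):
    """Turn 'class X' sections into flat '<class>.<symbol>' lines."""
    symbols = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                      # blank line or comment
        if line.startswith("class "):
            current = line[len("class "):].strip()
        elif current is None:
            raise ValueError(f"symbol outside of a class section: {line!r}")
        else:
            symbols.append(f"{current}.{line}")
    return symbols
```

Since the expansion is mechanical, tools could accept both the flat and
the sectioned layout and compare them in the flat form.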

The Java examples for symbols files in this mail are mere suggestions,
so if anyone has a better format to encode it in, please argue for it
and its advantages.


[1] An example:

  int parseInt(String str, int radix);

is mangled to (in byte-code format):

  parseInt(Ljava/lang/String;I)I


Where "java/lang/String" is the binary name of the String class (L and ;
are start and end markers) and "I" is the encoding of "int".  The
"I" after the closing bracket denotes the return type; the types inside
the brackets are the arguments (in order).
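
For anyone who wants to read these encodings without memorising them,
here is a sketch of a decoder that turns a method descriptor back into
source-style type names.  The function names are made up; the letter
table follows the JVM's type encoding (B, C, D, F, I, J, S, Z for the
primitives, V for void, "[" for an array dimension).

```python
PRIMITIVES = {
    "B": "byte", "C": "char", "D": "double", "F": "float",
    "I": "int", "J": "long", "S": "short", "Z": "boolean", "V": "void",
}


def _read_type(descriptor, pos):
    """Decode one type starting at pos; return (source name, next position)."""
    dims = 0
    while descriptor[pos] == "[":
        dims += 1
        pos += 1
    if descriptor[pos] == "L":
        end = descriptor.index(";", pos)
        name = descriptor[pos + 1:end].replace("/", ".")
        pos = end + 1
    else:
        name = PRIMITIVES[descriptor[pos]]
        pos += 1
    return name + "[]" * dims, pos


def decode_descriptor(descriptor):
    """Decode '(Ljava/lang/String;I)I' into (['java.lang.String', 'int'], 'int')."""
    if not descriptor.startswith("("):
        raise ValueError("method descriptors start with '('")
    pos = 1
    arguments = []
    while descriptor[pos] != ")":
        argument, pos = _read_type(descriptor, pos)
        arguments.append(argument)
    return_type, _ = _read_type(descriptor, pos + 1)
    return arguments, return_type
```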



[2] The (fully qualified) source format of (e.g.) "String" is
java.lang.String; the binary format is java/lang/String - which is just
a ".class" short of being the filename.
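
The mapping between the three forms is simple enough to sketch in a few
lines.  The helper names are hypothetical, and the conversion is only
correct for top-level classes: nested classes use "$" in the binary
name (java.util.Map.Entry is java/util/Map$Entry), which a plain
textual replacement cannot know about.

```python
def binary_name(source_name):
    """Map a top-level class such as 'java.lang.String' to 'java/lang/String'."""
    return source_name.replace(".", "/")


def class_file_path(binary):
    """Return the path of the class file inside the jar for a binary name."""
    return binary + ".class"
```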
