Re: Symbols/shlibs files for Java
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256
Hmm, seems like GPG (or maybe just enigmail) is confused by something in
my last email and marks my signature as invalid. So... for the record I
did write the email quoted in full below.
Sorry for the duplicate email.
On 2011-05-30 13:31, Niels Thykier wrote:
> On 2011-05-28 14:30, Raphael Hertzog wrote:
>> Hi Niels,
>
>
> Hey
>
> (Added d-java to CC)
>
>> On Wed, 25 May 2011, Niels Thykier wrote:
>>> First of all I was hoping that you might have some "Do" or "Don't"
>>> pointers from when dpkg added support for these things.
>
>> Do not underestimate the task. Apart from that, I'm sorry I'm not sure
>> what kind of advice I can give you. :-)
>
>
> :)
>
>>> Secondly, there might be some code or infrastructure that could be
>>> shared.
>
>> I would love to generalize the principle of auto-generated dependencies
>> to cover more than just C libraries but we're far from that, i.e. there's
>> no infrastructure in place for this and all the code in dpkg-gensymbols
>> and dpkg-shlibdeps is highly specific to the case of C libraries/binaries.
>
>
> Could we begin refactoring this towards something similar to the
> $NS::Source::Package setup (e.g. $NS::SymbolsFile::$LANG)? Or would you
> rather see a different approach to this code-wise?
>
>>> Particularly I am interested in how you handle mapping
>>> filenames/SONAMES to a package (especially in cases like libc6, where
>>> there is more than one lib in the package).
>
>> There's nothing magical here. Once we have a SONAME, we find the library
>> on the system (using the same path that ld.so would use). Once we have
>> the complete filename, dpkg -S /the/file returns the package name. And
>> with the package name we're checking the content of
>> /var/lib/dpkg/info/<pkg>.shlibs (but you have to use dpkg-query
>> --control-path <pkg> shlibs to find that path).
>
>
> Aw, I was hoping for dragons and magic. :P But yeah, I should have seen
> the dpkg -S; I have been using it before.
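The lookup chain described above (library file, then dpkg -S, then
dpkg-query --control-path <pkg> shlibs) could be sketched roughly as
follows. This is only an illustration: the class and method names are
hypothetical, and the "pkg[, pkg...]: /path" output format of dpkg -S
is an assumption of this sketch, not something stated in the thread.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; not part of dpkg or javahelper. */
final class DpkgLookup {
    /** Command that asks dpkg which package owns a file. */
    static List<String> searchCommand(String path) {
        return List.of("dpkg", "-S", path);
    }

    /** Command that prints the location of a package's shlibs control file. */
    static List<String> shlibsPathCommand(String pkg) {
        return List.of("dpkg-query", "--control-path", pkg, "shlibs");
    }

    /**
     * Extracts the owning package(s) from one line of "dpkg -S" output.
     * Assumed format: "pkg[, pkg...]: /absolute/path".
     * Diversion notices are skipped (also an assumption about the format).
     */
    static List<String> ownersFromSearchLine(String line) {
        if (line.startsWith("diversion ") || line.startsWith("local diversion")) {
            return List.of();
        }
        int sep = line.indexOf(": /");
        if (sep < 0) {
            return List.of();
        }
        List<String> owners = new ArrayList<>();
        for (String name : line.substring(0, sep).split(",")) {
            owners.add(name.trim());
        }
        return owners;
    }
}
```

The two command lists can be handed to java.lang.ProcessBuilder; the
parsing is kept separate so it can be checked without dpkg installed.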
>
>>> We also have cases where two packages provide the same library and it
>>> would be optimal for us to end up with libX-java | libY-java in the
>>> depends, but I have a feeling that is not entirely trivial to support
>>> (in a sane way).
>
>> Well, both packages need to provide this dependency. There's no way the
>> system can know that there are other libraries that could fulfill the
>> same role and that it needs to put an alternative in the dependency.
>
>
> I had a feeling you might say that.
>
>>> I intend to have all the tools to support this in the javahelper
>>> package. I am not too sure that we can recycle the existing formats
>>> (maybe the shlibs format with s/SONAME/filename/) as we have to check
>>> for things like classes, return-types, inheritance and method
>>> overloading as well. But feel free to correct me if symbols files
>>> already have support for this.
>
>> Sorry, I have too little Java knowledge to answer this.
>
>> Cheers,
>
> So I have been looking at this a bit more; the shlibs format actually
> looks fully recyclable, assuming we can somehow tell a "C"-shlibs file
> from a "Java"-shlibs file and map "SONAME" to filename accordingly.
>
> ... and if I discard my desire to record all access qualifiers and such,
> I think the symbols file is mostly re-usable if we encode things right.
>
> But first, a quick Java lecture so we are all more or less on the same
> page. I will (where possible) map Java terms to C++. As I understood
> Jonathan, C++ maps/mangles all constructors and methods into a flat
> function name and builds a C library out of that.
> A Java library consists of 0 or more class files stored in a jar (zip)
> file. The meta data (such as dependencies) are stored in the manifest
> file (plain text file). We can extract almost everything we need from
> said class files (method-signatures etc) and the manifest file.
>
> Java does a well-defined mangling of method names in the class files[1].
> The mangled method could trivially be prefixed by its class name
> (either in binary or source format[2]).
>
> So the parseInt example from [1] could be stored in the symbols file as:
>
> java.lang.Integer.parseInt(Ljava/lang/String;I)I
>
> - or -
>
> java/lang/Integer.parseInt(Ljava/lang/String;I)I
>
> Which would tell us that the class java.lang.Integer has a method with
> the signature "int parseInt(java.lang.String,int)". Personally I would
> prefer the second option of those two since it is easier to map to a
> file name in the jar file (plus it consistently uses the binary name
> instead of mixing source and binary name).
> ((For the rest of the email I will be using the format resembling the
> latter of the two in the example above.))
>
> In that case it is trivial to recycle the current symbols format (modulo
> using possibly forbidden characters in the symbol names). Such as:
>
> java/lang/Integer.parseInt(Ljava/lang/String;I)I@Base 1.1
>
> This obviously assumes we can tell a C-symbols file from a Java-symbols
> file and map the "SONAME" part accordingly. Since symbols cannot be
> versioned like in C, I believe that the @Base part would be redundant
> for Java.
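To make the encoding concrete, here is a minimal sketch that builds such
a symbol from a class's binary name and decodes the mangled descriptor
back into a readable signature. The class and method names are
hypothetical (nothing like this exists in javahelper as far as this
thread says); only the descriptor syntax itself comes from the JVM
class file format.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for composing and decoding Java symbol names. */
final class JavaSymbolNames {
    private final String text;
    private int pos;

    private JavaSymbolNames(String text) {
        this.text = text;
    }

    /** "java/lang/Integer" + "parseInt" + "(Ljava/lang/String;I)I" as one symbol. */
    static String symbol(String classBinaryName, String method, String descriptor) {
        return classBinaryName + "." + method + descriptor;
    }

    /** The binary name is just a ".class" short of the jar entry name. */
    static String jarEntry(String classBinaryName) {
        return classBinaryName + ".class";
    }

    /** Decodes a method descriptor into "returnType name(argType,...)". */
    static String toSourceSignature(String method, String descriptor) {
        JavaSymbolNames d = new JavaSymbolNames(descriptor);
        d.expect('(');
        List<String> args = new ArrayList<>();
        while (d.peek() != ')') {
            args.add(d.nextType());
        }
        d.expect(')');
        String returnType = d.nextType();
        if (d.pos != descriptor.length()) {
            throw new IllegalArgumentException("trailing data in " + descriptor);
        }
        return returnType + " " + method + "(" + String.join(",", args) + ")";
    }

    private char peek() {
        if (pos >= text.length()) {
            throw new IllegalArgumentException("truncated descriptor: " + text);
        }
        return text.charAt(pos);
    }

    private void expect(char c) {
        if (peek() != c) {
            throw new IllegalArgumentException("expected '" + c + "' in " + text);
        }
        pos++;
    }

    /** Reads one type: a primitive code, an array ('['), or L<binary name>;. */
    private String nextType() {
        char code = peek();
        pos++;
        switch (code) {
            case 'B': return "byte";
            case 'C': return "char";
            case 'D': return "double";
            case 'F': return "float";
            case 'I': return "int";
            case 'J': return "long";
            case 'S': return "short";
            case 'Z': return "boolean";
            case 'V': return "void";
            case '[': return nextType() + "[]";
            case 'L': {
                int end = text.indexOf(';', pos);
                if (end < 0) {
                    throw new IllegalArgumentException("missing ';' in " + text);
                }
                String binaryName = text.substring(pos, end);
                pos = end + 1;
                return binaryName.replace('/', '.');
            }
            default:
                throw new IllegalArgumentException("unknown type code '" + code + "'");
        }
    }
}
```

With the parseInt example, toSourceSignature("parseInt",
"(Ljava/lang/String;I)I") yields "int parseInt(java.lang.String,int)",
and jarEntry("java/lang/Integer") yields "java/lang/Integer.class".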
>
>
> The only thing missing is how to handle a regular field / constant; here
> I see two "easy" options. Either use <encoded-type><field-name> or
> <field-name><delimiter><encoded-type>. Assuming ":" is the delimiter
> for the second option, it would look like:
>
> Imy/fictional/Code.length@Base 1.1
> Ljava/io/PrintStream;java/lang/System.out@Base 1.1
>
> - or -
>
> my/fictional/Code.length:I@Base 1.1
> java/lang/System.out:Ljava/io/PrintStream;@Base 1.1
>
> These encode the two fictional fields (or constants) "int length;" and
> "java.io.PrintStream out;" in the classes my.fictional.Code and
> java.lang.System (respectively). But I welcome alternatives.
> Particularly, for enums the type of the enum constant would be the
> same as the class it is in.
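As a rough illustration of how a tool might read such flat lines, the
sketch below splits a line into class, member, type information and
version. It assumes the second field layout (<field-name>:<encoded-type>)
and a literal "@Base" tag; the SymbolLine class is hypothetical and
nothing in this thread has settled on that layout.

```java
/** Hypothetical parsed form of one flat Java symbols line. */
final class SymbolLine {
    final String className;   // binary name, e.g. java/lang/System
    final String memberName;  // method or field name
    final String descriptor;  // method descriptor or encoded field type
    final boolean isMethod;
    final String version;     // version the symbol appeared in

    private SymbolLine(String className, String memberName, String descriptor,
                       boolean isMethod, String version) {
        this.className = className;
        this.memberName = memberName;
        this.descriptor = descriptor;
        this.isMethod = isMethod;
        this.version = version;
    }

    /** Parses "<class>.<member><type-info>@Base <version>". */
    static SymbolLine parse(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length != 2 || !parts[0].endsWith("@Base")) {
            throw new IllegalArgumentException("expected '<symbol>@Base <version>': " + line);
        }
        String symbol = parts[0].substring(0, parts[0].length() - "@Base".length());

        // Binary class names use '/', so the first '.' separates class and member.
        int dot = symbol.indexOf('.');
        if (dot < 0) {
            throw new IllegalArgumentException("no class/member separator: " + line);
        }
        String className = symbol.substring(0, dot);
        String member = symbol.substring(dot + 1);

        // Methods carry a descriptor starting with '('.
        int paren = member.indexOf('(');
        if (paren >= 0) {
            return new SymbolLine(className, member.substring(0, paren),
                    member.substring(paren), true, parts[1]);
        }

        // Fields use the assumed ':' delimiter before the encoded type.
        int colon = member.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("field without ':' type: " + line);
        }
        return new SymbolLine(className, member.substring(0, colon),
                member.substring(colon + 1), false, parts[1]);
    }
}
```

Splitting on the first "." only works because binary class names never
contain one, which is one more argument for the binary-name form.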
>
> From there we can extend it to having "class-sections", e.g. something
> like:
>
> class java/lang/Integer
> parseInt(Ljava/lang/String;I)I@Base 1.1
> # other symbols in java/lang/Integer
> class java/lang/String
> # symbols in java/lang/String
>
> This would reduce the size of the symbols file. From there on we could
> always extend the format to include more information and check for
> compatibility breakage beyond symbols.
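A class-section file of this shape can be expanded back into the flat
form mechanically. The sketch below shows one way to do that; the
ClassSections name, the "class " keyword handling and the "#" comment
rule are taken from the example above or assumed for illustration, not
from any existing tool.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical expander for the proposed "class-sections" layout. */
final class ClassSections {
    /**
     * Turns
     *   class java/lang/Integer
     *   parseInt(Ljava/lang/String;I)I@Base 1.1
     * into
     *   java/lang/Integer.parseInt(Ljava/lang/String;I)I@Base 1.1
     */
    static List<String> flatten(List<String> lines) {
        List<String> flat = new ArrayList<>();
        String currentClass = null;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // blank lines and comments carry no symbols
            }
            if (line.startsWith("class ")) {
                currentClass = line.substring("class ".length()).trim();
                continue;
            }
            if (currentClass == null) {
                throw new IllegalArgumentException("symbol outside a class section: " + line);
            }
            flat.add(currentClass + "." + line);
        }
        return flat;
    }
}
```

Keeping the flat form as the canonical one means the sectioned layout
stays a pure size optimisation that tools can undo before comparing.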
>
> The Java examples for symbols files in this mail are mere suggestions,
> so if anyone has a better format to encode it in, please argue for it
> and its advantages.
>
> ~Niels
>
>
> [1] An example:
>
> int parseInt(String str, int radix);
>
> is mangled to (in byte-code format):
>
> parseInt(Ljava/lang/String;I)I
>
> Where "java/lang/String" is the binary name of the String class (L and ;
> are start and end-markers) and "I" is the type code for "int". The
> "I" after the closing bracket denotes the return type; the types inside
> the brackets are the arguments (in order).
>
> Reference:
> http://java.sun.com/docs/books/jvms/second_edition/html/ClassFile.doc.html#1169
>
> http://java.sun.com/docs/books/jvms/second_edition/html/ClassFile.doc.html#84645
>
> [2] The (fully qualified) source format of (e.g.) "String" is
> java.lang.String; the binary format is java/lang/String - which is just
> a ".class" short of being the filename.
>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
iQIcBAEBCAAGBQJN44FMAAoJEAVLu599gGRC5okP/3AaKV/NHTjj3qlMwi6KlW68
d4gT/KleGasQXD4WPtP3q3J0grwIWO+RmZOIrC1hZNqIUi0/YuH4XeWYTPYDYkpe
jWKTl6ygyZzzhnrNSt1kJxPzJIcG5XkD3KWrn7ttXFml/81pBglvZ0C++BGhG0d2
wWAbbkyOg7qnSKEcRKUDkDQZABqxg8ZMrtWrlh8g4tWMkk6UDHC3qoKZK8ZcagYc
D4snhZN+Tw6/lHrZXbH2rjwf7oo59fJwrxxJ4gN7cMDBIV8Yrep4ex6Cu0GPsNW+
n8sARyoHB2ep+wmEQiemnHol7PWqIQz0CUBDj12sPm3Exvp/gnWoFJ5WfPf16JVK
WDzAlpV3UgzWdMVbrW9FXGlU6332uPE5FpPxO0rwb8DKTK+JLoA824cw3D6bKOKY
a8plXGmZymzMPxpfTJt3BSSmMLmkv3ukiE07MjaF1dg+pY3dY3bHmVk9YYnSTG0c
EKAAC3JtLJR/DXdzx3UbhBjnLX6ZuBzXfkaRlsDgPFPRXZuS5cA6rtQyJIuE6upQ
iRYteFbGr2R75X9DJgCxH4tO2Kav6KyItuMPqjRsC9wpssVbJKBTKH+QDhNRQ0KH
1BG0XD4cR/wRN/9ZmEX49gE/he3Ac7fDZfmTeD7NlbIjQ/hTVKIZ/PpoiC6yZQ8e
Z5pWYwL9h/Q7p9DuTYAy
=eQfm
-----END PGP SIGNATURE-----