
Re: Symbols/shlibs files for Java



-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Hmm, seems like GPG (or maybe just enigmail) is confused by something in
my last email and marks my signature as invalid.  So... for the record I
did write the email quoted in full below.
  Sorry for the duplicate email.

On 2011-05-30 13:31, Niels Thykier wrote:
> On 2011-05-28 14:30, Raphael Hertzog wrote:
>> Hi Niels,
> 
> 
> Hey
> 
> (Added d-java to CC)
> 
>> On Wed, 25 May 2011, Niels Thykier wrote:
>>> First of all I was hoping that you might have some "Do" or "Don't"
>>> pointers from when dpkg added support for these things.
> 
>> Do not underestimate the task. Apart from that, I'm sorry I'm not sure
>> what kind of advice I can give you. :-)
> 
> 
> :)
> 
>>> Secondly, there might be some code or infrastructure that could be
>>> shared.
> 
>> I would love to generalize the principle of auto-generated dependencies
>> to cover more than just C libraries but we're far from that, i.e. there's
>> no infrastructure in place for this and all the code in dpkg-gensymbols
>> and dpkg-shlibdeps is highly specific to the case of C libraries/binaries.
> 
> 
> Could we begin refactoring this towards something similar to the
> $NS::Source::Package setup (e.g. $NS::SymbolsFile::$LANG)?  Or would you
> rather see a different approach to this, code-wise?
> 
>>>   Particularly I am interested in how you handle mapping
>>> filenames/SONAMES to a package (especially in cases like libc6, where
>>> there is more than one library in the package).
> 
>> There's nothing magical here. Once we have a SONAME, we find the library
>> on the system (using the same path that ld.so would use). Once we have
>> the complete filename, dpkg -S /the/file returns the package name. And
>> with the package name we're checking the content of
>> /var/lib/dpkg/info/<pkg>.shlibs (but you have to use dpkg-query
>> --control-path <pkg> shlibs to find that path).
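The lookup Raphael describes could be sketched in Python roughly as follows. This is a minimal sketch, assuming a Debian system with `dpkg` and `dpkg-query` on the PATH. The helper names (`parse_dpkg_search`, `owner_of`, `shlibs_control_path`) are hypothetical, the ld.so-style library path search is left out, and diversion lines in the `dpkg -S` output are simply skipped.

```python
import subprocess


def parse_dpkg_search(output: str, path: str) -> list[str]:
    """Extract the owning package(s) of `path` from `dpkg -S` output.

    Lines look like "pkg1, pkg2: /the/file"; with multiarch a package name
    may carry an ":arch" suffix, so split on the first ": " only.
    """
    owners: list[str] = []
    for line in output.splitlines():
        if line.startswith(("diversion by", "local diversion")):
            continue  # diversion information, not ownership
        pkgs, sep, file_name = line.partition(": ")
        if sep and file_name == path:
            owners.extend(pkg.strip() for pkg in pkgs.split(","))
    return owners


def owner_of(path: str) -> list[str]:
    """Run `dpkg -S` on an absolute path and return the owning packages."""
    result = subprocess.run(["dpkg", "-S", path],
                            capture_output=True, text=True, check=True)
    return parse_dpkg_search(result.stdout, path)


def shlibs_control_path(pkg: str) -> str:
    """Ask dpkg-query where the package's shlibs control file lives."""
    result = subprocess.run(["dpkg-query", "--control-path", pkg, "shlibs"],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

With multiarch, `dpkg -S` prints names like `libc6:amd64`, which is why the sketch splits on the first ": " rather than on the first colon.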
> 
> 
> Aw, I was hoping for dragons and magic. :P  But yeah, I should have seen
> the dpkg -S; I have been using it before.
> 
>>>   We also have cases where two packages provide the same library and it
>>> would be optimal for us to end up with libX-java | libY-java in the
>>> depends, but I have a feeling that is not entirely trivial to support
>>> (in a sane way).
> 
>> Well, both packages need to provide this dependency. There's no way the
>> system can know that there are other libraries that could fulfill the
>> same role and that it needs to put an alternative in the dependency.
> 
> 
> I had a feeling you might say that.
> 
>>> I intend to have all the tools to support this in the javahelper
>>> package.  I am not too sure that we can recycle the existing formats
>>> (maybe the shlibs format with s/SONAME/filename/) as we have to check
>>> for things like classes, return-types, inheritance and method
>>> overloading as well.  But feel free to correct me if symbols files
>>> already have support for this.
> 
>> Sorry, I have too little Java knowledge to answer this.
> 
>> Cheers,
> 
> So I have been looking at this a bit more; the shlibs format actually
> looks fully recyclable, assuming we can somehow tell a "C"-shlibs file
> from a "Java"-shlibs file and map "SONAME" to filename accordingly.
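To make the "recyclable" claim concrete, here is a sketch of a shlibs line parser in Python. The C branch follows the layout documented in deb-shlibs(5), `[type:] library-name soname-version dependencies` (e.g. `libc 6 libc6 (>= 2.13)`), and only rebuilds the common `name.so.version` SONAME form. The `java=True` branch is a purely hypothetical layout in which the first field is a jar file name instead of a SONAME; the function name is also hypothetical.

```python
def parse_shlibs_line(line: str, java: bool = False):
    """Split one shlibs entry into (type, key, dependency), or None.

    C entries: "[type:] library-name soname-version dependencies",
    e.g. "libc 6 libc6 (>= 2.13)", whose key is rebuilt as "libc.so.6".
    Java entries (hypothetical): "[type:] jar-file dependencies".
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None  # blank lines and comments carry no entry
    pkg_type = None
    first, _, rest = line.partition(" ")
    if first.endswith(":"):
        pkg_type, line = first[:-1], rest.strip()
    if java:
        jar_name, deps = line.split(None, 1)
        return pkg_type, jar_name, deps
    name, version, deps = line.split(None, 2)
    return pkg_type, f"{name}.so.{version}", deps
```

An alternative such as `libx-java | liby-java` simply passes through in the dependency field, so (as Raphael notes) each package has to declare such an alternative itself.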
> 
> ... and if I discard my desire to record all access qualifiers and such,
> I think the symbols file is mostly re-usable if we encode things right.
> 
> But first, a quick Java lecture so we are all more or less on the same
> page.  I will (where possible) map Java terms to C++.  As I understood
> Jonathan, C++ maps/mangles all constructors and methods into flat
> symbol names and builds an ordinary (C-style) shared library out of that.
>   A Java library consists of zero or more class files stored in a jar
> (zip) file.  The metadata (such as dependencies) is stored in the
> manifest file (a plain text file).  We can extract almost everything we
> need from said class files (method signatures etc.) and the manifest file.
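A rough Python sketch of pulling that information out of a jar with the standard `zipfile` module follows. It assumes a well-formed `META-INF/MANIFEST.MF` and reads only its main section, where continuation lines start with a single space. The function names are hypothetical.

```python
import zipfile


def read_manifest(jar: zipfile.ZipFile) -> dict[str, str]:
    """Parse the main section of META-INF/MANIFEST.MF into a dict."""
    try:
        raw = jar.read("META-INF/MANIFEST.MF").decode("utf-8")
    except KeyError:  # jar without a manifest
        return {}
    attrs: dict[str, str] = {}
    last = None
    for line in raw.splitlines():
        if not line:
            break  # per-entry sections follow the first blank line; skipped
        if line.startswith(" ") and last is not None:
            attrs[last] += line[1:]  # continuation of the previous value
        else:
            key, _, value = line.partition(":")
            last = key
            attrs[last] = value.removeprefix(" ")
    return attrs


def class_binary_names(jar: zipfile.ZipFile) -> list[str]:
    """Binary names of all classes in the jar, e.g. 'java/lang/Integer'."""
    return sorted(name[: -len(".class")] for name in jar.namelist()
                  if name.endswith(".class"))
```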
> 
> Java does a well-defined mangling of method names in the class files[1].
>  The mangled method could trivially be prefixed by its class name
> (either in binary or source format[2]).
> 
> So the parseInt example from [1] could be stored in the symbols file as:
> 
>   java.lang.Integer.parseInt(Ljava/lang/String;I)I
> 
> - or -
> 
>   java/lang/Integer.parseInt(Ljava/lang/String;I)I
> 
> This would tell us that the class java.lang.Integer has a method with
> the signature "int parseInt(java.lang.String,int)".  Personally I would
> prefer the second option of those two since it is easier to map to a
> file name in the jar file (plus it consistently uses the binary name
> instead of mixing source and binary name).
>   ((For the rest of the email I will be using the format resembling the
> latter of the two in the example above.))
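Building such symbol names from a source-form class name could be sketched in Python as below; the helper names are hypothetical. It assumes top-level classes only: for a nested class the binary name uses `$` (e.g. `java/util/Map$Entry`), which cannot be recovered from the dotted source name alone.

```python
def binary_name(source_name: str) -> str:
    """'java.lang.Integer' -> 'java/lang/Integer' (top-level classes only)."""
    return source_name.replace(".", "/")


def class_file(binary: str) -> str:
    """Path of the class inside the jar: the binary name plus '.class'."""
    return binary + ".class"


def method_symbol(class_name: str, method: str, descriptor: str) -> str:
    """Build e.g. 'java/lang/Integer.parseInt(Ljava/lang/String;I)I'."""
    return f"{binary_name(class_name)}.{method}{descriptor}"
```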
> 
> In that case it is trivial to recycle the current symbols format (modulo
> using possibly forbidden characters in the symbol names).  Such as:
> 
>   java/lang/Integer.parseInt(Ljava/lang/String;I)I@Base 1.1
> 
> This obviously assumes we can tell a C-symbols file from a Java-symbols
> file and map the "SONAME" part accordingly.  Since symbols cannot be
> versioned like in C, I believe that the @Base part would be redundant
> for Java.
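Reading such a line back could look like the following Python sketch. It assumes the hypothetical `<class>.<method><descriptor>[@Base] <min-version>` layout from above, handles methods only, and drops `@Base` since it carries no information for Java. The type and function names are hypothetical.

```python
import re
from typing import NamedTuple


class MethodSymbol(NamedTuple):
    class_name: str   # binary name, e.g. "java/lang/Integer"
    method: str       # e.g. "parseInt" (constructors would be "<init>")
    descriptor: str   # e.g. "(Ljava/lang/String;I)I"
    min_version: str  # e.g. "1.1"


# <symbol>[@Base] <min-version>, with optional leading whitespace
_SYMBOL_LINE = re.compile(r"^\s*(?P<sym>\S+?)(?:@Base)?\s+(?P<ver>\S+)\s*$")


def parse_method_symbol(line: str) -> MethodSymbol:
    match = _SYMBOL_LINE.match(line)
    if match is None:
        raise ValueError(f"not a symbol line: {line!r}")
    # the descriptor starts at the first '('; binary names contain no '.'
    head, paren, args = match.group("sym").partition("(")
    if not paren:
        raise ValueError(f"no method descriptor in {line!r}")
    class_name, _, method = head.rpartition(".")
    return MethodSymbol(class_name, method, paren + args, match.group("ver"))
```

Constructors would appear as `<init>` in the method position (e.g. `java/lang/Integer.<init>(I)V`), which this parser treats like any other name.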
> 
> 
> The only thing missing is how to handle a regular field / constant; here
> I see two "easy" options: either use <encoded-type><field-name> or
> <field-name><delimiter><encoded-type>.  Assuming ":" is the delimiter
> for the second option, they would look like:
> 
>   Imy/fictional/Code.length@Base 1.1
>   Ljava/io/PrintStream;java/lang/System.out@Base 1.1
> 
> - or -
> 
>   my/fictional/Code.length:I@Base 1.1
>   java/lang/System.out:Ljava/io/PrintStream;@Base 1.1
> 
> These encode the two fictional fields (or constants) "int length;" and
> "java.io.PrintStream out;" in the classes my.fictional.Code and
> java.lang.System, respectively.  But I welcome alternatives.
> Note in particular that for enums, the type of each enum constant is the
> same as the class it is declared in.
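For the second option, the descriptor can be derived mechanically from the declared Java type. A minimal Python sketch with hypothetical helper names; generic type arguments and nested classes are not handled.

```python
# JVM base-type descriptors (JVM specification, "Field Descriptors")
_BASE = {"byte": "B", "char": "C", "double": "D", "float": "F",
         "int": "I", "long": "J", "short": "S", "boolean": "Z"}


def field_descriptor(java_type: str) -> str:
    """'int' -> 'I', 'java.io.PrintStream' -> 'Ljava/io/PrintStream;'."""
    dims = 0
    while java_type.endswith("[]"):  # each array dimension becomes a '['
        java_type = java_type[:-2]
        dims += 1
    if java_type in _BASE:
        desc = _BASE[java_type]
    else:  # reference type: L<binary name>;
        desc = "L" + java_type.replace(".", "/") + ";"
    return "[" * dims + desc


def field_symbol(class_name: str, field: str, java_type: str) -> str:
    """Second option from the mail: <class>.<field>:<descriptor>."""
    return f"{class_name.replace('.', '/')}.{field}:{field_descriptor(java_type)}"
```

For an enum constant the class and the type coincide: `field_symbol("java.math.RoundingMode", "UP", "java.math.RoundingMode")` gives `java/math/RoundingMode.UP:Ljava/math/RoundingMode;`.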
> 
> From there we can extend it to having "class-sections", e.g. something
> like:
> 
>   class java/lang/Integer
>     parseInt(Ljava/lang/String;I)I@Base 1.1
>     # other symbols in java/lang/Integer
>   class java/lang/String
>     # symbols in java/lang/String
> 
> This would reduce the size of the symbols file.  From there on we could
> always extend the format to include more information and check for
> compatibility breakage beyond symbols.
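Converting between the flat layout and such class sections is mechanical. A Python sketch, assuming the hypothetical layouts from this mail (methods as `<class>.<name>(...)`, fields as `<class>.<name>:<type>`) and a two-space indent for members; the function names are hypothetical.

```python
from collections import defaultdict


def to_class_sections(symbols: list[str]) -> str:
    """Group flat '<class>.<member>... <version>' lines under 'class' headers."""
    groups: dict[str, list[str]] = defaultdict(list)
    for sym in symbols:
        if "(" not in sym and ":" not in sym:
            raise ValueError(f"neither method nor field: {sym!r}")
        # the class part ends at the last '.' before the first '(' or ':'
        head = sym.split("(", 1)[0].split(":", 1)[0]
        cls, dot, _ = head.rpartition(".")
        if not dot:
            raise ValueError(f"no class name in {sym!r}")
        groups[cls].append(sym[len(cls) + 1:])
    lines: list[str] = []
    for cls in sorted(groups):
        lines.append(f"class {cls}")
        lines.extend(f"  {member}" for member in groups[cls])
    return "\n".join(lines) + "\n"


def from_class_sections(text: str) -> list[str]:
    """Inverse of to_class_sections: flatten sections back into full symbols."""
    flat: list[str] = []
    cls = None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments
        if line.startswith("class "):
            cls = line[len("class "):].strip()
        elif cls is not None:
            flat.append(f"{cls}.{stripped}")
    return flat
```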
> 
> The Java examples for symbols files in this mail are mere suggestions,
> so if anyone has a better format to encode it in, please argue for it
> and its advantages.
> 
> ~Niels
> 
> 
> [1] An example:
> 
>   int parseInt(String str, int radix);
> 
> is mangled to (in byte-code format):
> 
>   parseInt(Ljava/lang/String;I)I
> 
> Where "java/lang/String" is the binary name of the String class (L and ;
> are the start and end markers) and "I" is the descriptor of the primitive
> type int.  The "I" after the closing parenthesis denotes the return type;
> the types inside the parentheses are the arguments (in order).
> 
> Reference:
> http://java.sun.com/docs/books/jvms/second_edition/html/ClassFile.doc.html#1169
> 
> http://java.sun.com/docs/books/jvms/second_edition/html/ClassFile.doc.html#84645
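A small Python sketch of decoding such a method descriptor back into Java types, following the descriptor grammar in the references above. The function names are hypothetical, and the sketch does not reject every malformed descriptor (for instance, it accepts `V` as an argument type).

```python
_BASE_TYPES = {"B": "byte", "C": "char", "D": "double", "F": "float",
               "I": "int", "J": "long", "S": "short", "Z": "boolean",
               "V": "void"}


def _parse_type(desc: str, i: int) -> tuple[str, int]:
    """Parse one type starting at desc[i]; return (Java type, next index)."""
    dims = 0
    while desc[i] == "[":  # array dimensions
        dims += 1
        i += 1
    if desc[i] == "L":  # reference type: L<binary name>;
        end = desc.index(";", i)
        name = desc[i + 1:end].replace("/", ".")
        i = end + 1
    else:
        name = _BASE_TYPES[desc[i]]
        i += 1
    return name + "[]" * dims, i


def parse_method_descriptor(desc: str) -> tuple[list[str], str]:
    """'(Ljava/lang/String;I)I' -> (['java.lang.String', 'int'], 'int')."""
    if not desc.startswith("("):
        raise ValueError(f"not a method descriptor: {desc!r}")
    args: list[str] = []
    i = 1
    while desc[i] != ")":
        arg, i = _parse_type(desc, i)
        args.append(arg)
    ret, i = _parse_type(desc, i + 1)  # return type follows the ')'
    if i != len(desc):
        raise ValueError(f"trailing characters in {desc!r}")
    return args, ret
```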
> 
> [2] The (fully qualifying) source format of (e.g.) "String" is
> java.lang.String; the binary format is java/lang/String - which is just
> a ".class" short of being the filename.
> 
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQIcBAEBCAAGBQJN44FMAAoJEAVLu599gGRC5okP/3AaKV/NHTjj3qlMwi6KlW68
d4gT/KleGasQXD4WPtP3q3J0grwIWO+RmZOIrC1hZNqIUi0/YuH4XeWYTPYDYkpe
jWKTl6ygyZzzhnrNSt1kJxPzJIcG5XkD3KWrn7ttXFml/81pBglvZ0C++BGhG0d2
wWAbbkyOg7qnSKEcRKUDkDQZABqxg8ZMrtWrlh8g4tWMkk6UDHC3qoKZK8ZcagYc
D4snhZN+Tw6/lHrZXbH2rjwf7oo59fJwrxxJ4gN7cMDBIV8Yrep4ex6Cu0GPsNW+
n8sARyoHB2ep+wmEQiemnHol7PWqIQz0CUBDj12sPm3Exvp/gnWoFJ5WfPf16JVK
WDzAlpV3UgzWdMVbrW9FXGlU6332uPE5FpPxO0rwb8DKTK+JLoA824cw3D6bKOKY
a8plXGmZymzMPxpfTJt3BSSmMLmkv3ukiE07MjaF1dg+pY3dY3bHmVk9YYnSTG0c
EKAAC3JtLJR/DXdzx3UbhBjnLX6ZuBzXfkaRlsDgPFPRXZuS5cA6rtQyJIuE6upQ
iRYteFbGr2R75X9DJgCxH4tO2Kav6KyItuMPqjRsC9wpssVbJKBTKH+QDhNRQ0KH
1BG0XD4cR/wRN/9ZmEX49gE/he3Ac7fDZfmTeD7NlbIjQ/hTVKIZ/PpoiC6yZQ8e
Z5pWYwL9h/Q7p9DuTYAy
=eQfm
-----END PGP SIGNATURE-----

