Re: Symbols/shlibs files for Java
-----BEGIN PGP SIGNED MESSAGE-----
Hmm, seems like GPG (or maybe just enigmail) is confused by something in
my last email and marks my signature as invalid. So... for the record I
did write the email quoted in full below.
Sorry for the duplicate email.
On 2011-05-30 13:31, Niels Thykier wrote:
> On 2011-05-28 14:30, Raphael Hertzog wrote:
>> Hi Niels,
> (Added d-java to CC)
>> On Wed, 25 May 2011, Niels Thykier wrote:
>>> First of all I was hoping that you might have some "Do" or "Don't"
>>> pointers from when dpkg added support for these things.
>> Do not underestimate the task. Apart from that, I'm sorry I'm not sure
>> what kind of advice I can give you. :-)
>>> Secondly, there might be some code or infrastructure that could be
>> I would love to generalize the principle of auto-generated dependencies
>> to cover more than just C libraries but we're far from that, i.e. there's
>> no infrastructure in place for this and all the code in dpkg-gensymbols
>> and dpkg-shlibdeps is highly specific to the case of C libraries/binaries.
> Could we begin refactoring this towards something similar to the
> $NS::Source::Package setup (e.g. $NS::SymbolsFile::$LANG)? Or would you
> rather see a different approach to this code-wise?
>>> Particularly I am interested in how you handle mapping
>>> filenames/SONAMES to a package (especially in cases like libc6, where
>>> there is more than one lib in the package).
>> There's nothing magical here. Once we have a SONAME, we find the library
>> on the system (using the same path that ld.so would use). Once we have
>> the complete filename, dpkg -S /the/file returns the package name. And
>> with the package name we're checking the content of
>> /var/lib/dpkg/info/<pkg>.shlibs (but you have to use dpkg-query
>> --control-path <pkg> shlibs to find that path).
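For the record, the lookup chain described above (library file -> dpkg -S -> package -> shlibs) can be sketched roughly as follows. This is only a minimal sketch: the function names are made up for illustration, the architecture qualifier is simply dropped, and only the first owning package is consulted. The two parsers are pure; shlibs_for_file needs a Debian system to run.

```python
import subprocess

def parse_dpkg_search(line):
    """Split one line of `dpkg -S` output into (packages, path)."""
    owners, sep, path = line.partition(": ")
    if not sep:
        raise ValueError("unexpected dpkg -S output: %r" % line)
    # Several packages may own one path ("pkg1, pkg2: /path"); the
    # ":arch" qualifier is dropped here as a simplification.
    packages = [p.strip().split(":")[0] for p in owners.split(",")]
    return packages, path.strip()

def parse_shlibs(text):
    """Map (library, version) -> dependency string from shlibs content."""
    deps = {}
    for line in text.splitlines():
        fields = line.strip().split(None, 2)
        if len(fields) < 2 or fields[0].startswith("#"):
            continue
        # Lines with a "type:" prefix (e.g. "udeb:") are skipped here.
        if fields[0].endswith(":"):
            continue
        deps[(fields[0], fields[1])] = fields[2] if len(fields) > 2 else ""
    return deps

def shlibs_for_file(path):
    """Run both dpkg lookups for an installed library file."""
    out = subprocess.check_output(["dpkg", "-S", path], text=True)
    packages, _ = parse_dpkg_search(out.splitlines()[0])
    ctrl = subprocess.check_output(
        ["dpkg-query", "--control-path", packages[0], "shlibs"], text=True
    ).strip()
    if not ctrl:  # the package ships no shlibs file
        return {}
    with open(ctrl) as handle:
        return parse_shlibs(handle.read())
```

A Java variant would presumably swap the ld.so-style search for a lookup of the jar file name, while the dpkg -S step stays the same.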
> Aw, I was hoping for dragons and magic. :P But yeah, I should have seen
> the dpkg -S; I have been using it before.
>>> We also have cases where two packages provide the same library and it
>>> would be optimal for us to end up with libX-java | libY-java in the
>>> depends, but I have a feeling that is not entirely trivial to support
>>> (in a sane way).
>> Well, both packages need to provide this dependency. There's no way the
>> system can know that there are some other libraries that could fulfill the
>> same role and that it needs to put an alternative in the dependency.
> I had a feeling you might say that.
>>> I intend to have all the tools to support this in the javahelper
>>> package. I am not too sure that we can recycle the existing formats
>>> (maybe the shlibs format with s/SONAME/filename/) as we have to check
>>> for things like classes, return-types, inheritance and method
>>> overloading as well. But feel free to correct me if symbols files
>>> already have support for this.
>> Sorry, I have too little Java knowledge to answer this.
> So I have been looking at this a bit more; the shlibs format actually
> looks fully recyclable, assuming we can somehow tell a "C"-shlibs file
> from a "Java"-shlibs file and map "SONAME" to filename accordingly.
> ... and if I discard my desire to record all access qualifiers and such,
> I think the symbols file is mostly re-usable if we encode things right.
> But first, a quick Java lecture so we are all more or less on the same
> page. I will (where possible) map Java terms to C++. As I understood
> Jonathan, C++ maps/mangles all constructors and methods into a flat
> function name and builds a C library out of that.
> A Java library consists of 0 or more class files stored in a jar (zip)
> file. The metadata (such as dependencies) is stored in the manifest
> file (plain text file). We can extract almost everything we need from
> said class files (method-signatures etc) and the manifest file.
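To make the jar layout concrete, here is a small sketch that lists the classes in a jar and reads the main section of its manifest. The names read_jar and parse_manifest_main are hypothetical, and per-entry manifest sections are ignored; extracting the method signatures themselves would need a real class file parser, which this does not attempt.

```python
import zipfile

def parse_manifest_main(text):
    """Parse the main section of a manifest into a dict of attributes."""
    attrs = {}
    key = None
    for line in text.splitlines():
        if not line:
            break  # a blank line ends the main section
        if line.startswith(" ") and key is not None:
            # Continuation line: drop the leading space and append.
            attrs[key] += line[1:]
            continue
        key, _, value = line.partition(": ")
        attrs[key] = value
    return attrs

def read_jar(fileobj):
    """Return (binary class names, main manifest attributes) for a jar."""
    with zipfile.ZipFile(fileobj) as jar:
        classes = sorted(
            name[:-len(".class")]
            for name in jar.namelist()
            if name.endswith(".class")
        )
        try:
            raw = jar.read("META-INF/MANIFEST.MF").decode("utf-8")
        except KeyError:  # no manifest in this archive
            raw = ""
    return classes, parse_manifest_main(raw)
```

read_jar accepts a path or any file-like object, since zipfile.ZipFile does.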
> Java does a well-defined mangling of method names in the class files
> (see [1]). The mangled method could trivially be prefixed by its class
> name (either in binary or source format; see [2]).
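> So the parseInt example from [1] (at the end of this mail) could be
> stored in the symbols file as:
> java.lang.Integer.parseInt(Ljava/lang/String;I)I
> - or -
> java/lang/Integer.parseInt(Ljava/lang/String;I)I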
> Which would tell us that the class java.lang.Integer has a method with
> the signature "int parseInt(java.lang.String,int)". Personally I would
> prefer the second option of those two since it is easier to map to a
> file name in the jar file (plus it consistently uses the binary name
> instead of mixing source and binary name).
> ((For the rest of the email I will be using the format resembling the
> latter of the two in the example above.))
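As an illustration of how such an entry maps back to the readable signature, here is a minimal sketch of a descriptor decoder. The function names are hypothetical; the type codes are the standard JVM ones.

```python
PRIMITIVES = {
    "B": "byte", "C": "char", "D": "double", "F": "float",
    "I": "int", "J": "long", "S": "short", "Z": "boolean", "V": "void",
}

def _read_type(desc, pos):
    """Decode one type starting at desc[pos]; return (source name, next pos)."""
    dims = 0
    while desc[pos] == "[":  # each "[" adds one array dimension
        dims += 1
        pos += 1
    code = desc[pos]
    if code == "L":  # object type: L<binary name>;
        end = desc.index(";", pos)
        name = desc[pos + 1:end].replace("/", ".")
        pos = end + 1
    else:
        name = PRIMITIVES[code]
        pos += 1
    return name + "[]" * dims, pos

def describe_method(name, desc):
    """Render a method name plus descriptor as a source-like signature."""
    if not desc.startswith("("):
        raise ValueError("not a method descriptor: %r" % desc)
    pos = 1
    args = []
    while desc[pos] != ")":
        arg, pos = _read_type(desc, pos)
        args.append(arg)
    ret, _ = _read_type(desc, pos + 1)
    return "%s %s(%s)" % (ret, name, ",".join(args))

print(describe_method("parseInt", "(Ljava/lang/String;I)I"))
# prints: int parseInt(java.lang.String,int)
```

Nested classes keep their "$" in the output, since the binary name alone does not say where the nesting boundary is meant to be shown.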
> In that case it is trivial to recycle the current symbols format (modulo
> using possibly forbidden characters in the symbol names). Such as:
> java/lang/Integer.parseInt(Ljava/lang/String;I)I@Base 1.1
> This obviously assumes we can tell a C-symbols file from a Java-symbols
> file and map the "SONAME" part accordingly. Since symbols cannot be
> versioned like in C, I believe that the @Base part would be redundant
> for Java.
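A rough sketch of how a tool might split such a line back into its parts, assuming the layout shown above (symbol, "@", tag, a space, then the version). JavaSymbol and parse_symbol_line are names invented for this example.

```python
from collections import namedtuple

JavaSymbol = namedtuple(
    "JavaSymbol", "class_name member descriptor tag version"
)

def parse_symbol_line(line):
    """Split '<class>.<method><descriptor>@<tag> <version>' into fields."""
    symbol, _, version = line.strip().rpartition(" ")
    name, _, tag = symbol.rpartition("@")
    paren = name.index("(")  # start of the method descriptor
    owner, _, member = name[:paren].rpartition(".")
    return JavaSymbol(owner, member, name[paren:], tag, version)

entry = parse_symbol_line(
    "java/lang/Integer.parseInt(Ljava/lang/String;I)I@Base 1.1"
)
print(entry.class_name, entry.member, entry.version)
# prints: java/lang/Integer parseInt 1.1
```

Because the class part uses "/" as separator, the last "." before the opening parenthesis always separates class from method, which is one more argument for the binary format.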
> The only thing missing is how to handle a regular field / constant; here
> I see two "easy" options. Either use <encoded-type><field-name> or
> <field-name><delimiter><encoded-type>. Assuming ":" is the delimiter
> for the second option, it would look like:
> Imy/fictional/Code.length@Base 1.1
> Ljava/io/PrintStream;java/lang/System.out@Base 1.1
> - or -
> my/fictional/Code.length:I@Base 1.1
> java/lang/System.out:Ljava/io/PrintStream;@Base 1.1
> Encoding the two fictional fields (or constants) "int length;" and
> "java.io.PrintStream out;" in the classes my.fictional.Code and
> java.lang.System (respectively). But I welcome alternatives.
> Particularly, for enums the type of the enum constant would be the
> same as the class it is in.
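With the second option (name, delimiter, encoded type), methods and fields can be told apart by a trivial check, since Java source identifiers can contain neither "(" nor ":". A minimal sketch, with parse_member being a made-up name:

```python
def parse_member(line):
    """Classify a symbols line as a method or a field entry.

    Returns (kind, class_name, member, descriptor, version).
    """
    symbol, _, version = line.strip().rpartition(" ")
    name, _, _tag = symbol.rpartition("@")
    if "(" in name:
        # Methods carry their descriptor directly after the name.
        paren = name.index("(")
        head, descriptor, kind = name[:paren], name[paren:], "method"
    else:
        # Fields use "<name>:<encoded-type>" (the second option).
        head, _, descriptor = name.partition(":")
        kind = "field"
    owner, _, member = head.rpartition(".")
    return kind, owner, member, descriptor, version
```

The first option (encoded type in front) would need a descriptor parser just to find where the class name starts, which is another point in favour of the delimiter variant.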
> From there we can extend it to having "class-sections", e.g. something
> like:
> class java/lang/Integer
> parseInt(Ljava/lang/String;I)I@Base 1.1
> # other symbols in java/lang/Integer
> class java/lang/String
> # symbols in java/lang/String
> This would reduce the size of the symbols file. From there on we could
> always extend the format to include more information and check for
> compatibility breakage beyond symbols.
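The class-section layout is only a shorthand for the flat form, so a reader could expand it with a few lines. A sketch under the assumption that sections start with "class <binary-name>" and that "#" lines are comments, as in the example above; expand_class_sections is a hypothetical name.

```python
def expand_class_sections(text):
    """Expand 'class X' sections into flat 'X.<symbol>' lines."""
    symbols = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("class "):
            current = line[len("class "):].strip()
            continue
        if current is None:
            raise ValueError("symbol outside a class section: %r" % line)
        symbols.append("%s.%s" % (current, line))
    return symbols

example = """\
class java/lang/Integer
parseInt(Ljava/lang/String;I)I@Base 1.1
# other symbols in java/lang/Integer
class java/lang/String
# symbols in java/lang/String
"""
print(expand_class_sections(example))
# prints: ['java/lang/Integer.parseInt(Ljava/lang/String;I)I@Base 1.1']
```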
> The Java examples for symbols files in this mail are mere suggestions,
> so if anyone has a better format to encode it in, please argue for it
> and its advantages.
> [1] An example:
> int parseInt(String str, int radix);
> is mangled to (in byte-code format):
> parseInt(Ljava/lang/String;I)I
> Where "java/lang/String" is the binary name of the String class (L and ;
> are start and end markers) and "I" is the binary name of "int". The
> "I" after the closing parenthesis denotes the return type; the types
> inside the parentheses are the arguments (in order).
> [2] The (fully qualified) source format of (e.g.) "String" is
> java.lang.String; the binary format is java/lang/String - which is just
> a ".class" short of being the filename.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
-----END PGP SIGNATURE-----