
Bug#256283: [pylucene-dev] Re: pylucene-dev Digest, Vol 4, Issue 2




The value of a naked gcj-compiled lucene package is limited [...]
because of the GC issues

Are you sure? Do you think a naked gcj-compiled lucene package would
be valuable for people creating swig bindings for languages such as
OCaml, Perl, PHP[1], etc.? Do you think C application programmers who
want to use lucene are better off using CLucene[2] instead of a gcj
library? Why did pylucene choose the latter?

Other language bindings would have to solve the GC issue too. What I did relies on python's ref counting and would be applicable to any other language with similar memory management.
There has been talk of changing PyLucene into something more generic like
SWIGLucene, extending the idea to more languages supported by SWIG.
There has also been talk of bringing all lucene ports under a future Apache Lucene project umbrella (Java Lucene is currently an Apache Jakarta project).
Both are good things and I expect them to happen in the long term.
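
To illustrate the ref counting idea mentioned above, here is a minimal sketch. It is not PyLucene's actual code: JavaRefRegistry, JavaProxy and the integer handles are hypothetical names made up for the example, and the "Java object" is just a string stand-in. The assumption is that a table visible to the Java GC pins an object for as long as at least one Python wrapper for it is alive, and CPython's ref counting decides when a wrapper dies.

```python
class JavaRefRegistry:
    """Hypothetical stand-in for a table the Java GC can see."""

    def __init__(self):
        self._entries = {}  # handle -> [java_object, wrapper_count]

    def acquire(self, handle, java_object):
        # Pin the object (or bump its count if it is already pinned).
        entry = self._entries.setdefault(handle, [java_object, 0])
        entry[1] += 1

    def release(self, handle):
        # Unpin once the last Python wrapper is gone.
        entry = self._entries[handle]
        entry[1] -= 1
        if entry[1] == 0:
            del self._entries[handle]

    def is_pinned(self, handle):
        return handle in self._entries


class JavaProxy:
    """Python-side wrapper; its lifetime drives the pin count."""

    def __init__(self, registry, handle, java_object):
        self._registry = registry
        self._handle = handle
        registry.acquire(handle, java_object)

    def __del__(self):
        # CPython calls this as soon as the wrapper's ref count hits zero.
        self._registry.release(self._handle)


registry = JavaRefRegistry()
a = JavaProxy(registry, 1, "IndexSearcher stand-in")
b = JavaProxy(registry, 1, "IndexSearcher stand-in")
del a
pinned_with_one_wrapper = registry.is_pinned(1)
del b
pinned_with_no_wrapper = registry.is_pinned(1)
```

A language binding without deterministic ref counting would need some other hook (finalizers, explicit close calls) to play the role of __del__ here.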

Why did I choose to do PyLucene instead of using CLucene ? CLucene is a port of Java Lucene; it lags behind Java Lucene (as most ports do) and comes with its own set of bugs. PyLucene is not a port: it is built on the latest released Java Lucene library, but it is affected by GCJ's bugs. I consider that a worthwhile tradeoff since the GCJ project is very active and one of the most exciting developments in Java land at the moment.

gcj compiling a java package is pretty trivial

It's not trivial for everyone. I don't currently know how to use gcj
to create a shared library. Also, if a naked gcj-compiled lucene
library is useful, I can imagine other Debian packages in the future
will need it as a dependency.

In a way, it is even easier than compiling a bunch of C files since gcj can take a .jar file as input. But because I need to patch the sources, I compile the patched .java source files rather than the .jar file. The following command compiles all the lucene java files into a single lucene.o file:

gcj --encoding=UTF-8 -O2 -c -o lucene.o `find lucene-1.4.1/src/java -name '*.java' -print`
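
For anyone who would rather drive this from a build script, the same command line can be assembled in Python. This is only a sketch: gcj_compile_command is a name made up for the example, and the only gcj flags it uses are the ones shown in the command above. Building the argument list explicitly also avoids the word splitting that backquotes do on paths containing spaces.

```python
import os


def gcj_compile_command(src_root, output="lucene.o"):
    """Return the gcj argument list for compiling every .java file under src_root."""
    sources = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in filenames:
            if name.endswith(".java"):
                sources.append(os.path.join(dirpath, name))
    sources.sort()  # stable order so repeated builds use the same command
    return ["gcj", "--encoding=UTF-8", "-O2", "-c", "-o", output] + sources
```

The resulting list can be handed to subprocess.run(gcj_compile_command("lucene-1.4.1/src/java"), check=True) on a machine where gcj is installed.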

I can help you with non Debian-specific PyLucene or GCJ issues. What
do you need ?

I have almost no gcj experience, and have never worked with a gcj
shared library. I need your (Andi) help with the judgement call as to
whether this is worthwhile or not.

I think that it would be worthwhile to have a PyLucene Debian package, provided the stock Debian gcc compiler is used to build it and is at least at version 3.4.1.
Currently, this is problematic on all platforms PyLucene is supported on:
  - on Mac OS X, I have to build a custom gcc/gcj 3.4.1 since Apple's gcc doesn't
    even come with gcj at the moment
  - on Red Hat 8, 9 or Fedora Core 2, I also build gcc 3.4.1
  - on Windows, I use mingw 3.1 augmented with gcc/gcj 3.4.1

As to having a naked gcj-compiled lucene package: given the unresolved gcj issues, I wouldn't even trust one unless I had built it myself with a compiler I had also built. There are just too many weird issues, some of them platform specific, such as integrating java threads and python threads. On Windows, this is trivial: python doesn't use real threads. On Unixes, python uses posix threads, and I had to figure out how to coax python threads into gcj's boehm-gc package using some non-public functions and structures that may change at any time (see attachCurrentThread in PyLucene.i).
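
The shape of the thread problem can be sketched in Python, with the caveat that this is not the code in PyLucene.i. ThreadAttacher is a hypothetical helper, and the attach callable passed to it stands in for whatever native call registers the current thread with the Java runtime's garbage collector. The sketch only shows the bookkeeping: every python thread gets attached exactly once before it touches java objects.

```python
import threading


class ThreadAttacher:
    """Calls a (hypothetical) native attach function once per python thread."""

    def __init__(self, attach):
        self._attach = attach             # stand-in for the native attach call
        self._state = threading.local()   # per-thread "already attached" flag

    def ensure_attached(self):
        if not getattr(self._state, "attached", False):
            self._attach()
            self._state.attached = True


# Record which threads were attached instead of calling into native code.
attached_threads = []
attacher = ThreadAttacher(
    lambda: attached_threads.append(threading.current_thread().name)
)


def worker():
    attacher.ensure_attached()
    attacher.ensure_attached()  # second call in the same thread does nothing


threads = [threading.Thread(target=worker, name=f"searcher-{i}") for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The hard part, which this sketch leaves out entirely, is the native attach call itself, since that is where the non-public boehm-gc functions and structures come in.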

There are 5 parts to PyLucene:
  - a patched java lucene compiled by gcj
  - a SWIG part with python specific type translation code that could be
    reused as a model for other similar SWIG supported languages
  - python specific java object reference management code
  - python specific python/java thread integration code
  - an optional Berkeley DB - based lucene Directory implementation (that code
    is also part of the Java Lucene sandbox, the db package)
The patched java lucene compiled by gcj is the 'easy' part. Except for the Berkeley DB part, any developer who wants to extend the PyLucene idea to other languages is going to have to reimplement the other parts.

Upstream wasn't too excited about a naked gcj lucene library, but maybe Doug
just wasn't thinking about enabling C programmers and swig programmers. My
current feeling is to wait for gcj-3.5 to reach Debian then revisit.

Who/What is Upstream ? Doug Cutting ?
If all the gcj bugs currently worked around with patches are fixed in the gcc 3.5 release, waiting may help. Figuring out what needs to be patched every time Java Lucene releases a new library is some work.

I did notice that compiling the lucene demo executables using gcj works great, and I am tempted to ship those right away.[3]

Which version of Lucene, which version of gcj did you use ?
If you used Lucene 1.4.x and bug http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15411 is not fixed, then Search.java should crash quite rapidly. Also, you should have been getting a bunch of errors related to anonymous inner class constructors (the bulk of the patches in the patches.lucene file are actually for these).

So, to sum up:
  - a PyLucene Debian package ? definitely !
  - a naked gcj compiled Java Lucene ?
      - worthless for integrating with other non-java languages
      - not worthless for pure gcj/java use but, apart from the patches,
        pretty trivial to build

Andi..


