Re: debian package of hadoop
On Wed, Dec 30, 2009 at 07:53:43PM +0100, Thomas Koch wrote:
> today I tried to run the cloudera debian dist on a 4 machine cluster. I still
> have some itches, see my list below. Some of them may require a fix in the
> Therefore I thought that it may be time to start an official debian package of
> hadoop with a public GIT repository so that everybody can participate.
> Would cloudera support this? I'd package hadoop 0.20 and apply all the
> cloudera patches (managed with topgit).
> At this point I'd like to have your opinion whether it would be wise to have
> versioned binary packages like hadoop-18, hadoop-20 or just plain hadoop for
> the Debian package?
I have been thinking about an official Hadoop Debian package for a while.
The main issue that prevents the inclusion of the current Cloudera
package into Debian is that it depends on Sun's Java. I think it would
be interesting, at least for an official Debian package, to depend on
OpenJDK in order to make it possible to distribute it in "main" instead
of "contrib".
Also, note that in order to fit into Debian's package autobuilding
system, some scripts will probably require some tweaking. For instance,
by default Hadoop downloads dependencies at build time using ivy, but
Debian packages should use libraries that are already packaged. Incidentally,
Hadoop depends on some libraries that aren't available in Debian yet,
such as xmlenc, so there is even more work to do.
(Anyway, I'm interested in the package, so let me know if you need some
help and want to set up a group on alioth or something.)