
debian package of hadoop



Hi,

Today I tried to run the Cloudera Debian distribution on a four-machine
cluster. I still have some itches; see my list below. Some of them may
require a fix in the packaging.
Therefore I thought that it may be time to start an official Debian package
of Hadoop with a public Git repository so that everybody can participate.
Would Cloudera support this? I'd package Hadoop 0.20 and apply all the
Cloudera patches (managed with topgit[1]).
I'd also like your opinion on whether the Debian package should use
versioned binary package names like hadoop-18 and hadoop-20, or just plain
hadoop.

My issues so far:

start-dfs.sh started only the local namenode, but neither the secondary NN
nor the datanodes, and it did not report any error. The masters and slaves
files were configured correctly.
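
As far as I understand, the stock start-dfs.sh reaches the other nodes over
ssh, using the host lists in the masters and slaves files, so a first check
is to see which hosts those files actually yield. A minimal sketch in Python,
assuming the usual one-host-per-line format with # comments; the function
name and the sample hosts are made up for illustration:

```python
def read_host_list(text):
    """Return the host names found in a masters/slaves style file body."""
    hosts = []
    for line in text.splitlines():
        # Drop a trailing comment, then surrounding whitespace.
        entry = line.split("#", 1)[0].strip()
        if entry:
            hosts.append(entry)
    return hosts


# Hypothetical file content, not taken from my cluster.
sample = "node1\n# node2 is down\nnode3  # rack B\n\n"
print(read_host_list(sample))
```

An empty result here would explain why no remote daemon is started without
any error being printed.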

When I started the datanodes manually, they were recognized and HDFS worked.
However, the web UIs at port 50075 show only a directory listing containing
a WEB-INF/ directory. This is most likely a packaging/configuration issue.
The same happens with the SNN on port 50090.

The SNN shows:
java.io.FileNotFoundException: http://192.168.122.166:50070/getimage?putimage=1&port=50090&machine=127.0.1.1&token=-18:737152035:0:1262195990000:1262194649873
 at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1288)
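
One detail in that URL may be relevant: the machine parameter is 127.0.1.1,
a loopback address, which Debian typically maps to the local hostname in
/etc/hosts. If the NN uses that value to connect back to the SNN, it would
end up talking to itself. This is a guess, not a confirmed diagnosis. A small
Python sketch that pulls the callback address out of the URL and checks it
(the function name is my own):

```python
import ipaddress
from urllib.parse import urlsplit, parse_qs


def checkpoint_callback_address(url):
    """Return (machine, port) from the query string of a getimage URL."""
    query = parse_qs(urlsplit(url).query)
    return query["machine"][0], int(query["port"][0])


# The URL from the exception above.
url = ("http://192.168.122.166:50070/getimage?putimage=1&port=50090"
       "&machine=127.0.1.1&token=-18:737152035:0:1262195990000:1262194649873")
machine, port = checkpoint_callback_address(url)
# is_loopback is True for any address in 127.0.0.0/8.
print(machine, port, ipaddress.ip_address(machine).is_loopback)
```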

I did not get MapReduce working yet. It seems that it would be very helpful
to have more verbose example configuration files, with commented properties
for the most important settings.
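
To illustrate the kind of commented example file I mean, here is a rough
Python sketch that renders a core-site.xml body with one comment per
property. The helper name, the host name and the path are placeholders of
mine, not values from the Cloudera packages; fs.default.name and
hadoop.tmp.dir are the 0.20 property names as I know them.

```python
from xml.sax.saxutils import escape


def render_site_xml(properties):
    """Render (name, value, comment) triples as a Hadoop *-site.xml body."""
    lines = ['<?xml version="1.0"?>', "<configuration>"]
    for name, value, comment in properties:
        # The comment must not contain "--", which is illegal in XML comments.
        lines.append("  <!-- %s -->" % comment)
        lines.append("  <property>")
        lines.append("    <name>%s</name>" % escape(name))
        lines.append("    <value>%s</value>" % escape(value))
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines) + "\n"


# Placeholder values; "namenode.example" and the path are made up.
core_site = [
    ("fs.default.name", "hdfs://namenode.example:8020",
     "URI of the default file system, i.e. the NameNode"),
    ("hadoop.tmp.dir", "/var/lib/hadoop/tmp",
     "Base directory for other temporary directories"),
]
print(render_site_xml(core_site))
```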

[1] http://lists.alioth.debian.org/pipermail/vcs-pkg-discuss/2009-December/000688.html

Have a nice New Year's Eve,

Thomas Koch, http://www.koch.ro

