
[RFR] templates://hadoop/{hadoop-namenoded.templates}



Please find, for review, the debconf templates and package descriptions for the hadoop source package.

This review will last from Tuesday, March 30, 2010 to Friday, April 09, 2010.

Please send reviews as unified diffs (diff -u) against the original
files. Comments about your proposed changes will be appreciated.
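For instance, a review can be produced like this (a toy sketch; the
file names and contents are purely illustrative):

```shell
# Toy illustration of the requested "diff -u" review format
# (file names and contents are hypothetical):
printf "_Description: Should the namenode's filesystem be formatted now?\n" > orig.templates
printf "_Description: Should the namenode's file system be formatted?\n" > reviewed.templates
# diff exits with status 1 when the files differ, hence the "|| true":
diff -u orig.templates reviewed.templates > review.diff || true
cat review.diff
```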

Your review should be sent as a reply to this mail.

When appropriate, I will send intermediate requests for review, with
"[RFRn]" (n>=2) as a subject tag.

Once we reach consensus, I will send a "Last Chance For
Comments" mail with "[LCFC]" as a subject tag.

Finally, the reviewed templates will be sent to the package maintainer
as a bug report, and a mail will be sent to this list with "[BTS]" as
a subject tag.

Rationale:
--- hadoop.old/debian/hadoop-namenoded.templates	2010-03-22 09:56:11.717948376 +0100
+++ hadoop/debian/hadoop-namenoded.templates	2010-03-30 07:22:12.123757400 +0200
@@ -1,17 +1,17 @@
 Template: hadoop-namenoded/format
 Type: boolean
 Default: false
-_Description: Should the namenode's filesystem be formatted now?
+_Description: Should the namenode's file system be formatted?

in other packages, we standardized on "file system". Applying this all
along this review

  The namenode manages the Hadoop Distributed FileSystem (HDFS). Like a
- normal filesystem, it needs to be formatted prior to first use. If the
- HDFS filesystem is not formatted, the namenode daemon will fail to
+ normal file system, it needs to be formatted prior to first use. If the
+ HDFS file system is not formatted, the namenode daemon will fail to
  start.
  .
- This operation does not affect the "normal" filesystem on this
- computer. If you're using HDFS for the first time and don't have data
- from previous installations on this computer, it should be save to
- proceed with yes.
+ This operation does not affect other file systems on this
+ computer. You can safely choose to format the file system if you're
+ using HDFS for the first time and don't have data from previous
+ installations on this computer.

I guess that the main point is to reassure users that the "other" file
systems are not at risk here. So, let's mention this slightly
differently (they're not more "normal" than anything else... and there
might be more than one file system on the machine, of course).

"Proceed with yes" is highly discouraged as it refers to the way the
question is shown in *some* debconf interfaces (a yes/no question)
and, anyway, it's always tricky for translators to know whether they
should translate the "yes" or not (the answer being "it
depends"..:-)).

  .
- You can later on format the filesystem yourself with
- . 
- su -c"hadoop namenode -format" hadoop
+ If you choose not to format the file system right now, you can do it
+ later by executing "hadoop namenode -format" with the hadoop user
+ privileges.

Don't waste space by splitting this into two paragraphs; that will
look ugly in some interfaces anyway. I rephrased the paragraph so that
it doesn't depend on using "su" or not (which is not the point: the
point is executing the command as "hadoop").
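For reference, either of the usual user-switching tools achieves what
the rephrased paragraph describes; a minimal sketch (the real commands
are commented out, since they need an installed hadoop package and a
"hadoop" account):

```shell
# Both forms run the format command with the hadoop user's privileges
# (commented out: they require the hadoop package and the "hadoop" user):
#   su -c "hadoop namenode -format" hadoop
#   sudo -u hadoop hadoop namenode -format
# su -c simply hands its argument string to a shell, e.g.:
sh -c 'echo would run: hadoop namenode -format'
```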


--- hadoop.old/debian/control	2010-03-22 09:56:11.717948376 +0100
+++ hadoop/debian/control	2010-03-26 18:30:25.615052315 +0100
@@ -44,14 +44,54 @@
  libslf4j-java,
  libxmlenc-java
 Suggests: libhsqldb-java
-Description: software platform for processing vast amounts of data
- This package contains the core java libraries.
+Description: platform for processing vast amounts of data - Java libraries

I standardized all binary package descriptions as
"general desc - specific desc".

"general desc" drops "software". After all, this is all about software
anyway? :-)

Proper(?) capitalization of Java.

+ Hadoop is a software platform that lets one easily write and
+ run applications that process vast amounts of data.
+ .
+ Here's what makes Hadoop especially useful:
+  * Scalable: Hadoop can reliably store and process petabytes.
+  * Economical: It distributes the data and processing across clusters
+                of commonly available computers. These clusters can number
+                into the thousands of nodes.
+  * Efficient: By distributing the data, Hadoop can process it in parallel
+               on the nodes where the data is located. This makes it
+               extremely rapid.
+  * Reliable: Hadoop automatically maintains multiple copies of data and
+              automatically redeploys computing tasks based on failures.
+ .
+ Hadoop implements MapReduce, using the Hadoop Distributed File System (HDFS).
+ MapReduce divides applications into many small blocks of work. HDFS creates
+ multiple replicas of data blocks for reliability, placing them on compute
+ nodes around the cluster. MapReduce can then process the data where it is
+ located.
+ .
+ This package contains the core Java libraries.

This package... and all others will carry the same boilerplate (the
first three paragraphs). I used the text you had in one of the
packages.

Then one or two paragraphs will carry the part that's specific to each
package.

 
 Package: libhadoop-index-java
 Architecture: all
 Depends: ${misc:Depends}, libhadoop-java (= ${binary:Version}),
  liblucene2-java
-Description: Hadoop contrib to create lucene indexes
+Description: platform for processing vast amounts of data - create Lucene indexes

The original synopsis was quite odd (a verb phrase). I kept the
"create <foo>" style, but I'd actually prefer "Lucene index creation".

Other changes below mostly apply the same principles....


+ Hadoop is a software platform that lets one easily write and
+ run applications that process vast amounts of data.
+ .
+ Here's what makes Hadoop especially useful:
+  * Scalable: Hadoop can reliably store and process petabytes.
+  * Economical: It distributes the data and processing across clusters
+                of commonly available computers. These clusters can number
+                into the thousands of nodes.
+  * Efficient: By distributing the data, Hadoop can process it in parallel
+               on the nodes where the data is located. This makes it
+               extremely rapid.
+  * Reliable: Hadoop automatically maintains multiple copies of data and
+              automatically redeploys computing tasks based on failures.
+ .
+ Hadoop implements MapReduce, using the Hadoop Distributed File System (HDFS).
+ MapReduce divides applications into many small blocks of work. HDFS creates
+ multiple replicas of data blocks for reliability, placing them on compute
+ nodes around the cluster. MapReduce can then process the data where it is
+ located.
+ .
  This contrib package provides a utility to build or update an index
  using Map/Reduce.
  .
@@ -65,7 +105,7 @@
 Architecture: all
 Depends: ${misc:Depends}, libhadoop-java (= ${binary:Version}),
  default-jre-headless | java6-runtime-headless
-Description: software platform for processing vast amounts of data
+Description: platform for processing vast amounts of data - binaries
  Hadoop is a software platform that lets one easily write and
  run applications that process vast amounts of data.
  .
@@ -94,8 +134,22 @@
 Architecture: all
 Depends: ${misc:Depends}, hadoop-bin (= ${binary:Version}), daemon, adduser,
  lsb-base (>= 3.2-14)
-Description: Creates user and directories for hadoop daemons
- Prepares some common things for all hadoop daemon packages:
+Description: platform for processing vast amounts of data - common files
+ Hadoop is a software platform that lets one easily write and
+ run applications that process vast amounts of data.
+ .
+ Here's what makes Hadoop especially useful:
+  * Scalable: Hadoop can reliably store and process petabytes.
+  * Economical: It distributes the data and processing across clusters
+                of commonly available computers. These clusters can number
+                into the thousands of nodes.
+  * Efficient: By distributing the data, Hadoop can process it in parallel
+               on the nodes where the data is located. This makes it
+               extremely rapid.
+  * Reliable: Hadoop automatically maintains multiple copies of data and
+              automatically redeploys computing tasks based on failures.
+ .
+ This package prepares some common things for all hadoop daemon packages:
   * creates the user hadoop
   * creates data and log directories owned by the hadoop user
   * manages the update-alternatives mechanism for hadoop configuration
@@ -105,14 +159,42 @@
 Section: doc
 Architecture: all
 Depends: ${misc:Depends}, libhadoop-java (= ${binary:Version})
-Description: Contains the javadoc for hadoop
- contains the api documentation of hadoop
+Description: platform for processing vast amounts of data - Java documentation
+ Hadoop is a software platform that lets one easily write and
+ run applications that process vast amounts of data.
+ .
+ Here's what makes Hadoop especially useful:
+  * Scalable: Hadoop can reliably store and process petabytes.
+  * Economical: It distributes the data and processing across clusters
+                of commonly available computers. These clusters can number
+                into the thousands of nodes.
+  * Efficient: By distributing the data, Hadoop can process it in parallel
+               on the nodes where the data is located. This makes it
+               extremely rapid.
+  * Reliable: Hadoop automatically maintains multiple copies of data and
+              automatically redeploys computing tasks based on failures.
+ .
+ This package provides the API documentation of Hadoop.
 
 Package: hadoop-tasktrackerd
 Section: misc
 Architecture: all
 Depends: ${misc:Depends}, hadoop-daemons-common (= ${binary:Version})
-Description: Task Tracker for Hadoop
+Description: platform for processing vast amounts of data - Task Tracker
+ Hadoop is a software platform that lets one easily write and
+ run applications that process vast amounts of data.
+ .
+ Here's what makes Hadoop especially useful:
+  * Scalable: Hadoop can reliably store and process petabytes.
+  * Economical: It distributes the data and processing across clusters
+                of commonly available computers. These clusters can number
+                into the thousands of nodes.
+  * Efficient: By distributing the data, Hadoop can process it in parallel
+               on the nodes where the data is located. This makes it
+               extremely rapid.
+  * Reliable: Hadoop automatically maintains multiple copies of data and
+              automatically redeploys computing tasks based on failures.
+ .
  The Task Tracker is the Hadoop service that accepts MapReduce tasks and
  computes results. Each node in a Hadoop cluster that should be doing
  computation should run a Task Tracker.
@@ -121,34 +203,90 @@
 Section: misc
 Architecture: all
 Depends: ${misc:Depends}, hadoop-daemons-common (= ${binary:Version})
-Description: Job Tracker for Hadoop
- The jobtracker is a central service which is responsible for managing
- the tasktracker services running on all nodes in a Hadoop Cluster.
- The jobtracker allocates work to the tasktracker nearest to the data
+Description: platform for processing vast amounts of data - Job Tracker
+ Hadoop is a software platform that lets one easily write and
+ run applications that process vast amounts of data.
+ .
+ Here's what makes Hadoop especially useful:
+  * Scalable: Hadoop can reliably store and process petabytes.
+  * Economical: It distributes the data and processing across clusters
+                of commonly available computers. These clusters can number
+                into the thousands of nodes.
+  * Efficient: By distributing the data, Hadoop can process it in parallel
+               on the nodes where the data is located. This makes it
+               extremely rapid.
+  * Reliable: Hadoop automatically maintains multiple copies of data and
+              automatically redeploys computing tasks based on failures.
+ .
+ The Job Tracker is a central service which is responsible for managing
+ the Task Tracker services running on all nodes in an Hadoop Cluster.
+ The Job Tracker allocates work to the tasktracker nearest to the data
  with an available work slot.
 
 Package: hadoop-namenoded
 Section: misc
 Architecture: all
 Depends: ${misc:Depends}, hadoop-daemons-common (= ${binary:Version})
-Description: Name Node for Hadoop
+Description: platform for processing vast amounts of data - name node
+ Hadoop is a software platform that lets one easily write and
+ run applications that process vast amounts of data.
+ .
+ Here's what makes Hadoop especially useful:
+  * Scalable: Hadoop can reliably store and process petabytes.
+  * Economical: It distributes the data and processing across clusters
+                of commonly available computers. These clusters can number
+                into the thousands of nodes.
+  * Efficient: By distributing the data, Hadoop can process it in parallel
+               on the nodes where the data is located. This makes it
+               extremely rapid.
+  * Reliable: Hadoop automatically maintains multiple copies of data and
+              automatically redeploys computing tasks based on failures.
+ .
  The Hadoop Distributed Filesystem (HDFS) requires one unique server, the
- namenode, which manages the block locations of files on the filesystem.
+ name node, which manages the block locations of files on the file system.
 
 Package: hadoop-secondarynamenoded
 Section: misc
 Architecture: all
 Depends: ${misc:Depends}, hadoop-daemons-common (= ${binary:Version})
-Description: Secondary Name Node for Hadoop
- The Secondary Name Node is responsible for checkpointing file system images.
- It is _not_ a failover pair for the namenode, and may safely be run on the
+Description: platform for processing vast amounts of data - secondary name node
+ Hadoop is a software platform that lets one easily write and
+ run applications that process vast amounts of data.
+ .
+ Here's what makes Hadoop especially useful:
+  * Scalable: Hadoop can reliably store and process petabytes.
+  * Economical: It distributes the data and processing across clusters
+                of commonly available computers. These clusters can number
+                into the thousands of nodes.
+  * Efficient: By distributing the data, Hadoop can process it in parallel
+               on the nodes where the data is located. This makes it
+               extremely rapid.
+  * Reliable: Hadoop automatically maintains multiple copies of data and
+              automatically redeploys computing tasks based on failures.
+ .
+ The secondary name node is responsible for checkpointing file system images.
+ It is _not_ a failover pair for the name node, and may safely be run on the
  same machine.
 
 Package: hadoop-datanoded
 Section: misc
 Architecture: all
 Depends: ${misc:Depends}, hadoop-daemons-common (= ${binary:Version})
-Description: Data Node for Hadoop
- The Data Nodes in the Hadoop Cluster are responsible for serving up
+Description: platform for processing vast amounts of data - data node
+ Hadoop is a software platform that lets one easily write and
+ run applications that process vast amounts of data.
+ .
+ Here's what makes Hadoop especially useful:
+  * Scalable: Hadoop can reliably store and process petabytes.
+  * Economical: It distributes the data and processing across clusters
+                of commonly available computers. These clusters can number
+                into the thousands of nodes.
+  * Efficient: By distributing the data, Hadoop can process it in parallel
+               on the nodes where the data is located. This makes it
+               extremely rapid.
+  * Reliable: Hadoop automatically maintains multiple copies of data and
+              automatically redeploys computing tasks based on failures.
+ .
+ The data nodes in the Hadoop Cluster are responsible for serving up
  blocks of data over the network to Hadoop Distributed Filesystem
  (HDFS) clients.



Template: hadoop-namenoded/format
Type: boolean
Default: false
_Description: Should the namenode's file system be formatted?
 The namenode manages the Hadoop Distributed FileSystem (HDFS). Like a
 normal file system, it needs to be formatted prior to first use. If the
 HDFS file system is not formatted, the namenode daemon will fail to
 start.
 .
 This operation does not affect other file systems on this
 computer. You can safely choose to format the file system if you're
 using HDFS for the first time and don't have data from previous
 installations on this computer.
 .
 If you choose not to format the file system right now, you can do it
 later by executing "hadoop namenode -format" with the hadoop user
 privileges.
Source: hadoop
Section: java
Priority: optional
Maintainer: Debian Java Maintainers <pkg-java-maintainers@lists.alioth.debian.org>
Uploaders: Thomas Koch <thomas.koch@ymc.ch>
Homepage: http://hadoop.apache.org
Vcs-Browser: http://git.debian.org/?p=pkg-java/hadoop.git
Vcs-Git: git://git.debian.org/pkg-java/hadoop.git
Standards-Version: 3.8.4
Build-Depends: debhelper (>= 7.4.11), default-jdk, ant (>= 1.6.0), javahelper (>= 0.28),
 po-debconf,
 libcommons-cli-java,
 libcommons-codec-java,
 libcommons-el-java,
 libcommons-httpclient-java,
 libcommons-io-java,
 libcommons-logging-java,
 libcommons-net-java,
 libtomcat6-java,
 libjetty-java (>>6),
 libservlet2.5-java,
 liblog4j1.2-java,
 libslf4j-java,
 libxmlenc-java,
 liblucene2-java,
 libhsqldb-java,
 ant-optional,
 javacc

Package: libhadoop-java
Architecture: all
Depends: ${misc:Depends}, 
 libcommons-cli-java,
 libcommons-codec-java,
 libcommons-el-java,
 libcommons-httpclient-java,
 libcommons-io-java,
 libcommons-logging-java,
 libcommons-net-java,
 libtomcat6-java,
 libjetty-java (>>6),
 libservlet2.5-java,
 liblog4j1.2-java,
 libslf4j-java,
 libxmlenc-java
Suggests: libhsqldb-java
Description: platform for processing vast amounts of data - Java libraries
 Hadoop is a software platform that lets one easily write and
 run applications that process vast amounts of data.
 .
 Here's what makes Hadoop especially useful:
  * Scalable: Hadoop can reliably store and process petabytes of data.
  * Economical: It distributes the data and processing across clusters
                of commonly available computers. These clusters can number
                into the thousands of nodes.
  * Efficient: By distributing the data, Hadoop can process it in parallel
               on the nodes where the data is located, which makes it
               extremely fast.
  * Reliable: Hadoop automatically maintains multiple copies of data and
              redeploys computing tasks when failures occur.
 .
 Hadoop implements MapReduce, using the Hadoop Distributed File System (HDFS).
 MapReduce divides applications into many small blocks of work. HDFS creates
 multiple replicas of data blocks for reliability, placing them on compute
 nodes around the cluster. MapReduce can then process the data where it is
 located.
 .
 This package contains the core Java libraries.

Package: libhadoop-index-java
Architecture: all
Depends: ${misc:Depends}, libhadoop-java (= ${binary:Version}),
 liblucene2-java
Description: platform for processing vast amounts of data - create Lucene indexes
 Hadoop is a software platform that lets one easily write and
 run applications that process vast amounts of data.
 .
 Here's what makes Hadoop especially useful:
  * Scalable: Hadoop can reliably store and process petabytes of data.
  * Economical: It distributes the data and processing across clusters
                of commonly available computers. These clusters can number
                into the thousands of nodes.
  * Efficient: By distributing the data, Hadoop can process it in parallel
               on the nodes where the data is located, which makes it
               extremely fast.
  * Reliable: Hadoop automatically maintains multiple copies of data and
              redeploys computing tasks when failures occur.
 .
 Hadoop implements MapReduce, using the Hadoop Distributed File System (HDFS).
 MapReduce divides applications into many small blocks of work. HDFS creates
 multiple replicas of data blocks for reliability, placing them on compute
 nodes around the cluster. MapReduce can then process the data where it is
 located.
 .
 This contrib package provides a utility to build or update an index
 using MapReduce.
 .
 A distributed "index" is partitioned into "shards". Each shard corresponds
 to a Lucene instance. org.apache.hadoop.contrib.index.main.UpdateIndex
 contains the main() method, which uses a MapReduce job to analyze documents
 and update Lucene instances in parallel.

Package: hadoop-bin
Section: misc
Architecture: all
Depends: ${misc:Depends}, libhadoop-java (= ${binary:Version}),
 default-jre-headless | java6-runtime-headless
Description: platform for processing vast amounts of data - binaries
 Hadoop is a software platform that lets one easily write and
 run applications that process vast amounts of data.
 .
 Here's what makes Hadoop especially useful:
  * Scalable: Hadoop can reliably store and process petabytes of data.
  * Economical: It distributes the data and processing across clusters
                of commonly available computers. These clusters can number
                into the thousands of nodes.
  * Efficient: By distributing the data, Hadoop can process it in parallel
               on the nodes where the data is located, which makes it
               extremely fast.
  * Reliable: Hadoop automatically maintains multiple copies of data and
              redeploys computing tasks when failures occur.
 .
 Hadoop implements MapReduce, using the Hadoop Distributed File System (HDFS).
 MapReduce divides applications into many small blocks of work. HDFS creates
 multiple replicas of data blocks for reliability, placing them on compute
 nodes around the cluster. MapReduce can then process the data where it is
 located.
 .
 This package contains the Hadoop shell interface. See the hadoop-*d
 packages for the Hadoop daemons.

Package: hadoop-daemons-common
Section: misc
Architecture: all
Depends: ${misc:Depends}, hadoop-bin (= ${binary:Version}), daemon, adduser,
 lsb-base (>= 3.2-14)
Description: platform for processing vast amounts of data - common files
 Hadoop is a software platform that lets one easily write and
 run applications that process vast amounts of data.
 .
 Here's what makes Hadoop especially useful:
  * Scalable: Hadoop can reliably store and process petabytes of data.
  * Economical: It distributes the data and processing across clusters
                of commonly available computers. These clusters can number
                into the thousands of nodes.
  * Efficient: By distributing the data, Hadoop can process it in parallel
               on the nodes where the data is located, which makes it
               extremely fast.
  * Reliable: Hadoop automatically maintains multiple copies of data and
              redeploys computing tasks when failures occur.
 .
 This package performs setup common to all Hadoop daemon packages:
  * creates the user hadoop
  * creates data and log directories owned by the hadoop user
  * manages the update-alternatives mechanism for Hadoop configuration
  * brings in the common dependencies

Package: libhadoop-java-doc
Section: doc
Architecture: all
Depends: ${misc:Depends}, libhadoop-java (= ${binary:Version})
Description: platform for processing vast amounts of data - Java documentation
 Hadoop is a software platform that lets one easily write and
 run applications that process vast amounts of data.
 .
 Here's what makes Hadoop especially useful:
  * Scalable: Hadoop can reliably store and process petabytes of data.
  * Economical: It distributes the data and processing across clusters
                of commonly available computers. These clusters can number
                into the thousands of nodes.
  * Efficient: By distributing the data, Hadoop can process it in parallel
               on the nodes where the data is located, which makes it
               extremely fast.
  * Reliable: Hadoop automatically maintains multiple copies of data and
              redeploys computing tasks when failures occur.
 .
 This package provides the API documentation of Hadoop.

Package: hadoop-tasktrackerd
Section: misc
Architecture: all
Depends: ${misc:Depends}, hadoop-daemons-common (= ${binary:Version})
Description: platform for processing vast amounts of data - Task Tracker
 Hadoop is a software platform that lets one easily write and
 run applications that process vast amounts of data.
 .
 Here's what makes Hadoop especially useful:
  * Scalable: Hadoop can reliably store and process petabytes of data.
  * Economical: It distributes the data and processing across clusters
                of commonly available computers. These clusters can number
                into the thousands of nodes.
  * Efficient: By distributing the data, Hadoop can process it in parallel
               on the nodes where the data is located, which makes it
               extremely fast.
  * Reliable: Hadoop automatically maintains multiple copies of data and
              redeploys computing tasks when failures occur.
 .
 The Task Tracker is the Hadoop service that accepts MapReduce tasks and
 computes results. Each node in a Hadoop cluster that is to perform
 computation should run a Task Tracker.

Package: hadoop-jobtrackerd
Section: misc
Architecture: all
Depends: ${misc:Depends}, hadoop-daemons-common (= ${binary:Version})
Description: platform for processing vast amounts of data - Job Tracker
 Hadoop is a software platform that lets one easily write and
 run applications that process vast amounts of data.
 .
 Here's what makes Hadoop especially useful:
  * Scalable: Hadoop can reliably store and process petabytes of data.
  * Economical: It distributes the data and processing across clusters
                of commonly available computers. These clusters can number
                into the thousands of nodes.
  * Efficient: By distributing the data, Hadoop can process it in parallel
               on the nodes where the data is located, which makes it
               extremely fast.
  * Reliable: Hadoop automatically maintains multiple copies of data and
              redeploys computing tasks when failures occur.
 .
 The Job Tracker is a central service responsible for managing the
 Task Tracker services running on all nodes in a Hadoop cluster. The
 Job Tracker allocates work to the Task Tracker nearest to the data
 that has an available work slot.

Package: hadoop-namenoded
Section: misc
Architecture: all
Depends: ${misc:Depends}, hadoop-daemons-common (= ${binary:Version})
Description: platform for processing vast amounts of data - name node
 Hadoop is a software platform that lets one easily write and
 run applications that process vast amounts of data.
 .
 Here's what makes Hadoop especially useful:
  * Scalable: Hadoop can reliably store and process petabytes of data.
  * Economical: It distributes the data and processing across clusters
                of commonly available computers. These clusters can number
                into the thousands of nodes.
  * Efficient: By distributing the data, Hadoop can process it in parallel
               on the nodes where the data is located, which makes it
               extremely fast.
  * Reliable: Hadoop automatically maintains multiple copies of data and
              redeploys computing tasks when failures occur.
 .
 The Hadoop Distributed File System (HDFS) requires one unique server, the
 name node, which manages the block locations of files on the file system.

Package: hadoop-secondarynamenoded
Section: misc
Architecture: all
Depends: ${misc:Depends}, hadoop-daemons-common (= ${binary:Version})
Description: platform for processing vast amounts of data - secondary name node
 Hadoop is a software platform that lets one easily write and
 run applications that process vast amounts of data.
 .
 Here's what makes Hadoop especially useful:
  * Scalable: Hadoop can reliably store and process petabytes of data.
  * Economical: It distributes the data and processing across clusters
                of commonly available computers. These clusters can number
                into the thousands of nodes.
  * Efficient: By distributing the data, Hadoop can process it in parallel
               on the nodes where the data is located, which makes it
               extremely fast.
  * Reliable: Hadoop automatically maintains multiple copies of data and
              redeploys computing tasks when failures occur.
 .
 The secondary name node is responsible for checkpointing file system
 images. It is _not_ a failover pair for the name node, and may safely
 be run on the same machine as the name node.

Package: hadoop-datanoded
Section: misc
Architecture: all
Depends: ${misc:Depends}, hadoop-daemons-common (= ${binary:Version})
Description: platform for processing vast amounts of data - data node
 Hadoop is a software platform that lets one easily write and
 run applications that process vast amounts of data.
 .
 Here's what makes Hadoop especially useful:
  * Scalable: Hadoop can reliably store and process petabytes of data.
  * Economical: It distributes the data and processing across clusters
                of commonly available computers. These clusters can number
                into the thousands of nodes.
  * Efficient: By distributing the data, Hadoop can process it in parallel
               on the nodes where the data is located, which makes it
               extremely fast.
  * Reliable: Hadoop automatically maintains multiple copies of data and
              redeploys computing tasks when failures occur.
 .
 The data nodes in a Hadoop cluster are responsible for serving blocks
 of data over the network to Hadoop Distributed File System (HDFS)
 clients.
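For reviewers less familiar with the programming model these descriptions keep referring to, here is a minimal sketch of the map/shuffle/reduce flow using a word count. This is plain Python for illustration only, not the Hadoop API; on a real cluster, Hadoop runs many mappers and reducers in parallel on the nodes holding the data.

```python
from collections import defaultdict

def map_phase(document):
    # Emit an intermediate (word, 1) pair for every word in one input split.
    for word in document.split():
        yield word, 1

def shuffle(pairs):
    # Group intermediate values by key, as the framework does
    # between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Combine all values seen for one key into the final result.
    return key, sum(values)

documents = ["hadoop stores data", "hadoop processes data"]
intermediate = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
# counts == {'hadoop': 2, 'stores': 1, 'data': 2, 'processes': 1}
```

The point for the descriptions above: the map and reduce steps are independent per key, which is what lets Hadoop schedule them on whichever nodes already hold the data blocks.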

Attachment: signature.asc
Description: Digital signature

