
more about Zookeeper



There has been a bit of talk about Apache Zookeeper recently by Thomas
Koch.  He has a bad opinion of
Zookeeper and thinks it is "not good enough for Debian".

I would like to provide a countervailing view.  Thomas and I get along
very well in person, but there are
definitely some things that we do not always agree on.  I would not say
that his opinions on Zookeeper
are entirely without merit, but I do feel that they should be taken a
bit in context.  In general, I feel that
more moderate conclusions are to be preferred.

My first point is that style checks and analysis tools like FindBugs are
useful as ways to focus efforts to
improve code quality, but they can hardly be considered the primary
goal.  The primary goal is highly
reliable software that can be maintained and which has a living
community around it.  Zookeeper has
achieved all three of these.  Zookeeper is highly reliable and is used
extensively in high end web
systems and in many other applications.  In my own years of experience
with Zookeeper, I very rarely
have to restart Zookeeper and the issues I have had operationally were
almost exclusively when I
screwed up by allowing a disk to fill up (in which case ZK fails safe
and asks for help) or overloaded
the in-memory datastore (in which case ZK fails safe and asks for
help).  I have found minor bugs
in startup scripts, but have never found any significant errors in
design or implementation in the
Java server or Java client.  I heard of one really significant error
in the C API which was almost
immediately fixed on discovery.  Moreover, ZK has maintained disk
image and wire compatibility
for years now across a half dozen version changes or more.  In fact,
rolling upgrades with no
downtime have been possible during that time so there are Zookeeper
clusters around with years
of uptime even though none of the machines underneath the cluster has
been up so long.

Suffice it to say that ZK is widely adopted by both sophisticated and
unsophisticated users with
really excellent operational experience.  From the admin point of
view, it is some of the most
reliable and high quality code around.

That leaves the question of internal code quality, which is what Thomas
has focussed on in his critiques.

My own recent experience was as lead developer and proposer of the
multi-operation primitive.  This
new capability is one of the largest API changes in years for
Zookeeper.  As Thomas says, this has
taken nearly 6 months to complete.  As Thomas neglects to mention,
however, most of that time was
spent without any significant effort being applied.  I did the API
design, Java client changes and wire format
extensions in a week or two late last year and then became too busy to
continue.  Marshall McMullen
recently stepped up along with Camille Fournier to finish off the C
API changes and server side changes.
They (mostly Marshall) were able to make these changes in a few weeks.
 Neither Marshall nor I had
previously been involved much in the internals and were able to make
our changes with only minimal
assistance from the core developers.  Code review has taken a few more
weeks, largely due to schedule
conflict on the part of the core developers rather than difficulty.
If you are curious, you can see the
entire history of the development at
https://issues.apache.org/jira/browse/ZOOKEEPER-965

The overall patch was moderately large involving changes to 28
separate code files.  It was a fairly invasive
change because it involved changes in functionality on so many separate levels.

So we have some interesting evidence here.

On one side, automated tools quibble with the style of the code in ZK
and Thomas finds the code design
distasteful.

On the other side, two relatively naive coders (with respect to ZK)
were able to make substantial and invasive
changes without negative impact on other functionality.  This is in a
very high performance transactional
storage system.  This is the kind of software that is traditionally
some of the most difficult software to maintain.
Many consider only real-time control software and certain high performance
device control applications to be more
difficult.

I won't say that Thomas is wrong.  The code in ZK does need some
curation to make it better, more readable
and simpler.  I am aware of no production code that does not need such help.

In contrast, however, the code as it stands is very well tested, is
extraordinarily stable and by actual demonstration
is maintainable and modifiable by new contributors.

As such, I would disagree strongly with the statement that ZK is not
of sufficient quality to be included in Debian.
Thomas may not want to maintain the Debian packaging, but hopefully
others will.  I hope that Thomas' negative
comments do not discourage others from investigating Zookeeper and
drawing their own conclusions.


(and for disclosure's sake, I am on the Zookeeper project management committee)

