Introduction to Zookeeper

Zookeeper is an Apache project designed to help developers build distributed applications. It provides a set of services that most distributed applications need, such as naming, configuration management, synchronization, and group services. Because Zookeeper implements them for you, you can focus on your application rather than on distributed infrastructure components.

Note: This is the first story in our encounter with Zookeeper. The following story can be found here.

What is this post about?

This post is a brief introduction to Zookeeper. We will show you how to

  1. install
  2. setup
  3. start
  4. interact

with a Zookeeper server.

Then we will also show you how to interact with the Zookeeper server using Ruby.

Let’s start.

Download and Install Zookeeper

Note: If you work on OS X and like to use brew, installing Zookeeper might be as simple as brew install zookeeper.

You can download Zookeeper from any site that is listed in the download page. Pick a stable release by downloading the file inside the stable folder.

I downloaded the file zookeeper-3.4.10.tar.gz.

I then extracted the archive into the folder ~/Documents/zookeeper-3.4.10.

Setup

Before we can start the Zookeeper server, we need a minimal configuration file. Let’s create the file conf/zoo.cfg inside the folder where you have your Zookeeper installation. You can create it as a copy of the existing sample file.

In the Zookeeper folder:

$ cp conf/zoo_sample.cfg conf/zoo.cfg

The minimum configuration that allows you to start a single Zookeeper server is the following:

tickTime=2000
dataDir=/Users/pmatsino/Documents/zookeeper-data
clientPort=2181

As you can see, I have set dataDir to a directory in my Documents folder. This folder, /Users/pmatsino/Documents/zookeeper-data, must exist, so go ahead and create it before you continue. Zookeeper stores the snapshots of its in-memory database and the transaction log of the database updates in this directory. Zookeeper keeps its state in memory in order to be efficient, but it also flushes that state into snapshots inside the dataDir. Updates are atomic, and each one is recorded in the transaction log, which is kept inside the dataDir too.

The tickTime is given in milliseconds and is the basic time unit for every Zookeeper configuration key that specifies time. It is used to implement heartbeats, and it also determines the minimum session timeout, which is twice the tickTime.

The clientPort is the port on which the Zookeeper server listens for client connections.
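
To make the role of tickTime concrete, here is a minimal Ruby sketch that parses the configuration above and derives the default session timeout bounds. The helper names parse_zoo_cfg and session_timeout_bounds are made up for this illustration. The upper bound of 20 ticks is Zookeeper's documented default for maxSessionTimeout; it is not set anywhere in our file.

```ruby
# Parse "key=value" lines from a zoo.cfg-style string into a Hash.
# Blank lines and lines starting with '#' are skipped.
def parse_zoo_cfg(text)
  text.each_line.with_object({}) do |line, cfg|
    line = line.strip
    next if line.empty? || line.start_with?('#')

    key, value = line.split('=', 2)
    cfg[key.strip] = value.to_s.strip
  end
end

# Default session timeout bounds in milliseconds, derived from tickTime:
# the minimum is 2 ticks, the maximum (by default) is 20 ticks.
def session_timeout_bounds(cfg)
  tick = Integer(cfg.fetch('tickTime'))
  { min_ms: 2 * tick, max_ms: 20 * tick }
end

sample_cfg = <<~CFG
  tickTime=2000
  dataDir=/Users/pmatsino/Documents/zookeeper-data
  clientPort=2181
CFG

bounds = session_timeout_bounds(parse_zoo_cfg(sample_cfg))
puts "session timeout between #{bounds[:min_ms]} ms and #{bounds[:max_ms]} ms"
# prints "session timeout between 4000 ms and 40000 ms"
```

The timeout that a client negotiates when it connects should fall inside this range.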

Start server

With the above settings in place, it is very easy to start a Zookeeper server:

In the Zookeeper folder:

$ bin/zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /Users/pmatsino/Documents/zookeeper-3.4.10/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
$

Zookeeper data model

The Zookeeper data model is based on nodes, which are called znodes. Each node has a unique name composed of the path parts that lead to that node, just like folders and files in a file system. The root node is /. Under it you can create as many child nodes as you like, then children of children, and so on. Here is another node: /foo/bar. The node bar is a child of node /foo, which in turn is a child of the root node (/).

Besides that, each node may have data attached to it. Unlike a file system, where a folder holds entries and a file holds data, a znode can have child nodes, data, or both. The data might also be an empty string.
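
The path rules above can be sketched in a few lines of Ruby. This is only an illustration of how znode names relate to each other. The helpers znode_parts and znode_parent are hypothetical and are not part of Zookeeper or of any client library.

```ruby
# Split an absolute znode path into its parts; the root "/" has none.
def znode_parts(path)
  raise ArgumentError, "znode paths must start with '/'" unless path.start_with?('/')

  path.split('/').reject(&:empty?)
end

# Return the path of the parent znode, or nil for the root node.
def znode_parent(path)
  parts = znode_parts(path)
  return nil if parts.empty?

  '/' + parts[0...-1].join('/')
end

p znode_parts('/foo/bar')   # => ["foo", "bar"]
p znode_parent('/foo/bar')  # => "/foo"
p znode_parent('/foo')      # => "/"
p znode_parent('/')         # => nil
```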

Connect to server

Now we can use the client tool that comes with the Zookeeper installation to connect to the Zookeeper server. Here is how:

In the Zookeeper folder:

$ bin/zkCli.sh -server localhost:2181
...
2017-08-29 13:58:40,109 [myid:] - INFO  [main-SendThread(localhost:2181):ClientCnxn$SendThread@1299] - Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x15e2de052c30000, negotiated timeout = 30000

WATCHER::

WatchedEvent state:SyncConnected type:None path:null
[zk: localhost:2181(CONNECTED) 0]

After lots of lines have been printed on your terminal, you reach [zk: localhost:2181(CONNECTED) 0], which is the prompt that you can use to send commands from the client to the server.

The help command will list the commands that you can use:

[zk: localhost:2181(CONNECTED) 0] help
ZooKeeper -server host:port cmd args
	stat path [watch]
	set path data [version]
	ls path [watch]
	delquota [-n|-b] path
	ls2 path [watch]
	setAcl path acl
	setquota -n|-b val path
	history
	redo cmdno
	printwatches on|off
	delete path [version]
	sync path
	listquota path
	rmr path
	get path [watch]
	create [-s] [-e] path data acl
	addauth scheme auth
	quit
	getAcl path
	close
	connect host:port
[zk: localhost:2181(CONNECTED) 1]

List nodes

Now that we are using the command line interface, let’s list the current nodes:

[zk: localhost:2181(CONNECTED) 1] ls /
[zookeeper]
[zk: localhost:2181(CONNECTED) 2]

As you can see, the root node / currently has a single child, zookeeper. Zookeeper creates this node itself and reserves it for internal use.

Create a Zookeeper node

Let’s create a new node, and then list the nodes again. Note that when we create a new node we give the data to attach to the node. In the following example, "bar" is a piece of string data to attach to node /foo.

[zk: localhost:2181(CONNECTED) 4] create /foo bar
Created /foo
[zk: localhost:2181(CONNECTED) 5] ls /
[zookeeper, foo]

Get details of node

And we can get the details of a node:

[zk: localhost:2181(CONNECTED) 6] get /foo
bar
... ( more output here ) ...
dataLength = 3
numChildren = 0
[zk: localhost:2181(CONNECTED) 7]

Do you see the first line? bar. It is the data associated with the node /foo.

Update details of node

We can update the details of a node with the set command:

[zk: localhost:2181(CONNECTED) 7] set /foo mary
... ( more output here ) ...
dataLength = 4
numChildren = 0
[zk: localhost:2181(CONNECTED) 8] get /foo
mary
... ( more output here ) ...
dataLength = 4
numChildren = 0
[zk: localhost:2181(CONNECTED) 9]

With set /foo mary we update the data for the node /foo to be the string mary. We then confirm with get /foo.

Delete a node

Let’s now delete the /foo node:

[zk: localhost:2181(CONNECTED) 9] delete /foo
[zk: localhost:2181(CONNECTED) 10] ls /
[zookeeper]
[zk: localhost:2181(CONNECTED) 11]

The delete /foo command deletes the /foo node. We then confirm with ls /.

Replicated Zookeeper

Things get more interesting, of course, when you have multiple Zookeeper servers replicating your data. Working with 1 server is fine during development. Your production system, on the other hand, should have several Zookeeper servers working in a replicated configuration.

Let’s see how we can start 3 Zookeeper servers locally.
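
Why 3 servers and not 2? A replicated Zookeeper ensemble keeps working only while a majority of its servers are up. The following sketch does the arithmetic. The method names quorum_size and tolerated_failures are made up for this illustration.

```ruby
# Smallest number of servers that forms a majority of the ensemble.
def quorum_size(ensemble_size)
  ensemble_size / 2 + 1
end

# Number of servers that can fail while a majority is still available.
def tolerated_failures(ensemble_size)
  ensemble_size - quorum_size(ensemble_size)
end

[1, 2, 3, 4, 5].each do |n|
  puts "#{n} server(s): quorum #{quorum_size(n)}, tolerates #{tolerated_failures(n)} failure(s)"
end
```

With 3 servers the ensemble survives the loss of 1 server. A 4th server does not improve on that, which is why ensembles usually have an odd number of servers.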

Stop server

First, let’s stop the server that is running at the moment:

In the Zookeeper folder:

$ bin/zkServer.sh stop
ZooKeeper JMX enabled by default
Using config: /Users/pmatsino/Documents/zookeeper-3.4.10/bin/../conf/zoo.cfg
Stopping zookeeper ... STOPPED
$

Create different data directories

Create 3 data directories, one for each of the Zookeeper servers.

In the ~/Documents folder:

$ mkdir zookeeper-data-1
$ mkdir zookeeper-data-2
$ mkdir zookeeper-data-3

Create the myid files

Inside the data directories, you need to create a myid file with the id of each server. Let’s keep it very simple for our demo:

In the ~/Documents folder:

$ echo '1' > zookeeper-data-1/myid
$ echo '2' > zookeeper-data-2/myid
$ echo '3' > zookeeper-data-3/myid

The ids of our servers will be 1, 2 and 3 respectively.
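
If you prefer, the data directories and the myid files can be created in one go with a short Ruby script. This is a sketch that follows the directory names used in this post. The method name prepare_data_dirs is hypothetical.

```ruby
require 'fileutils'

# Create zookeeper-data-<id>/myid under +root+ for ids 1..count.
# Each myid file contains only the id of its server.
# Returns the list of data directories that were prepared.
def prepare_data_dirs(root, count)
  (1..count).map do |id|
    dir = File.join(root, "zookeeper-data-#{id}")
    FileUtils.mkdir_p(dir)
    File.write(File.join(dir, 'myid'), "#{id}\n")
    dir
  end
end

# Example call (writes under ~/Documents, as in this post):
# prepare_data_dirs(File.expand_path('~/Documents'), 3)
```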

Create server configurations

Now, let’s go to our conf folder and create three different configurations, one for each of the servers. We will use these files to start each server accordingly:

conf/1.cfg

The configuration file for the first server, with id 1 (create it inside ~/Documents/zookeeper-3.4.10/conf):

tickTime=2000
dataDir=/Users/pmatsino/Documents/zookeeper-data-1
clientPort=2181
initLimit=10
syncLimit=5
server.1=localhost:2888:3888
server.2=localhost:2889:3889
server.3=localhost:2890:3890

Pay attention to the following. Since we are starting all servers on the same machine:

  1. The dataDir is different per server. Here we specify the dataDir for the server with id 1.
  2. The clientPort is different per server.
  3. The quorum and leader election ports are different for each server. Also, the configuration of a server needs to know the quorum and leader election ports of the other servers too. Here we specify 2888 and 3888 for the first server, with id 1, then 2889 and 3889 for the server with id 2, and finally 2890 and 3890 for the server with id 3.

Having said the above, let’s create the configuration files for the other 2 servers:

conf/2.cfg

The configuration file for the server with id 2 (create it inside ~/Documents/zookeeper-3.4.10/conf):

tickTime=2000
dataDir=/Users/pmatsino/Documents/zookeeper-data-2
clientPort=2182
initLimit=10
syncLimit=5
server.1=localhost:2888:3888
server.2=localhost:2889:3889
server.3=localhost:2890:3890

conf/3.cfg

The configuration file for the server with id 3 (create it inside ~/Documents/zookeeper-3.4.10/conf):

tickTime=2000
dataDir=/Users/pmatsino/Documents/zookeeper-data-3
clientPort=2183
initLimit=10
syncLimit=5
server.1=localhost:2888:3888
server.2=localhost:2889:3889
server.3=localhost:2890:3890
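
The three files differ only in their dataDir and clientPort, so they can also be generated. Here is a minimal Ruby sketch that produces the same content as the files above. The method name server_config and the consecutive port numbering are assumptions that match this demo, not a Zookeeper convention.

```ruby
# Build the contents of conf/<id>.cfg for one server of a local ensemble.
# Ports are numbered consecutively, as in this post:
# client ports from 2181, quorum ports from 2888, election ports from 3888.
def server_config(id, ensemble_size:, data_root:)
  servers = (1..ensemble_size).map do |n|
    "server.#{n}=localhost:#{2888 + n - 1}:#{3888 + n - 1}"
  end

  [
    'tickTime=2000',
    "dataDir=#{data_root}/zookeeper-data-#{id}",
    "clientPort=#{2181 + id - 1}",
    'initLimit=10',
    'syncLimit=5',
    *servers
  ].join("\n") + "\n"
end

puts server_config(2, ensemble_size: 3, data_root: '/Users/pmatsino/Documents')
```

The call above prints the content of conf/2.cfg. Writing each result to a file with File.write would give you all three configurations.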

Create start all script

Everything is ready. However, let’s make our life a little bit easier by creating the start_all.sh script inside the folder ~/Documents/zookeeper-3.4.10/bin:

#!/usr/bin/env bash

bin/zkServer.sh start ~/Documents/zookeeper-3.4.10/conf/1.cfg
bin/zkServer.sh start ~/Documents/zookeeper-3.4.10/conf/2.cfg
bin/zkServer.sh start ~/Documents/zookeeper-3.4.10/conf/3.cfg

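As you can see, we start each Zookeeper server by passing the configuration file it needs to use as an argument. Because the script calls bin/zkServer.sh with a relative path, it has to be run from the Zookeeper folder.

Make sure that the script is executable: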

In the Zookeeper folder:

$ chmod +x bin/start_all.sh

Create stop all script

Similarly, let’s create the stop_all.sh script inside the folder ~/Documents/zookeeper-3.4.10/bin:

#!/usr/bin/env bash

bin/zkServer.sh stop ~/Documents/zookeeper-3.4.10/conf/1.cfg
bin/zkServer.sh stop ~/Documents/zookeeper-3.4.10/conf/2.cfg
bin/zkServer.sh stop ~/Documents/zookeeper-3.4.10/conf/3.cfg

Don’t forget to make it executable:

In the Zookeeper folder:

$ chmod +x bin/stop_all.sh

Kick-off servers

Let’s now kick off our replicated Zookeeper servers:

In the Zookeeper folder:

$ bin/start_all.sh
ZooKeeper JMX enabled by default
Using config: /Users/pmatsino/Documents/zookeeper-3.4.10/conf/1.cfg
Starting zookeeper ... STARTED
ZooKeeper JMX enabled by default
Using config: /Users/pmatsino/Documents/zookeeper-3.4.10/conf/2.cfg
Starting zookeeper ... STARTED
ZooKeeper JMX enabled by default
Using config: /Users/pmatsino/Documents/zookeeper-3.4.10/conf/3.cfg
Starting zookeeper ... STARTED
$

Connect to server 1 - Create data

Now, let’s connect to server 1 and create some data and then quit:

In the Zookeeper folder:

$ bin/zkCli.sh -server 127.0.0.1:2181
...
WatchedEvent state:SyncConnected type:None path:null
[zk: 127.0.0.1:2181(CONNECTED) 0] create /replicated_demo three_servers
Created /replicated_demo
[zk: 127.0.0.1:2181(CONNECTED) 1] quit
$

Connect to server 3 - Confirm data

Now, let’s connect to server 3 and confirm that we have access to the same data:

In the Zookeeper folder:

$ bin/zkCli.sh -server 127.0.0.1:2183
...
WatchedEvent state:SyncConnected type:None path:null
[zk: 127.0.0.1:2183(CONNECTED) 0] ls /
[zookeeper, test, replicated_demo]
[zk: 127.0.0.1:2183(CONNECTED) 1] get /replicated_demo
three_servers
... ( more output here ) ...
dataLength = 13
numChildren = 0
[zk: 127.0.0.1:2183(CONNECTED) 2] quit
$

Bingo! The same data is available via all the servers. Try the second one too. This is the idea behind replicated Zookeeper.

Stop all servers

Let’s now stop all servers:

In the Zookeeper folder:

$ bin/stop_all.sh
ZooKeeper JMX enabled by default
Using config: /Users/pmatsino/Documents/zookeeper-3.4.10/conf/1.cfg
Stopping zookeeper ... STOPPED
ZooKeeper JMX enabled by default
Using config: /Users/pmatsino/Documents/zookeeper-3.4.10/conf/2.cfg
Stopping zookeeper ... STOPPED
ZooKeeper JMX enabled by default
Using config: /Users/pmatsino/Documents/zookeeper-3.4.10/conf/3.cfg
Stopping zookeeper ... STOPPED
$

Ruby client bindings

Zookeeper provides client bindings for Java and C. But you can also use the zookeeper gem, which allows you to access a Zookeeper server using Ruby. Let’s see an example:

Start Zookeeper server

Start the single-instance Zookeeper server again:

In the Zookeeper folder:

$ bin/zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /Users/pmatsino/Documents/zookeeper-3.4.10/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
$

Install zookeeper gem

In the Zookeeper folder:

$ gem install zookeeper --no-ri --no-rdoc
Building native extensions.  This could take a while...
Successfully installed zookeeper-1.4.11
1 gem installed
$

Interact with Zookeeper server using Ruby

Start irb and issue commands to Zookeeper server using Ruby:

Create a client object

In the Zookeeper folder:

$ irb
irb(main):001:0> require 'zookeeper'
=> true
irb(main):002:0> zookeeper = Zookeeper.new('127.0.0.1:2181')
=> #<Zookeeper::Client:0x007fc2b9cc3c78 @host="127.0.0.1:2181", @chroot_path="", @req_registry=#<Zookeeper::RequestRegistry...>>>

Get children of root path

irb(main):003:0> zookeeper.get_children(path: '/')
=> {:req_id=>0, :rc=>0, :children=>["zookeeper"], :stat=>#<Zookeeper::Stat:0x007fc2b9ca9788 @exists=true, @czxid=0, @mzxid=0, @ctime=0, @mtime=0, @version=0, @cversion=3, @aversion=0, @ephemeralOwner=0, @dataLength=0, @numChildren=1, @pzxid=22>}
irb(main):004:0>

We list the children with the #get_children method, which takes the node path under the path key. Do you see the result containing :children=>["zookeeper"]?
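
The response is a plain Hash, so it is easy to wrap. The following sketch turns the child names into full paths and checks the :rc value, which is 0 in every successful response shown in this post. The method child_paths is hypothetical. It expects a client that answers get_children(path:) with a Hash like the one above.

```ruby
# Return the absolute paths of the children of +path+.
# +client+ must respond to get_children(path:) and return a Hash
# with the keys :rc and :children, like the response shown above.
def child_paths(client, path)
  response = client.get_children(path: path)
  unless response[:rc].zero?
    raise "get_children #{path} failed with rc=#{response[:rc]}"
  end

  prefix = path == '/' ? '' : path
  response[:children].map { |name| "#{prefix}/#{name}" }
end

# Example, with the client object created earlier:
# child_paths(zookeeper, '/')
```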

Create a node

irb(main):004:0> zookeeper.create(path: '/foo', data: 'bar')
=> {:req_id=>1, :rc=>0, :path=>"/foo"}
irb(main):005:0> zookeeper.get_children(path: '/')
=> {:req_id=>8, :rc=>0, :children=>["zookeeper", "foo"], :stat=>#<Zookeeper::Stat:0x007fc2b9bf98b0 @exists=true, @czxid=0, @mzxid=0, @ctime=0, @mtime=0, @version=0, @cversion=6, @aversion=0, @ephemeralOwner=0, @dataLength=0, @numChildren=2, @pzxid=28>}
irb(main):012:0>

Do you see that :children now has the value ["zookeeper", "foo"]?

Get node data

irb(main):005:0> zookeeper.get(path: '/foo')
=> {:req_id=>2, :rc=>0, :data=>"bar", :stat=>#<Zookeeper::Stat:0x007fc2b9c82e08 @exists=true, @czxid=25, @mzxid=25, @ctime=1504016747266, @mtime=1504016747266, @version=0, @cversion=0, @aversion=0, @ephemeralOwner=0, @dataLength=3, @numChildren=0, @pzxid=25>}
irb(main):006:0>
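
The @ctime and @mtime values in the stat are timestamps in milliseconds since the Unix epoch. Here is a small sketch that converts them into something readable. The helper name zk_time is made up for this illustration.

```ruby
# Convert a Zookeeper stat timestamp (milliseconds since the Unix epoch)
# into a Ruby Time in UTC.
def zk_time(millis)
  Time.at(Rational(millis, 1000)).utc
end

created = zk_time(1504016747266) # the @ctime from the output above
puts created.strftime('%Y-%m-%d %H:%M:%S UTC')
# prints "2017-08-29 14:25:47 UTC"
```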

Update node data

irb(main):006:0> zookeeper.set(path: '/foo', data: 'mary')
=> {:req_id=>3, :rc=>0, :stat=>#<Zookeeper::Stat:0x007fc2b9c6bca8 @exists=true, @czxid=25, @mzxid=26, @ctime=1504016747266, @mtime=1504016802433, @version=1, @cversion=0, @aversion=0, @ephemeralOwner=0, @dataLength=4, @numChildren=0, @pzxid=25>}
irb(main):007:0> zookeeper.get(path: '/foo')
=> {:req_id=>4, :rc=>0, :data=>"mary", :stat=>#<Zookeeper::Stat:0x007fc2b9c58158 @exists=true, @czxid=25, @mzxid=26, @ctime=1504016747266, @mtime=1504016802433, @version=1, @cversion=0, @aversion=0, @ephemeralOwner=0, @dataLength=4, @numChildren=0, @pzxid=25>}
irb(main):008:0>
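
Notice that @version went from 0 to 1 after the update. The command line set accepts an optional version, and the same idea can be sketched in Ruby: update a node only if nobody else changed it since we read it. This sketch assumes that the gem's set accepts a :version key and that the stat object exposes a version reader. Please verify both against the gem version you have installed. The method name set_if_unchanged is hypothetical.

```ruby
# Update +path+ only if it still has the version we just read.
# Returns true when the write was accepted, false otherwise.
# Assumes client.set accepts a :version key and that the :stat
# object responds to #version (both are assumptions about the gem).
def set_if_unchanged(client, path, new_data)
  current = client.get(path: path)
  return false unless current[:rc].zero?

  result = client.set(path: path,
                      data: new_data,
                      version: current[:stat].version)
  result[:rc].zero?
end

# Example, with the client object created earlier:
# set_if_unchanged(zookeeper, '/foo', 'mary')
```

If another client updates the node between the get and the set, the versions no longer match, the server rejects the write, and the method returns false.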

Delete node

irb(main):008:0> zookeeper.delete(path: '/foo')
=> {:req_id=>5, :rc=>0}
irb(main):009:0> zookeeper.get_children(path: '/')
=> {:req_id=>6, :rc=>0, :children=>["zookeeper"], :stat=>#<Zookeeper::Stat:0x007fc2b9c325e8 @exists=true, @czxid=0, @mzxid=0, @ctime=0, @mtime=0, @version=0, @cversion=5, @aversion=0, @ephemeralOwner=0, @dataLength=0, @numChildren=1, @pzxid=27>}
irb(main):010:0>
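
Zookeeper refuses to delete a node that still has children. The command line client offers rmr for that case. A rough Ruby equivalent, built only from the get_children and delete calls shown above, could look like the following sketch. The method name delete_recursive is hypothetical.

```ruby
# Delete +path+ and everything below it, children first.
# +client+ must respond to get_children(path:) and delete(path:),
# returning Hashes with an :rc key, like the responses shown above.
def delete_recursive(client, path)
  listing = client.get_children(path: path)
  raise "get_children #{path} failed with rc=#{listing[:rc]}" unless listing[:rc].zero?

  listing[:children].each do |name|
    child = path == '/' ? "/#{name}" : "#{path}/#{name}"
    delete_recursive(client, child)
  end

  result = client.delete(path: path)
  raise "delete #{path} failed with rc=#{result[:rc]}" unless result[:rc].zero?

  true
end

# Example, with the client object created earlier:
# delete_recursive(zookeeper, '/foo')
```

Do not call it on the root node /, since the zookeeper child is reserved for Zookeeper itself.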

Closing Note

Zookeeper will take a lot of the burden off your back when designing and developing a distributed application. This was a first introduction to Zookeeper, covering only the very basics.

Thank you for reading this blog post. And don’t forget that your comments below are more than welcome. I am willing to answer any questions that you may have and give you feedback on any comments that you may post. I would like to have your feedback because I learn from you as much as you learn from me.

About the Author

Panayotis Matsinopoulos works as Development Lead at Simply Business and, in his free time, enjoys giving and taking classes about Web development at Tech Career Booster.