Zookeeper is a service that can be replicated so that one can have multiple servers running on different machines, essentially having a distributed Zookeeper service. This blog post explains, using a Ruby example, when Zookeeper service is available and when not.
This is the 2nd story on our encounter with Zookeeper. If you have not read the first one, you can read it here.
Having a cluster of multiple Zookeeper servers, a Zookeeper client can connect to anyone of them and start requesting information and issuing commands. But, what does it happen if the server, that the client is connected to, goes down? The client is designed in such a way so that it can pick up the next available server in order to continue with its interaction with the Zookeeper cluster.
Let's see how this is done.
Note: Please, read the first part of these Zookeeper stories in which we explain how you can set up 3 Zookeeper servers to run on your local machine.
Let's start 3 Zookeeper servers on different terminals. We will use the start-foreground
option, so that we can watch their logs:
In Zookeper folder:
$ bin/zkServer.sh start-foreground ~/Documents/zookeeper-3.4.10/conf/1.cfg
... console output will be displayed here ...
Using another terminal window start 2nd server:
In Zookeeper folder:
$ bin/zkServer.sh start-foreground ~/Documents/zookeeper-3.4.10/conf/2.cfg
... console output will be displayed here ...
Using another terminal window start 3rd server:
In Zookeeper folder:
$ bin/zkServer.sh start-foreground ~/Documents/zookeeper-3.4.10/conf/3.cfg
... console output will be displayed here ...
Now, let's write the Ruby client program. Save the following as main.rb
:
In your working folder:
require 'zookeeper'
zookeeper = Zookeeper.new('127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:2183')
while true
data = zookeeper.get(path: '/test')
puts data.inspect
sleep(5)
end
This is a very simple client that will be fetching data for the znode /test
, repeatedly, sleeping for 5 seconds between
consecutive fetches. On line 3, we tell this client that we have a cluster of 3 Zookeeper services. The idea here is that the client will
pick up one and set up a connection with it, in order to do the fetches. It will be continuously connected to this
server, as long as this server is alive. However, if this server fails for some reason, it will automatically switch to the
next available server.
Note: We assume that you have a znode created with the name
/test
. If you don't have that, read the first part of these Zookeeper stories, in order to learn how you can do it.
Now start the Ruby program and watch the logs of the 3 servers.
Note: I am assuming that you have a
Gemfile
created in your project folder with the following content:source :rubygems gem 'zookeeper'
In your working folder:
$ bundle exec ruby main.rb
{:req_id=>0, :rc=>0, :data=>"my_data", :stat=>#<Zookeeper::Stat:0x007fcfb39afa80 @exists=true, @czxid=8589934594, @mzxid=8589934594, @ctime=1504002919154, @mtime=1504002919154, @version=0, @cversion=1, @aversion=0, @ephemeralOwner=0, @dataLength=7, @numChildren=1, @pzxid=8589934597>}
{:req_id=>1, :rc=>0, :data=>"my_data", :stat=>#<Zookeeper::Stat:0x007fcfb3c77560 @exists=true, @czxid=8589934594, @mzxid=8589934594, @ctime=1504002919154, @mtime=1504002919154, @version=0, @cversion=1, @aversion=0, @ephemeralOwner=0, @dataLength=7, @numChildren=1, @pzxid=8589934597>}
... more output here ... every 5 seconds ...
On one of your servers you will see something like this:
... (more output here) ...
... (more output here) ... Accepted socket connection from /127.0.0.1:50674
... (more output here) ... Connection request from old client /127.0.0.1:50674; will be dropped if server is in r-o mode
... (more output here) ... Client attempting to establish new session at /127.0.0.1:50674
... (more output here) ... INFO [CommitProcessor:2:ZooKeeperServer@687] - Established session 0x25e562c9e7a0000 with negotiated timeout 20001 for client /127.0.0.1:50674
This will be the server that would have accepted the connection from your client. Note that this is not necessarily the first server.
Watch the id of the established session (look at line 4 in the log above). In my case this was 0x25e562c9e7a0000
. This is the unique session id
and you will notice that it will persist even if we switch to another server.
Now, go ahead and use Ctrl + C to stop the server that received the client connection. When you do that, you will see another server picking up, and your Ruby client will continue fetching data.
This is what you will see on another server:
... (more output here) ...
... (more output here) ... Accepted socket connection from /127.0.0.1:50708
... (more output here) ... Connection request from old client /127.0.0.1:50708; will be dropped if server is in r-o mode
... (more output here) ... Client attempting to renew session 0x25e562c9e7a0000 at /127.0.0.1:50708
... (more output here) ... Established session 0x25e562c9e7a0000 with negotiated timeout 20001 for client /127.0.0.1:50708
That is a long output. But the last 4 lines show that the Client attempting to renew session 0x25e562c9e7a0000
. This is very clear.
Our client does not give up when the original server fails. It automatically tries to connect to another server and renew the session.
Look at the session id displayed now: 0x25e562c9e7a0000
. It is the same session id that was printed by the server the client was
originally connected to.
The above experiment proves that:
Now, go ahead and kill the next Zookeeper server, the one that our client is talking to at the moment. What will happen? Out of the 3 initial Zookeeper servers, we are going to have only 1. How will client behave?
Unfortunately, this is a situation in which Zookeeper cluster will not work and client will get a session time out exception, and, stop sending requests to fetch data. It will terminate.
What has happened here is that Zookeeper cluster is not working because there are not enough servers to make sure that we have proper replication and Leader election. Number of live servers is less than the majority/quorum needed in order for the cluster to work.
Now, go ahead and stop the last server too.
You can repeat the previous experiment but with 5 Zookeeper servers in the cluster. You will see that Zookeeper fails to serve client requests when you are left with 2 servers.
Although you may have 2 Zookeeper servers running (out of 5 initially in the cluster), the communication between the client and the Zookeeper cluster fails to be reestablished. This is, again, because the number of Zookeeper servers running are less than the majority of the servers being alive. The number of the servers being alive and running, it needs to be greater than or equal to the majority (quorum) of the servers in the cluster. Otherwise, Zookeeper cluster does not serve requests.
In more formal terms, the quorum is calculated by the formula ceil(N/2)
, where N
is a number of servers in a cluster (or ensemble in Zookeeper nomenclature).
For a 3 server ensemble, that means 2 servers must be up at any time, for a 5 server ensemble, 3 servers need to be up at any time.
Note: The quorum is necessary in order for the Leader to be selected. We will talk about the Leader in a following story.
In this blog post:
Thank you for reading this blog post. Your comments below are more than welcome. I am willing to answer any questions that you may have and give you feedback on any comments that you may post. Don't forget that, I learn from you as much as you learn from me.
Panayotis Matsinopoulos works as Development Lead at Simply Business and, on his free time, enjoys giving and taking classes about Web development at Tech Career Booster.
Want to know more about what it's like to work in tech at Simply Business? Read about our approach to tech, then check out our current vacancies.
Find out moreWe create this content for general information purposes and it should not be taken as advice. Always take professional advice. Read our full disclaimer
Keep up to date with Simply Business. Subscribe to our monthly newsletter and follow us on social media.
Subscribe to our newsletter6th Floor99 Gresham StreetLondonEC2V 7NG
Sol House29 St Katherine's StreetNorthamptonNN1 2QZ
© Copyright 2023 Simply Business. All Rights Reserved. Simply Business is a trading name of Xbridge Limited which is authorised and regulated by the Financial Conduct Authority (Financial Services Registration No: 313348). Xbridge Limited (No: 3967717) has its registered office at 6th Floor, 99 Gresham Street, London, EC2V 7NG.