Zookeeper has been designed so that we can develop distributed applications easier. For example, we can use it to switch from a failing active process to a standby one, in an active/standby configuration. This is what this blog post is about. It demonstrates the ability to have a process switch from standby to active role, when another active process crashes.
This is the 3rd story on Zookeeper. The previous ones can be found here:
Before we delve into actual code let's see an overview of the architecture of this demo.
The above is a hypothetical system in which a task producer gives commands to a master. The master distributes the commands to workers. Although we could have had only 1 master process, we want to make master functionality highly available. In order to do that, we bring up multiple processes to play the role of master. However, only 1 master process will be active. The other ones will be standby.
That particular bit of the whole system architecture is the one that we are going to implement here, i.e. the part of the multiple master processes, one being active, whereas, the others being standby.
The rest of the system architecture will be implemented and demonstrated in following Zookeeper stories.
In order to implement the active/standby architecture, we will rely on Zookeeper to keep track of the master process. How can we do that? These are the rules of the game:
The Zookeeper properties that will help us implement this active token concept are:
Given the above Zookeeper properties
In the following diagram, you can see how the ephemeral node is being used in order to implement the concept of the active token.
The master active process is the one that initially has created the ephemeral Zookeeper node /master
. The standby master process
fails to create the node, and installs a watch. When the master active process crashes, Zookeeper notifies the standby process that /master
ephemeral node does not exist any more. When standby process receives the notification, it creates the /master
ephemeral node and
becomes the active master process.
We have had enough of theory. Let's write some code.
Note: The demo code can be found in this Github repo (tag: active-stand-by-master-nodes)
The master process code is given below:
# File: master.rb
#
master_app = MasterApp.new
master_app.connect_to_zk
result = master_app.register_as_active
master_app.watch_for_failing_active unless result
while true
sleep 3
puts "I am #{master_app.mode} master"
end
The process does not actually deal with tasks. This will be implemented in future Zookeeper stories. This version above, all that it does is to be in a infinite loop printing its mode in between 3 second sleeps.
The master process mode will be either active
or standby
.
MasterApp
. Read about it further down in the post.master_app.connect_to_zk
active
master process: result = master_app.register_as_active
active
(result
being true
), it goes to the main loop.active
(result
being false
), it registers a watch to be notified if the active process crashes/fails.
Then it goes to the main loop.Note that our client master process has another thread waiting for notifications from Zookeeper. Hence, while it may be in the while loop, this does not prevent the standby process from being notified by the Zookeeper that ephemeral node has been deleted.
MasterApp
The MasterApp
class is the one that has the logic to connect to Zookeeper and register as active or standby.
# File: master_app.rb
#
require 'zookeeper'
require 'zookeeper_client_configuration'
require 'zookeeper_client_api_result'
require 'zookeeper_client_watcher_callback'
class MasterApp
MASTER_NODE = '/master'
attr_reader :mode
def initialize
self.mode = :not_connected
end
def connect_to_zk
@zookeeper_client = Zookeeper.new(zookeeper_client_configuration.servers)
end
def register_as_active
result = create_ephemeral_node(MASTER_NODE, Process.pid)
if result.no_error?
self.mode = :active
true
elsif result.node_already_exists?
self.mode = :standby
false
else
raise "ERROR: Cannot start master app: #{result.inspect}"
end
end
def watch_for_failing_active
watcher_callback = Zookeeper::Callbacks::WatcherCallback.create do |callback_object|
callback_object = ZookeeperClientWatcherCallback.new(callback_object)
if callback_object.node_deleted?(MASTER_NODE)
result = register_as_active
watch_for_failing_active unless result
end
end
zookeeper_client.stat(path: MASTER_NODE, watcher: watcher_callback)
end
private
attr_reader :zookeeper_client
attr_writer :mode
def zookeeper_client_configuration
ZookeeperClientConfiguration.instance
end
def create_ephemeral_node(node, data)
ZookeeperClientApiResult.new(zookeeper_client.create(path: node, data: data.to_s, ephemeral: true))
end
end
zookeeper
gemThe communication of the master process with the Zookeeper server relies on the zookeeper gem.
There is an @mode
instance variable that takes the values :not_connected
, :active
and :standby
. The process that manages to
create the ephemeral node, it will be in mode :active
. Otherwise, :standby
. When initializing, it is :not_connected
.
MasterApp#register_as_active
This is the method that tries to register the process as active
.
# File: master_app.rb
#
...
def register_as_active
result = create_ephemeral_node(MASTER_NODE, Process.pid)
if result.no_error?
self.mode = :active
true
elsif result.node_already_exists?
self.mode = :standby
false
else
raise "ERROR: Cannot start master app: #{result.inspect}"
end
end
...
:active
and returns true
:standby
and returns false
If the master process does not manage to get the active token, i.e. to create the ephemeral node, then it will attach a watch on the existing ephemeral node, in order to be notified by Zookeeper for any change in that node.
# File: master_app.rb
#
...
def watch_for_failing_active
watcher_callback = Zookeeper::Callbacks::WatcherCallback.create do |callback_object|
callback_object = ZookeeperClientWatcherCallback.new(callback_object)
if callback_object.node_deleted?(MASTER_NODE)
result = register_as_active
watch_for_failing_active unless result
end
end
zookeeper_client.stat(path: MASTER_NODE, watcher: watcher_callback)
end
...
In order for a client process to install a watch, it first creates the callback and then calls the #stat
method to
install it.
This is how the watch callback is defined:
# File: master_app.rb
#
...
watcher_callback = Zookeeper::Callbacks::WatcherCallback.create do |callback_object|
callback_object = ZookeeperClientWatcherCallback.new(callback_object)
if callback_object.node_deleted?(MASTER_NODE)
result = register_as_active
watch_for_failing_active unless result
end
end
...
The code inside the block, is going to be called / executed when Zookeeper notifies our process. It is given a callback_object
, that
we then wrap into ZookeeperClientWatcherCallback
in order to interpret its content. If the callback_object
informs us that
the ephemeral node has been deleted, then we call register_as_active
. If register_as_active
succeeds, then we are now the active
process, but if it fails, we call the watch_for_failing_active
method again.
After defining the callback logic, we need to register the watch:
# File: master_app.rb
#
...
zookeeper_client.stat(path: MASTER_NODE, watcher: watcher_callback)
...
The method #stat
tells Zookeeper to return back to us the status of the node given in the path
argument. Also,
we pass the watcher
argument specifying how we want to get notified for any changes in that node.
The method that creates the ephemeral node is:
# File: master_app.rb
#
...
def create_ephemeral_node(node, data)
ZookeeperClientApiResult.new(zookeeper_client.create(path: node, data: data.to_s, ephemeral: true))
end
...
The #create()
method called on the Zookeeper client, is also sent the ephemeral: true
argument. This is the way to
create an ephemeral node.
The MasterApp
class relies on ZookeeperClientApiResult
class in order to interpret the result/response from Zookeeper.
# File: zookeeper_client_api_result.rb
#
class ZookeeperClientApiResult
def initialize(result)
@result = result
end
def request_id
result[:req_id]
end
alias :req_id :request_id
def node_already_exists?
result[:rc] == Zookeeper::Constants::ZNODEEXISTS
end
def no_error?
result[:rc] == 0
end
attr_reader :result
end
:rc
) is equal to Zookeeper::Constants::ZNODEEXISTS
, then we know that we have tried to create a node that was already there.:rc
) is 0
, then no error has been returned and everything succeeded at the Zookeeper side.:reg_id
key returns the request identifier.When Zookeeper calls our master process back, which happens when the ephemeral node is deleted, it wraps information inside an object that exposes two important properties:
type
property, which tells us what type of event has taken place, andpath
property, which tells us which is the node the event has taken place against.We take advantage of these pieces of information above within the class ZookeeperClientWatcherCallback
:
# File: zookeeper_client_watcher_callback.rb
#
class ZookeeperClientWatcherCallback
def initialize(callback_object)
@callback_object = callback_object
end
def node_deleted?(node)
callback_object.type == Zookeeper::Constants::ZOO_DELETED_EVENT && callback_object.path == node
end
private
attr_reader :callback_object
end
The method #node_deleted?()
checks whether the node given is the one that has just been deleted.
The README.md file, in the source code of this demo, explains how you can configure and run this demo.
In this blog post, we have learnt
zookeeper gem
to create ephemeral nodes and how to install watches on them.Thank you for reading this blog post. Your comments below are more than welcome. I am willing to answer any questions that you may have and give you feedback on any comments that you may post. Don't forget that, I learn from you as much as you learn from me.
Panayotis Matsinopoulos works as Development Lead at Simply Business and, on his free time, enjoys giving and taking classes about Web development at Tech Career Booster.
Want to know more about what it's like to work in tech at Simply Business? Read about our approach to tech, then check out our current vacancies.
Find out moreWe create this content for general information purposes and it should not be taken as advice. Always take professional advice. Read our full disclaimer
Keep up to date with Simply Business. Subscribe to our monthly newsletter and follow us on social media.
Subscribe to our newsletter6th Floor99 Gresham StreetLondonEC2V 7NG
Sol House29 St Katherine's StreetNorthamptonNN1 2QZ
© Copyright 2023 Simply Business. All Rights Reserved. Simply Business is a trading name of Xbridge Limited which is authorised and regulated by the Financial Conduct Authority (Financial Services Registration No: 313348). Xbridge Limited (No: 3967717) has its registered office at 6th Floor, 99 Gresham Street, London, EC2V 7NG.