Implementing Active/Standby Master Process

Zookeeper has been designed so that we can develop distributed applications easier. For example, we can use it to switch from a failing active process to a standby one, in an active/standby configuration. This is what this blog post is about. It demonstrates the ability to have a process switch from standby to active role, when another active process crashes.

This is the 3rd story on Zookeeper. The previous ones can be found here:

  1. Introduction To Zookeeper
  2. Zookeeper Service Availability

Design and Architecture

Before we delve into actual code let’s see an overview of the architecture of this demo.

Master process

The above is a hypothetical system in which a task producer gives commands to a master. The master distributes the commands to workers. Although we could have had only 1 master process, we want to make master functionality highly available. In order to do that, we bring up multiple processes to play the role of master. However, only 1 master process will be active. The other ones will be standby.

That particular bit of the whole system architecture is the one that we are going to implement here, i.e. the part of the multiple master processes, one being active, whereas, the others being standby.

The rest of the system architecture will be implemented and demonstrated in following Zookeeper stories.

Zookeeper Coordinates Active/Standby Election

In order to implement the active/standby architecture, we will rely on Zookeeper to keep track of the master process. How can we do that? These are the rules of the game:

  1. Assume that the active process owns the active token, whereas the standby does not.
  2. When a process starts, it will request Zookeeper the active token.
  3. If Zookeeper has not given the active token to any other process, it will give the token to the process that is requesting it.
  4. If Zookeeper has given the active token to another process, it will deny the token to the requesting process.
  5. When a process is denied the active token, then it asks Zookeeper to be notified when the active token is free.
  6. When a process is notified that the active token is free, it requests the active token from Zookeeper. It might get it, or it might not get it. This is because in between the notification that active token is free and the request to acquire the token, another process might have grabbed it.
  7. If a process that has the active token crashes, Zookeeper will consider the active token free, and will give it to the next process that will request it.

Zookeeper Properties

The Zookeeper properties that will help us implement this active token concept are:

  1. When Zookeeper is requested to create a node, it will return an error if the node already exists.
  2. Requests to create nodes are serialized and are atomic.
  3. There is a node type which is called ephemeral and which lives as long as the client session lives. It is automatically deleted when the session with the client terminates.
  4. A client can request a watch on an existing node. If the node is deleted, then client is notified.

Implement Active Token

Given the above Zookeeper properties

  1. When a process creates an ephemeral node it will become the active process. We assume that it takes the active token.
  2. Any other process that will try to create the same ephemeral node, it will fail. We assume that this process will be the standby one, without the active token.
  3. The standby process will set a watch on the ephemeral node. Hence, it will be notified when that node will be deleted.
  4. When the active process fails/crashes for any reason, its session with Zookeeper will terminate and Zookeeper will delete the ephemeral node. Hence, any other standby process will be notified.
  5. The standby processes are notified and the first that manages to create the new ephemeral node will become the active node. The others, will ask to be notified again, by installing a new watch on the ephemeral node.

Example Interaction

In the following diagram, you can see how the ephemeral node is being used in order to implement the concept of the active token.


The master active process is the one that initially has created the ephemeral Zookeeper node /master. The standby master process fails to create the node, and installs a watch. When the master active process crashes, Zookeeper notifies the standby process that /master ephemeral node does not exist any more. When standby process receives the notification, it creates the /master ephemeral node and becomes the active master process.

Implementation using Ruby

We have had enough of theory. Let’s write some code.

Note: The demo code can be found in this Github repo (tag: active-stand-by-master-nodes)

Master Process

The master process code is given below:

The process does not actually deal with tasks. This will be implemented in future Zookeeper stories. This version above, all that it does is to be in a infinite loop printing its mode in between 3 second sleeps.

The master process mode will be either active or standby.

  1. The implementation relies on the class MasterApp. Read about it further down in the post.
  2. The process initially connects to the Zookeeper server: master_app.connect_to_zk
  3. Then it tries to register as an active master process: result = master_app.register_as_active
  4. If it manages to register as active (result being true), it goes to the main loop.
  5. If it does not manage to register as active (result being false), it registers a watch to be notified if the active process crashes/fails. Then it goes to the main loop.

Note that our client master process has another thread waiting for notifications from Zookeeper. Hence, while it may be in the while loop, this does not prevent the standby process from being notified by the Zookeeper that ephemeral node has been deleted.



The MasterApp class is the one that has the logic to connect to Zookeeper and register as active or standby.

zookeeper gem

The communication of the master process with the Zookeeper server relies on the zookeeper gem.

Keeping Track of Mode

There is an @mode instance variable that takes the values :not_connected:active and :standby. The process that manages to create the ephemeral node, it will be in mode :active. Otherwise, :standby. When initializing, it is :not_connected.


This is the method that tries to register the process as active.

  1. It tries to create the ephemeral node.
  2. If it succeeds, then it changes its mode to :active and returns true
  3. If it fails and the error code is that the node already exists, then it stays to :standby and returns false

Watch For Failing Active

If the master process does not manage to get the active token, i.e. to create the ephemeral node, then it will attach a watch on the existing ephemeral node, in order to be notified by Zookeeper for any change in that node.

In order for a client process to install a watch, it first creates the callback and then calls the #stat method to install it.

Defining Watch Callback

This is how the watch callback is defined:

The code inside the block, is going to be called / executed when Zookeeper notifies our process. It is given a callback_object, that we then wrap into ZookeeperClientWatcherCallback in order to interpret its content. If the callback_object informs us that the ephemeral node has been deleted, then we call register_as_active. If register_as_active succeeds, then we are now the active process, but if it fails, we call the watch_for_failing_active method again.

Registering Watch

After defining the callback logic, we need to register the watch:

The method #stat tells Zookeeper to return back to us the status of the node given in the path argument. Also, we pass the watcher argument specifying how we want to get notified for any changes in that node.

Creating Ephemeral Nodes

The method that creates the ephemeral node is:

The #create() method called on the Zookeeper client, is also sent the ephemeral: true argument. This is the way to create an ephemeral node.

Interpreting API Result

The MasterApp class relies on ZookeeperClientApiResult class in order to interpret the result/response from Zookeeper.

  1. If the result code (:rc) is equal to Zookeeper::Constants::ZNODEEXISTS, then we know that we have tried to create a node that was already there.
  2. If the result code (:rc) is 0, then no error has been returned and everything succeeded at the Zookeeper side.
  3. The :reg_id key returns the request identifier.

Interpreting Watcher Callback

When Zookeeper calls our master process back, which happens when the ephemeral node is deleted, it wraps information inside an object that exposes two important properties:

  1. the type property, which tells us what type of event has taken place, and
  2. the path property, which tells us which is the node the event has taken place against.

We take advantage of these pieces of information above within the class ZookeeperClientWatcherCallback:

The method #node_deleted?() checks whether the node given is the one that has just been deleted.

Example Run

The file, in the source code of this demo, explains how you can configure and run this demo.

Closing Note

In this blog post, we have learnt

  1. About ephemeral Zookeeper nodes.
  2. About Zookeeper commands being synchronized and atomic.
  3. How to use the zookeeper gem to create ephemeral nodes and how to install watches on them.
  4. How to process callback calls from Zookeeper when an ephemeral node is deleted.
  5. How to create an active/standby process architecture using all the above.

Thank you for reading this blog post. Your comments below are more than welcome. I am willing to answer any questions that you may have and give you feedback on any comments that you may post. Don’t forget that, I learn from you as much as you learn from me.

About the Author

Panayotis Matsinopoulos works as Development Lead at Simply Business and, on his free time, enjoys giving and taking classes about Web development at Tech Career Booster.

Ready to start your career at Simply Business?

Want to know more about what it’s like to work in tech at Simply Business? Read about our approach to tech, then check out our current vacancies.

Panos G. Matsinopoulos

This block is configured using JavaScript. A preview is not available in the editor.