So you need to add, replace, or remove nodes?

Some of the basic operations you’ll need to perform after starting DSE on your first node are adding more nodes to scale out your cluster, replacing nodes when one fails (let’s face it, hardware eventually fails), and decommissioning and removing nodes to shrink your cluster or move nodes to another location (datacenter or cluster).

While adding and removing nodes might sound relatively easy, there are a few things to consider before you make changes to your cluster. Understanding the process of adding, replacing & removing nodes will help you make a smoother change to your cluster.

Note: In this blog I will not cover all the settings & steps required to add/replace/remove nodes from start to finish. For detailed settings & steps, please see our documentation starting here.

Adding Nodes

The process for adding nodes differs slightly depending on whether you are adding nodes to an existing datacenter or adding nodes to create a new datacenter. The core difference is the auto_bootstrap setting used.

Single node - when you need to add a single node to an existing DC, set auto_bootstrap: true in cassandra.yaml. When auto_bootstrap is true and DSE is started on a fresh new node, the node joins the cluster and the other nodes in the cluster stream (bootstrap) all the required data to it. After the bootstrap process is complete, the new node will be fully populated with the data it owns based on its assigned token range (ranges if using vnodes, i.e. num_tokens > 1), and its status will appear as Up & Normal (UN) in nodetool status output.
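
As a minimal sketch (the cluster name, seed IP, and service command are illustrative assumptions; adjust them for your environment and install type):

    # cassandra.yaml on the new node
    cluster_name: 'MyCluster'          # must match the existing cluster
    auto_bootstrap: true
    # in the seed_provider section, point at existing seed nodes (not this node):
    #     - seeds: "10.0.0.1"

    # start DSE, then watch the node move from UJ (Up/Joining) to UN (Up/Normal)
    sudo service dse start
    nodetool status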

Adding multiple nodes - when you need to add multiple nodes to an existing DC, follow the same steps as for a single node, one node at a time: only after the first node has completed the bootstrap process should you add the second, and so on for as many new nodes as you need to add & bootstrap. A node is not completely added until the bootstrap operation has completed successfully, so when using auto_bootstrap=true, never attempt to add & bootstrap more than one node at a time.
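
A quick way to confirm a joining node has finished bootstrapping before you start the next one:

    # on the node that just joined
    nodetool netstats | grep -i mode     # should report "Mode: NORMAL", not JOINING

    # from any node, the new node should show as UN rather than UJ
    nodetool status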

Adding multiple nodes to a new DC - when you need to add multiple nodes to a new DC, follow the same steps as for adding multiple nodes, except set auto_bootstrap=false. Wait 2 minutes after each node has been added before adding the next one. Repeat these steps until you have added the required number of new nodes to the DC. After you have added all the new nodes to the datacenter, you will need to:

  • change the replication factor of your keyspaces to include the new DC;

  • run nodetool rebuild on each node in the new datacenter, one node at a time, as sketched below. The rebuild can take hours to complete, depending on how much data has to be streamed.
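
A hedged sketch of those two steps (the keyspace and datacenter names are illustrative):

    # add the new DC to each keyspace's replication settings
    cqlsh -e "ALTER KEYSPACE my_ks WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};"

    # then, on each node in the new DC, one node at a time,
    # stream its data from the existing datacenter
    nodetool rebuild -- DC1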

The complete steps for adding a datacenter can be found here.

Replacing Nodes

Eventually you’ll experience some form of node failure, whether it’s hardware (power supply, memory, CPU, mainboard, disk) or software (an OS fault or an OutOfMemory (OOM) situation). When a node goes down, its status changes to Down & Normal (DN) in nodetool status output. Depending on the type of failure, you may simply be able to restart DSE on the node (assuming it’s a software failure). If DSE restarts OK and the node rejoins the cluster successfully, any mutations (hints) that the node missed while it was down will be delivered back to the node (HintedHandOff), and everything will be back to normal.
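
For the simple software-failure case, recovery is usually no more than the following (the service command assumes a package install):

    # on the failed node, after addressing the software fault
    sudo service dse start

    # from any other node, watch the status flip from DN back to UN
    nodetool status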

If the failure was due to hardware, you will need to assess how quickly you’ll be able to replace the hardware.  The key word here is replace, and not remove.

If you are able to repair the failed hardware quickly, within a couple of hours, or at least in less time than you have set for max_hint_window_in_ms (the default is 3 hours), then you can simply restart DSE on the node. Just as with the software failure recovery, any missed writes (hints) will be delivered back to the node (HintedHandOff), and everything will be back to normal.
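
For reference, the hint window is set in cassandra.yaml; the value below is the default (3 hours, expressed in milliseconds):

    max_hint_window_in_ms: 10800000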

If it’s going to take more time to repair the hardware, or you want to replace the failed node with new hardware you have available (assuming new disks), you’ll want to follow the steps to replace the dead node. While the steps look very similar to those for adding a new node, the difference here is that you specify the IP address of the old node you are replacing when starting DSE (-Dcassandra.replace_address=address_of_dead_node). This tells the cluster that the new server will be replacing the node that is currently down. The token(s) previously owned by the dead node will be taken over by the new server, and the node will be added & bootstrapped like a normal new node, except that it simply replaces the existing dead node. We place the emphasis on the word “replace” since it is the quickest way to recover a node which has gone down.
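
A minimal sketch of a replacement, assuming a package install where extra JVM options go in cassandra-env.sh (the IP address is an illustrative stand-in for the dead node’s address):

    # on the replacement node, before the first start, add to cassandra-env.sh:
    JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=10.0.0.42"

    # start DSE; the node bootstraps and takes over the dead node's token range(s)
    sudo service dse start
    nodetool status      # the dead node's entry is gone once the replacement is UN

    # after the replacement has joined, remove the flag so future restarts are normal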

Some people assume they must remove the dead node first and then add a new node. While that is possible, it results in unnecessary shuffling of data between nodes:

  • when a node is removed, the ring rebalances by redistributing the data to other nodes to restore the replica count;

  • adding a new node after that causes another rebalancing and redistribution of data to other replicas.

With the “replace” method, token ranges do not get reshuffled around the ring, avoiding this over-streaming scenario.

Removing Nodes

There are really only two reasons why you would want to remove a node:

  1. you no longer need the node and want to permanently decommission it from the cluster, never to return.

  2. you need to move the node from one datacenter to another, or even repurpose it in a completely different cluster.

Before a running node is removed, it is decommissioned (nodetool decommission), whereby the data on the node is streamed off and redistributed to the neighbouring nodes in the cluster. After the decommission is complete, DSE can be shut down and the node is effectively removed from the cluster.
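
A sketch of the decommission flow (a decommission can run for a long time, so consider running it under screen or tmux so a dropped session doesn’t interrupt it):

    # on the node being removed
    nodetool decommission

    # while it runs, the node shows as UL (Up/Leaving) and streams its data away
    nodetool netstats        # reports "Mode: LEAVING" during the decommission

    # once complete, stop DSE; the node is no longer part of the cluster
    sudo service dse stop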

If a node is already down, you can remove it without a decommission (nodetool removenode, using the down node’s Host ID), whereby the data on the node will not be streamed off the node, and the node will simply be removed from the cluster.
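
A minimal sketch, run from any live node (the Host ID below is illustrative):

    # note the Host ID of the DN node in the output
    nodetool status

    # remove it from the cluster
    nodetool removenode 2b7a5dc9-0a3f-4a4e-8cdd-3c2a1d6e9f10

    # check on a long-running removal
    nodetool removenode status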

Note: Before any previously decommissioned/removed node is re-added to any cluster, even the same cluster, you must ensure that the node has absolutely NO PREVIOUS DATA in its data, saved_caches, commitlog, and hints directories. Adding a node previously used in a cluster will merge its older data into the cluster and will cause serious issues, especially if the old data contains a different schema.
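
For example, on a package install the default locations look like this (verify the paths against your cassandra.yaml before deleting anything):

    # on the node being re-added, with DSE stopped
    sudo rm -rf /var/lib/cassandra/data/* \
                /var/lib/cassandra/saved_caches/* \
                /var/lib/cassandra/commitlog/* \
                /var/lib/cassandra/hints/*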

If you still find yourself a little unsure how to handle adding nodes, or what to do when a node goes down, don’t hesitate to open a support ticket to discuss the scenario with the DataStax Support Team, who will be more than happy to make sure you’re taking the right steps to expand or recover successfully, in as short a time as possible.