Configuring Clusters

DSE Version: 6.0


In this unit, you will learn the basics of cluster configuration, with a focus on the incredibly important YAML file!


Let’s start with the basics of cluster configuration. Hmmmm, where should we start? Ah yes! The yaml file. Yaml. Yaml. I like that word; it sounds like a vegetable. Anyhoo, moving on. The cassandra.yaml file is the main configuration file for Apache Cassandra. You can find it in /etc/dse/cassandra if you used a package installer, or in resources/cassandra/conf under wherever you installed DSE if you used a tarball installation. There are a bunch of things you can configure in that file, and we will go over some key configuration options in the next few slides. Remember, though, that you have to restart the node for your changes to take effect.
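If you script your node setup, the two locations above can be captured in a tiny lookup. This is a minimal sketch, assuming only the package and tarball layouts just described; the function name and the `install_location` parameter are hypothetical, and `/opt/dse` is just an example install directory.

```python
def cassandra_yaml_path(install_type: str, install_location: str = "") -> str:
    """Return the expected cassandra.yaml path for a DSE install (hypothetical helper)."""
    if install_type == "package":
        # Package installs keep the config under /etc/dse/cassandra.
        return "/etc/dse/cassandra/cassandra.yaml"
    if install_type == "tarball":
        if not install_location:
            raise ValueError("tarball installs need the install location")
        # Tarball installs keep it under <install location>/resources/cassandra/conf.
        return f"{install_location.rstrip('/')}/resources/cassandra/conf/cassandra.yaml"
    raise ValueError(f"unknown install type: {install_type!r}")


print(cassandra_yaml_path("package"))
print(cassandra_yaml_path("tarball", "/opt/dse"))
```

Whichever file you end up editing, the node only picks up the changes after a restart.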

Okay, so there are some settings that everyone changes. Let’s call those ‘quick start’ settings: the minimal settings to get Cassandra up and running. Alrighty, what’s first? The cluster_name. If you don’t change it, the default name is Test Cluster. You might want to get more creative than that. Next, listen_address. This is the IP address that other nodes in the cluster use to find this node. Moving on to native_transport_address, which is the IP address clients use to connect to this node (and, through it, to the cluster). Finally, the seeds addresses, set on the - seeds line. These are the nodes that new nodes contact when they join the cluster. If you have more than one, it’s a list, and the whole list goes inside one pair of double quotes. Usually all the nodes in the cluster have the same seed list.
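As a rough sketch of a pre-flight check for these four quick start settings, assuming you have already pulled their values into a plain dictionary (the helper name and the example addresses are hypothetical, and seeds is flattened into a list here rather than nested under seed_provider as it is in the real file):

```python
QUICK_START_KEYS = ("cluster_name", "listen_address", "native_transport_address", "seeds")


def quick_start_problems(settings: dict) -> list:
    """List problems with the four quick start settings (hypothetical helper)."""
    problems = []
    for key in QUICK_START_KEYS:
        # A missing key, empty string, or empty seed list all count as "not set".
        if not settings.get(key):
            problems.append(f"{key} is not set")
    if settings.get("cluster_name") == "Test Cluster":
        problems.append("cluster_name is still the default 'Test Cluster'")
    return problems


node = {
    "cluster_name": "Test Cluster",
    "listen_address": "10.0.0.11",
    "native_transport_address": "10.0.0.11",
    "seeds": ["10.0.0.11", "10.0.0.12"],
}
print(quick_start_problems(node))  # flags only the default cluster name
```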

Okay, what’s next? Ah yes, where do I find these settings? Well, you can scan the whole file if you’ve got time to kill, or you can use the search feature in whatever text editor you fancy. Open the file in a text editor, search for cluster_name, change it, and save the file. Voilà! You are golden. Ooooh, one gotcha: the name is case-sensitive, so remember the case you used when you set it, and use exactly the same name on every node in the cluster.
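The same search, change, save routine can be done from a script. This is a minimal sketch, assuming the setting sits on its own uncommented, unindented line (as cluster_name does); the helper names are hypothetical, and it edits text with a regular expression rather than a real YAML parser.

```python
import re
from pathlib import Path


def set_setting(yaml_text: str, key: str, value: str) -> str:
    """Replace the value of an uncommented top-level `key:` line (hypothetical helper).

    Matching is case-sensitive, so `Cluster_Name` will not find `cluster_name`.
    """
    pattern = re.compile(rf"^{re.escape(key)}:.*$", re.MULTILINE)
    # Use a function as the replacement so backslashes in `value` are not treated as escapes.
    new_text, count = pattern.subn(lambda _m: f"{key}: {value}", yaml_text)
    if count == 0:
        raise KeyError(f"{key!r} not found (check the spelling and case)")
    return new_text


def edit_setting_in_file(path: str, key: str, value: str) -> None:
    """Open, edit, and save the file in one go; restart the node afterwards."""
    file = Path(path)
    file.write_text(set_setting(file.read_text(), key, value))


original = "# The name of the cluster.\ncluster_name: 'Test Cluster'\nlisten_address: localhost\n"
print(set_setting(original, "cluster_name", "'Prod Cluster'"))
```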

Yup, same thing here: search for listen_address, change it from the default of localhost, and save the file. Done. Remember to set either the address (listen_address) or the interface (listen_interface), not both. Oooooh, this is easy! What’s next??
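To catch the "address or interface, not both" mistake before restarting, a quick check could look like this. It is a sketch, assuming the relevant settings are uncommented top-level `key: value` lines; the helper names and example addresses are hypothetical, and the line matching is deliberately crude rather than a full YAML parse.

```python
import re

# An uncommented, unindented `key: value` line.
SETTING_LINE = re.compile(r"^([A-Za-z_]+):[ \t]*(.*)$", re.MULTILINE)


def top_level_settings(yaml_text: str) -> dict:
    """Collect uncommented top-level scalar settings (a rough sketch, not a YAML parser)."""
    return {key: value.strip() for key, value in SETTING_LINE.findall(yaml_text)}


def check_listen(yaml_text: str) -> str:
    settings = top_level_settings(yaml_text)
    has_address = bool(settings.get("listen_address"))
    has_interface = bool(settings.get("listen_interface"))
    if has_address and has_interface:
        return "error: set listen_address or listen_interface, not both"
    if settings.get("listen_address") == "localhost":
        return "warning: listen_address is still localhost, so other nodes cannot reach it"
    return "ok"


print(check_listen("listen_address: 10.0.0.11\n# listen_interface: eth0\n"))
```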

Now we are cooking with gas! Search for native_transport_address, change it, save it, and forget it. Well, maybe you don’t want to forget it. But you know what I mean.

Ah, here we might add a small complication: double quotes. You guys ready for this? Search for - seeds and edit the addresses there. If you have more than one seed address, put the whole list inside one pair of double quotes, with the addresses separated by commas. The seed addresses are critical because new nodes joining the cluster go to one of these seed nodes to get the lay of the land.
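Here is a small sketch of the quoting rule: all the seed addresses go into one double-quoted, comma-separated string, not one quoted string per address. The helper names and addresses are hypothetical; `indent` is there because the - seeds line sits nested under seed_provider, so match whatever indentation your file already uses.

```python
def format_seeds_line(seeds: list, indent: int = 0) -> str:
    """Build the `- seeds:` line as one quoted, comma-separated string (hypothetical helper)."""
    cleaned = [s.strip() for s in seeds if s.strip()]
    if not cleaned:
        raise ValueError("at least one seed address is required")
    return " " * indent + '- seeds: "' + ",".join(cleaned) + '"'


def parse_seeds_line(line: str) -> list:
    """Split a `- seeds: "a,b,c"` line back into a list of addresses."""
    _, _, value = line.partition("seeds:")
    return [s.strip() for s in value.strip().strip('"').split(",") if s.strip()]


line = format_seeds_line(["10.0.0.11", " 10.0.0.12 "])
print(line)
print(parse_seeds_line(line))
```

Generating the line from one shared list also makes it easy to give every node the same seed list.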

Alright, now that the required settings are, you know, configured, you can move on to some other commonly configured settings. I feel like I am saying “configure” a lot, in like tons of variations.

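Alright, so let’s start with endpoint_snitch. You need this if you want your cluster to be topology aware. Next are initial_token and num_tokens, which control how token ranges are assigned to nodes. num_tokens sets how many virtual nodes, otherwise known as vnodes, each node owns, which helps distribute data evenly across your cluster; initial_token lets you assign tokens by hand if you are not using vnodes. The next four settings are all directories where we store information: commitlog_directory for commit logs, data_file_directories for the actual data, hints_directory for hints, and saved_caches_directory for the key and row cache files.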
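If you skip vnodes and give each node a single token through initial_token, the tokens should be spaced evenly around the ring. This is a minimal sketch, assuming the default Murmur3Partitioner (token range -2^63 to 2^63 - 1) and exactly one token per node; the helper name is hypothetical.

```python
def initial_tokens(node_count: int) -> list:
    """Evenly spaced single tokens for Murmur3Partitioner (range -2**63 .. 2**63 - 1)."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    step = 2**64 // node_count  # integer arithmetic avoids float rounding on huge tokens
    return [i * step - 2**63 for i in range(node_count)]


for node, token in enumerate(initial_tokens(4), start=1):
    print(f"node {node}: initial_token: {token}")
```

Each node then gets num_tokens: 1 plus its own initial_token value from the list.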

And last but not least, some other cool things you can set, enable, or disable in your yaml file that you should at least know about! If you want to enable hinted handoff, you flip the switch in the top setting, hinted_handoff_enabled. It’s on by default, so maybe no switch flipping necessary! Next on my little list here is max_hint_window_in_ms: how long the cluster keeps storing hints for a dead node before it stops generating new ones. Alas, poor node, I knew him well. The default is 3 hours. What’s next?? The row cache size, row_cache_size_in_mb. This is the maximum size of the row cache in memory, in megabytes. It’s set to 0 by default, which means the row cache is off. The file cache size, file_cache_size_in_mb, is next, also in megabytes. This is the maximum memory to use when pooling SSTable buffers. And finally memtable_heap_space_in_mb and memtable_offheap_space_in_mb, which are the total on-heap and off-heap allowances for memtables.
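Because max_hint_window_in_ms is in milliseconds, the 3-hour default turns into a large number in the file. A tiny helper, hypothetical and only for doing the arithmetic, makes the conversion explicit; the 6-hour value is just an example, not a recommendation.

```python
def hours_to_ms(hours: float) -> int:
    """Convert a duration in hours to milliseconds for max_hint_window_in_ms."""
    return int(hours * 60 * 60 * 1000)  # hours -> minutes -> seconds -> milliseconds


print(f"max_hint_window_in_ms: {hours_to_ms(3)}")  # the 3-hour default
print(f"max_hint_window_in_ms: {hours_to_ms(6)}")  # a hypothetical 6-hour window
```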

Alright, I think that’s enough for now! Let’s go see if you learned anything by jumping into an exercise!
