
DSE Version: 6.0

Node

Video

Apache Cassandra is a distributed system composed of nodes. In this unit, we learn how individual nodes function.

Transcript: 

Let’s change it up a bit!  Hi, I am Tanya Gallagher, and I am here to talk to you about the node. Apache Cassandra is a distributed system made up of these little guys called nodes.  Before we get into how they all work together, let’s start by talking about an individual node. This little node is pretty powerful and by itself, it has a lot to do.

A single node runs on a server or in a virtual machine, and it runs a Java Virtual Machine, or JVM, which is a Java process.  This Java process is running Apache Cassandra, which is written in Java.

That node could live in a cloud or inside an on-premises data center, and it can run on a variety of disks.  It’s important to note that we always recommend local storage or direct-attached storage. What we don’t recommend is running it on a SAN.  Running on a SAN is bad news and will cause Apache Cassandra to roll over dead and make you want to cry. We don’t want to see you cry. As a rule of thumb, if your disk has an Ethernet cable, it’s probably the wrong choice for your Apache Cassandra setup.

So what does the node do?  The node is responsible for the data that it stores.  All the data stored there is in a distributed hash table. Let’s look at a write: all the data in Apache Cassandra is hashed, and in this case the hashed key is 59. The node can also serve reads.  In this example, we want to read the data with the key 22 from the node. All of those things happen on a single node.  In a distributed environment like Apache Cassandra, there will be a lot of nodes handling their own chunks of data, but we will get into that in more detail in a different module.
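
To make the write and read described above concrete, here is a minimal sketch of a single node as a hash-keyed store. It is a toy model, not Cassandra's implementation: the `ToyNode` class, the `token_for` helper, and the 0-99 token range are assumptions made for illustration (the small range just mirrors the two-digit values like 59 and 22 used in the video), and MD5 is a stand-in for whatever hash function a real cluster's partitioner uses.

```python
import hashlib


def token_for(partition_key: str, token_range: int = 100) -> int:
    """Hash a partition key to a small integer token (toy range 0-99)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % token_range


class ToyNode:
    """A single node that stores rows under the token of their key."""

    def __init__(self):
        # token -> {partition_key -> row}; keys that share a token coexist.
        self._rows = {}

    def write(self, partition_key: str, row: dict) -> int:
        """Hash the key, store the row under that token, return the token."""
        token = token_for(partition_key)
        self._rows.setdefault(token, {})[partition_key] = row
        return token

    def read(self, partition_key: str):
        """Hash the key again and look the row up; None if it is absent."""
        token = token_for(partition_key)
        return self._rows.get(token, {}).get(partition_key)


node = ToyNode()
token = node.write("user:42", {"name": "Ada"})  # hypothetical key and row
print(token, node.read("user:42"))
```

The point of the sketch is only that both the write and the read start by hashing the key, and that everything here happens on one node.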

What can that single node handle?  Well, typically about 3000 to 5000 transactions per second per core. Your mileage may vary depending on your hardware and configuration. Those are reads and writes per second per core.  If you have a lot of cores on your server, you can get a lot of transactions going. And how much data are we talking about? About 2 to 4 terabytes on SSD or rotational disk. Most everyone has SSDs now anyway, which is great because they are so fast.  If you want more than that on a single server, that takes some playing around with your configuration, which is a bit outside the scope of this class and is covered in a different class here on DataStax Academy.
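
As a rough worked example of the per-core figures quoted above, the sketch below multiplies them by a core count. The function name and the 8-core server are hypothetical, and the result is only a back-of-the-envelope range, since real throughput depends on hardware and configuration.

```python
# Rule-of-thumb figures quoted in the transcript (reads and writes combined).
OPS_PER_CORE_LOW = 3000
OPS_PER_CORE_HIGH = 5000


def estimate_node_throughput(cores: int) -> tuple:
    """Return a (low, high) estimate of operations per second for one node."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return cores * OPS_PER_CORE_LOW, cores * OPS_PER_CORE_HIGH


low, high = estimate_node_throughput(8)  # hypothetical 8-core server
print(f"{low} to {high} operations per second")
# prints "24000 to 40000 operations per second"
```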

So how can this system be managed?  Well, we have a tool for the node aptly named “nodetool”.  Creative, huh? Nodetool is neat because some of its commands are specific to just the node, while others operate on the whole cluster, which is cool when you are checking on the health and well-being of either that node or the cluster.  For example, “nodetool info” will give you information about the running node by itself, such as JVM statistics. You can also run “nodetool status”, which will give you information not just about the single node but also about all the other nodes in the cluster.  This will give you the current state of how this node sees all the other nodes in the cluster.
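
If you want to call the two commands mentioned above from a script, one possible approach is a thin wrapper around the `nodetool` executable. This is a sketch under assumptions: the helper names `nodetool_command` and `run_nodetool` are made up for this example, the wrapper deliberately accepts only the two subcommands named in the transcript, and it assumes `nodetool` is on the PATH of a machine running a node. When it is not, the script just reports what it would have run.

```python
import shutil
import subprocess

# The two subcommands mentioned in the transcript.
NODE_LEVEL = "info"       # details about the local node only
CLUSTER_LEVEL = "status"  # this node's view of every node in the cluster


def nodetool_command(subcommand: str) -> list:
    """Build the argument list for a nodetool invocation."""
    if subcommand not in (NODE_LEVEL, CLUSTER_LEVEL):
        raise ValueError(f"unsupported subcommand in this sketch: {subcommand}")
    return ["nodetool", subcommand]


def run_nodetool(subcommand: str):
    """Run nodetool and return its stdout, or None if it is not installed."""
    command = nodetool_command(subcommand)
    if shutil.which("nodetool") is None:
        return None
    result = subprocess.run(command, capture_output=True, text=True, check=False)
    return result.stdout


for sub in (NODE_LEVEL, CLUSTER_LEVEL):
    output = run_nodetool(sub)
    if output is None:
        print(f"nodetool not found; would run: nodetool {sub}")
    else:
        print(output)
```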

So this was a quick introduction to the small but mighty node, but this node is a small piece of a much bigger system. That system is the Apache Cassandra cluster, which we will talk about in a different module.
