Cluster Sizing

DSE Version: 6.0




Ooooh, that seems interesting, now doesn’t it? Let’s jump right in!

I bet that means exactly what you think it means: deducing the size of the cluster from a few key requirements. There are three things you should think about when trying to figure out the cluster size. First, throughput: what is your required throughput, specifically, how much data per second? Second, growth rate: how much growth do you anticipate? And finally, latency: how fast do you need the cluster to respond to requests? Let’s get into each of these a little deeper.

Throughput is measured as data movement per time period, for example, how many gigabits per second you want to (or can) achieve. With Apache Cassandra, read and write throughput can differ, so consider them independently of each other. Things that have an impact on throughput include your operation generators (such as users), how fast operations are generated, how big or small each operation is, and the mix of operations, like the number of reads versus writes.

Let’s look at a real example using our KillrVideo domain, the video sharing application we use in all of our DataStax Academy curriculum. Let’s start with write throughput. Alright, let’s say we have 2 million users and each user comments 5 times a day. That works out to (2M * 5) / (24 * 60 * 60), or roughly 116 comments per second, which we’ll call about 100. Wait, this is math. I hate math. Oh well, moving on… each comment inserts a row, and each row is about 1,000 bytes. So we end up writing about 100 kilobytes per second.
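If you’d rather let a computer do the math, here’s a minimal Python sketch of the write calculation. It assumes comments are spread evenly over the day (real traffic usually has peaks, so treat the result as an average, not a peak), and the function names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def ops_per_second(users: int, ops_per_user_per_day: float) -> float:
    """Average operation rate, assuming operations are spread evenly over the day."""
    return users * ops_per_user_per_day / SECONDS_PER_DAY

def write_throughput_bytes(ops_per_sec: float, row_bytes: int) -> float:
    """Bytes written per second when each operation inserts one row."""
    return ops_per_sec * row_bytes

exact_rate = ops_per_second(2_000_000, 5)
print(f"exact rate: {exact_rate:.1f} comments/s")  # exact rate: 115.7 comments/s

rounded_rate = 100  # the example rounds down to "about 100"
print(f"{write_throughput_bytes(rounded_rate, 1000) / 1000:.0f} KB/s")  # 100 KB/s
```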

Are you ready for the next example? Let’s do this. Back in our KillrVideo domain, let’s display user comments, which are read operations. Okay, we still have 2 million users. No one joined since I finished the last slide. Bummer. Anyway! They are each looking at 10 video summaries a day. Now what? Ah yes, you end up with (2M * 10) / (24 * 60 * 60), or roughly 231 queries per second, which we’ll round up to 250. More math. Ugh. Okay, let’s keep going. We have 4 comments per video, so each query reads 4 rows, at about 1,000 bytes per comment. You end up reading about 1 megabyte per second: 250 * 4 * 1,000 bytes.
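The read side can be sketched the same way. This again assumes an even spread of queries over the day and one 1,000-byte row per comment; `read_load` is a hypothetical helper name, not part of any library.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def read_load(users: int, queries_per_user_per_day: float,
              rows_per_query: int, row_bytes: int) -> tuple[float, float]:
    """Return (queries per second, bytes read per second), assuming an even daily spread."""
    qps = users * queries_per_user_per_day / SECONDS_PER_DAY
    return qps, qps * rows_per_query * row_bytes

qps, bytes_per_sec = read_load(2_000_000, 10, rows_per_query=4, row_bytes=1000)
print(f"{qps:.0f} queries/s, {bytes_per_sec / 1e6:.2f} MB/s")  # 231 queries/s, 0.93 MB/s
# Rounding 231 up to 250 queries/s gives 250 * 4 * 1000 = 1,000,000 B/s, or 1 MB/s.
```

Rounding the query rate up, as the example does, errs on the side of a slightly larger cluster.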

Let’s move on to growth rate. The question here is what size your cluster should be to hold your data. Once you know your throughput, you have to consider your new-versus-update ratio, how your replication factor comes into play, and the additional headroom required when performing cluster operations like repair.

Let’s take a look at this example, shall we? Let’s say you have 100 kilobytes per second of write throughput, and about 20% of those writes are updates to existing data rather than new data. The result is a growth rate of 80 kilobytes per second: (1 - 0.2) * 100 kilobytes per second.
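Here is the update adjustment as a small sketch. Like the example, it assumes updates overwrite existing rows and add no net storage; the function name is hypothetical.

```python
def new_data_rate(write_kb_per_sec: float, update_fraction: float) -> float:
    """KB/s of genuinely new data. Updates are assumed to overwrite
    existing rows, so only the non-update share adds storage."""
    if not 0.0 <= update_fraction <= 1.0:
        raise ValueError("update_fraction must be between 0 and 1")
    return (1 - update_fraction) * write_kb_per_sec

print(new_data_rate(100, 0.20))  # 80.0
```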

Let’s throw some replication considerations in there and see what happens. In this example we have a total replication factor of 6: 3 replicas in each of two datacenters. This results in a growth rate of 480 kilobytes per second, which is 80 kilobytes per second times 6.
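A sketch of the replication step: every replica stores its own copy, so growth scales with the total replica count across all datacenters. The per-datacenter dictionary loosely mirrors how NetworkTopologyStrategy assigns a replication factor to each datacenter; the names `dc1` and `dc2` are placeholders, not from the example.

```python
def replicated_rate(new_kb_per_sec: float, replicas_per_dc: dict[str, int]) -> float:
    """On-disk growth across the cluster: one copy per replica in every datacenter."""
    total_replicas = sum(replicas_per_dc.values())
    return new_kb_per_sec * total_replicas

# Hypothetical datacenter names; the example only says "two datacenters".
print(replicated_rate(80, {"dc1": 3, "dc2": 3}))  # 480
```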

Okay! One more example, I swear. Let’s talk about the need to consider headroom when thinking about cluster sizing. Some cluster operations require additional space, at least temporarily, while they run. In this example we are talking about anti-entropy operations (repair) with 50% loading, meaning nodes are only filled halfway to leave room for them. That results in an effective growth rate of 960 kilobytes per second, which is 480 kilobytes per second divided by 0.5.
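Headroom as a sketch, assuming "50% loading" means you plan to fill each node to only 50% of its disk so that operations like repair have room to run. `target_utilization` is an illustrative parameter name.

```python
def provisioned_rate(replicated_kb_per_sec: float, target_utilization: float) -> float:
    """Scale the growth rate so nodes are only filled to target_utilization,
    leaving the remainder free for operations such as repair."""
    if not 0.0 < target_utilization <= 1.0:
        raise ValueError("target_utilization must be in (0, 1]")
    return replicated_kb_per_sec / target_utilization

print(provisioned_rate(480, 0.5))  # 960.0
```

Lowering `target_utilization` buys more safety margin at the cost of more nodes.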

Let’s put that all together. Let’s assume each node holds up to 2 terabytes max using SSDs. With a 30-day month, (960K * 60 * 60 * 24 * 30) / 2T comes to about 1.24, so you would fill more than one node each month and need to add at least one new node per month just to keep up.
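The whole growth-rate chain fits in one function. This sketch assumes decimal units (1 KB = 1,000 bytes, 1 TB = 10^12 bytes) and a 30-day month; with binary units the answer drops to roughly 1.19. The function and parameter names are hypothetical.

```python
SECONDS_PER_MONTH = 60 * 60 * 24 * 30  # 30-day month

def nodes_filled_per_month(write_kb_per_sec: float, update_fraction: float,
                           total_replicas: int, target_utilization: float,
                           node_capacity_tb: float) -> float:
    """Chain the growth-rate steps and express monthly growth in whole-node units."""
    new_kb = write_kb_per_sec * (1 - update_fraction)   # drop updates
    replicated_kb = new_kb * total_replicas             # every replica stores a copy
    provisioned_kb = replicated_kb / target_utilization # leave headroom
    bytes_per_month = provisioned_kb * 1_000 * SECONDS_PER_MONTH
    return bytes_per_month / (node_capacity_tb * 1e12)

print(f"{nodes_filled_per_month(100, 0.2, 6, 0.5, 2):.2f} nodes/month")  # 1.24 nodes/month
```

Because the result is an average, you would round your node purchases up and revisit the inputs as real traffic data comes in.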

Whew! I think we are done with growth rate. What’s next? Ah yes! Latency. Maybe I should slow that down. La-ten-cy. How does this affect cluster sizing? It’s one thing to figure out your cluster’s capacity; it’s another thing entirely to make sure you are meeting your required SLAs. Make sure you understand what your SLAs are in terms of both throughput and latency.

What kinds of things can affect your latency? Well, for one, how slowly you type. But seriously, the following things have an impact on latency: IO rate, the shape of your workload, your access patterns (in other words, the queries you plan to run), table width, and what your nodes look like in terms of memory, size, and so on. Benchmarking is key when it comes to estimating cluster size.

Now you are armed with the knowledge to get your cluster sized right, at least initially. Remember, though, that requirements change. Check on your cluster periodically and watch for changes in your environment that indicate a need for growth. Make sure to plan ahead for any future growth you anticipate. Cluster sizing isn’t always about getting bigger, though. Sometimes a cluster needs to be scaled back to save resources or even money.

In this unit, we are going to show you some things you should be thinking about when determining the size of your Apache Cassandra cluster.