Could you backup please? You're in my (snap)shot!

Backups. They’re the second most important part of your data platform, with the database itself being the most important. They’re also probably the second biggest challenge of operating DSE in an enterprise data platform, with the database itself being the - never mind, let’s move on.

Right now you’re asking yourself (again, if you’ve been keeping up with our series) “What’s he talking about? I know what’s in a backup! It’s my DATA in there!” Well, good! You’re right on both counts - I haven’t gotten to the point of the post yet, and, if your data IS in there, you’re not just doing well, you’re doing great!

I wrote this post because a backup does contain your data, but in most platforms it’s not simple to develop and implement a backup plan. As part of recovery planning, the backup plan drives a series of very important decisions and actions about storage, operations, and more. In the course of my work in DataStax Technical Support, I handle a lot of questions about backups, from how to plan them, to what to do if they fail, and everything in between. Part of the complexity is that a “backup” is more than a single file in almost every database I’ve supported. It’s usually several separate files that are replayed or directly loaded (i.e. refreshed) into the target, and DSE is no exception. There are three ways to capture the state of data in DSE: snapshots, commitlog archives, and incremental sstable backups. Today I’ll explain each briefly, and then discuss how they can be used for recovery plans, including point-in-time restores. Once I’ve covered each, I’ll tie some things together based on my experience, and hopefully this will all result in you having a perfect backup and restore experience next time you have a disaster. I can’t vouch for the rest of the disaster, but you’ll have this part nailed!

Snapshots

In DSE, snapshots have been the primary means of saving the state of data in a cluster for a long time. You can use nodetool, or a Java app (like OpsCenter, for example) that uses Java Management Extensions (JMX), to trigger a node to take a snapshot. But before I go on, I want to make sure I clarify the term ‘snapshot’ as I mean it here. In DSE, a snapshot is a set of hard links to the sstable files that are in the data file directory at the time it’s created. Each link points to the same inode as the data file it was made from. This works in DSE because sstables are immutable once written, so the snapshot will retain a filesystem path pointing to that inode for as long as you need it to.
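
For instance, a minimal nodetool session might look like the sketch below - the keyspace name and tag are just placeholders, and OpsCenter or any other JMX client can trigger the same operation.

```bash
# Snapshot a single keyspace under a tag (names are placeholders):
nodetool snapshot -t pre_upgrade_2024_01_15 my_keyspace

# Or snapshot every keyspace on the node under one tag:
nodetool snapshot -t pre_upgrade_2024_01_15

# List the snapshots the node is currently holding, and what they consume:
nodetool listsnapshots
```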

One thing this means is that at the time the snapshot is taken, it takes no extra space on disk - it’s the exact same stored data as the live data files, pointed to by two different paths in the filesystem. BUT! It won’t stay that way forever. Over time, the size on disk needed to store that snapshot AND the live data set will change - not because the snapshot changes, but because the live data set does. That’s a nice feature, relative to DSE’s relational relatives. Normally a backup makes a copy of the data set, meaning your space requirement immediately doubles, because now you have two copies of the data. You can offset this with incrementals, compression, or other techniques, but you’re going to be duplicating the data, and the space it takes, regardless of how you manage that result. In DSE, instead, the size requirement on the server doesn’t change when we snapshot. The space consumed by many snapshots can accumulate over time, because the live data set will change and each snapshot will be different, but you have the luxury in DSE of letting a snapshot provide a nice, easy way to move things to remote storage without doubling space needs on the server up front, and then clearing the snapshot once it’s safely archived.
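
If you want to see that for yourself, you can compare inode numbers on a node. A quick sketch - the data path and table ID suffix below are assumptions and will differ on your install:

```bash
# The snapshot lives under the table's data directory; the path and table ID
# suffix are illustrative only.
TABLE_DIR=/var/lib/cassandra/data/my_keyspace/my_table-1a2b3c4d

# The first column of ls -li is the inode number: the live sstable and the
# snapshot entry share it, so the snapshot adds no new data blocks on disk.
ls -li "$TABLE_DIR"/*-Data.db
ls -li "$TABLE_DIR"/snapshots/pre_upgrade_2024_01_15/*-Data.db
```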

Snapshots have a few more properties that make them helpful in a backup and recovery plan, as well. For example, if you take a snapshot every ten minutes and nothing has changed, the snapshot takes no additional space at all, no matter how many times you re-take it. If only a few files are different in a newer snapshot, then only those new files add space, since most of the files are the same and just get an extra filesystem reference without demanding more disk sectors. Snapshots are easy to understand from a retention perspective, as well. If you took a snapshot at 10 PM every night, on each node, it’s quite easy to determine when the snapshot was taken, and remove it if it’s too old. And since snapshots always land in the same directory, in their own folder, it’s not too tough to mirror them over to a remote storage site passively, or even load them into a remote cluster with sstableloader. Since they are separated from the live data files logically, copying or removing them won’t require any interaction with the live data files that the DSE node is using.
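
Here’s a rough sketch of that kind of nightly cycle. The tag, paths, and remote target are all assumptions, and `clearsnapshot -t` assumes a reasonably recent DSE/Cassandra version:

```bash
# Hypothetical 10 PM cron job, run on each node.
TAG="nightly_$(date +%Y_%m_%d)"
DATA_DIR=/var/lib/cassandra/data
REMOTE=backup-host:/backups/$(hostname)

# 1. Cut the snapshot across all keyspaces.
nodetool snapshot -t "$TAG"

# 2. Mirror each snapshots/<tag> directory to remote storage, keeping the
#    keyspace/table layout intact on the far side.
find "$DATA_DIR" -type d -path "*/snapshots/$TAG" | while read -r dir; do
  rsync -a --relative "$dir" "$REMOTE/"
done

# 3. Once it's safely archived, clear the tag so it stops pinning old sstables.
nodetool clearsnapshot -t "$TAG"
```

With snapshots covered, let’s look at commitlog archiving next.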

Commitlog Archives

The commitlog is the transaction journal for a DSE node - when a write request comes in to a node, it’s written to the commitlog as it’s written into the node’s memtables. The memtables eventually get flushed to disk as sstables; if the node is restarted before a flush, whatever was only in memory would otherwise be lost. That’s where the commitlog comes in: it’s replayed on node startup, playing back any transactions in the commitlogs to bring the node back to the state it was in when it was stopped.

It’s possible to enable commitlog archiving in DSE. What this gains you is a rolled-off copy of each commitlog file - the files themselves are a sort of sequence container, with the transactions as the entries. When a commitlog file is closed, if you have archiving enabled for the node, you’ll see a copy of that commitlog appear in a specified archive location. Configuring the archiving process for commitlogs is covered in the documentation, so I won’t go through every option here.
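
Just to make it concrete, the relevant knob is `archive_command` in `commitlog_archiving.properties`. The snippet below is a minimal sketch with an assumed archive directory, not a recommendation:

```bash
# commitlog_archiving.properties lives alongside cassandra.yaml. A bare-bones
# archive setup copies each closed segment into an archive directory:
#
#   archive_command=/bin/cp %path /backups/commitlog_archive/%name
#
# %path is the full path of the segment being archived, %name is its file name.
mkdir -p /backups/commitlog_archive
```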

At restore time, DSE nodes can be triggered to load commitlog archives and play transactions back from the time of the last snapshot, to bring the node up to the desired restore point. This uses the same properties file that configures the archival process, modified before the node is restarted so that it restores the transactions you want.

At this time, a restore that replays commitlogs DOES require a restart of the node, which is something you will need to include in plans for any point-in-time recovery. Along the same lines, a node will only replay commitlogs that it generated itself - no DSE node can apply commits from another DSE node. However, the replay can be filtered to only the transactions of a selected list of keyspaces or tables, and it is configurable to play forward to a specific millisecond, so it gives the node the means to rebuild its state from any point you have logs for. Commitlog replay is a slow phase of startup, so ideally you want to load from sstables as far forward as possible, and then play commitlog archives that only span a short amount of time. To put that another way: while it’s technically possible to restore a node entirely from commitlog replay, it would mean a much longer recovery time, as every transaction would have to be written, flushed, and merged before the node would actually start communicating with any other nodes.
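
As a sketch, the restore side of that same properties file might look like this before the restart - the paths and timestamp are assumptions, and the exact precision and filtering options are in the documentation for your version:

```bash
# Point-in-time replay settings in commitlog_archiving.properties:
#
#   restore_command=cp -f %from %to
#   restore_directories=/backups/commitlog_archive
#   restore_point_in_time=2024:01:15 04:00:00
#
# Some versions also accept a startup flag to limit replay to specific tables,
# e.g. -Dcassandra.replayList=my_keyspace.my_table (check your version's docs).
#
# With those in place, start the node and let it replay up to the target time.
sudo systemctl start dse    # service name depends on your install
```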

Incremental sstables

Since commitlogs are slow to replay, and snapshots are whole sets of files, we need something to bridge the gap. In between the snapshot and the commitlog archive, there’s a neat little stopgap called incremental backups. The word ‘incremental’ gets used to mean a lot of different things in database management, so let’s make sure we’re clear on this one too - we don’t want your RDBMS knowledge getting in the way of the NoSQL learning you’re here for.

In DSE, an ‘incremental backup’ is a hard link to an sstable, created at exactly the time the sstable is flushed to the live data directory. As writes come in to the commitlog and the memtables, eventually something triggers a flush - that’s when sstables in the live data directory are created. When you enable incremental backups, the node also makes a second reference to any sstable that flushes, in the ‘backups’ folder under the live data directory for any given table. On a brand new node, the data directory and its ‘backups’ child folder will look the same, and if you took a snapshot during that time, it would also look the same.
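
Enabling it is a one-liner, either at runtime or in cassandra.yaml - a quick sketch:

```bash
# Turn incremental backups on for the running node (lasts until restart):
nodetool enablebackup
nodetool statusbackup        # should report "running"

# To make it permanent, set this in cassandra.yaml and restart:
#   incremental_backups: true
```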

But the data directory sstables will merge and compact away, and their space will be reclaimed. The incremental backups won’t be, just as with snapshots - those sstables will just sit there. The ‘backups’ folder also gets links to the merged sstables that result from compactions, in addition to flushed sstables, and grows more quickly as a result. So, make sure to clean them up - that’s the point there.
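
A cleanup pass can be as simple as an age-based sweep, assuming anything older than your shipping window has already been copied off the node. A hypothetical example:

```bash
# Remove incremental backup files older than 7 days; review the list first.
DATA_DIR=/var/lib/cassandra/data
find "$DATA_DIR" -type f -path '*/backups/*' -mtime +7 -print
find "$DATA_DIR" -type f -path '*/backups/*' -mtime +7 -delete
```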

At restore time, these incremental sstables may be added to the snapshot you're building the restore up from, and loaded in the same way the sstables in snapshots are loaded.
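
For a single table, that might look something like the sketch below - the paths, names, and snapshot tag are placeholders, and `nodetool refresh` picks up sstables dropped into the data directory without a restart:

```bash
# Lay the snapshot's sstables back down, add the incrementals flushed since
# that snapshot, then tell the node to load them.
TABLE_DIR=/var/lib/cassandra/data/my_keyspace/my_table-1a2b3c4d

cp "$TABLE_DIR"/snapshots/pre_upgrade_2024_01_15/* "$TABLE_DIR"/
cp /backups/incrementals/my_keyspace/my_table/* "$TABLE_DIR"/

nodetool refresh my_keyspace my_table
```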

Using this all together

So we’ve got three main ways to capture the state of data in case of emergencies. Let’s review the pieces:

  • Snapshots are hard-links of sets of sstables that capture the data state

  • Commitlogs take writes, and can be archived and replayed

  • Incremental backups are hard-links of individual sstables as they are made

When you look at those pieces, there’s overlap, if you squint a little.

It’s true... if you’re saving all of these, you’re actually saving the same information several times over. To put it a little more formally, I describe these mechanisms as additive, redundant, and inclusive - they can be used together to build up a complete state capture at any point in time, they cover one another’s possible loss of integrity, and each layer can be used to rebuild the next layer of aggregation, from commitlog up to snapshot.

Why would you save the same data in your backup multiple times though, in the same site? Isn’t that a waste of space, DataStax people? Well - yeah. So, don’t do that. Since these do overlap, and they do complement one another, there are several ways from here to combine them and aggregate them at intervals, then get rid of the small pieces that make up the aggregate.

One pattern that’s quite straightforward is a cycle of regular snapshots, with incremental sstables captured in between, and commitlog archives captured continuously. At the time a new snapshot is cut, it’s probably safe to discard the commitlogs and the incremental sstables saved since the previous snapshot, and then keep the snapshot(s) in your retention window before cleaning them up. The snapshot should contain all the same data, in the same state, as the commitlogs and incremental sstables do. This works fine, and it gets you predictable intervals to build a restore from, with a simple restoration approach: take the snapshot from before the restore time you need, add the incremental backups as far forward toward the target time as possible, add the commitlog archives into the live commitlog directory, and start or restart the node. The snapshot provides the base, the incrementals provide data flushed since the snapshot but before the restore point, and the commitlogs play in the transactions that aren’t in either of the other two sets. It’s simple, and it works.
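
To make the sequence explicit, here’s that restore path for one node, written as a checklist rather than a finished script - everything named is an assumption:

```bash
# 0. Stop the node (or start from a clean replacement node).
# 1. Base: copy the last snapshot taken before the target time back into the
#    table data directories.
# 2. Forward: copy in the incremental sstables flushed after that snapshot
#    but before the target time.
# 3. Replay: put the archived commitlogs where the node will replay them
#    (or configure the restore_* properties described earlier), with
#    restore_point_in_time set to the target if you need point-in-time.
# 4. Start the node and let commitlog replay finish the job.
sudo systemctl start dse    # service name depends on your install
```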

You might be able to plan something more interesting, though. My favorite pattern is to keep a continuous stream of incremental sstables and commitlog archives, and at restore time simply add the incremental sstables needed back into the node’s data directory, along with the commitlog archives - no snapshots at all. It works nicely, and it’s easy to set up a sync tool (rsync, etc.) to move the incremental sstables to remote storage continuously. I can’t take credit for it, though - I encountered a team using it in the course of my work in Support. For this post, that pattern also nicely illustrates the fact that a snapshot is a collection of sstables, and the same sstables show up in the set of incremental backups. The records in all the files are the same; the different files and their archives just hold them in the containers DSE expects them in.
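
A continuous shipping job for that pattern can be as small as a couple of rsync calls on a timer. A sketch, with assumed paths and remote host:

```bash
# Ship incremental sstables and archived commitlogs off the node; in practice
# this would run from cron or a small loop on each node.
DATA_DIR=/var/lib/cassandra/data
REMOTE=backup-host:/backups/$(hostname)

for dir in "$DATA_DIR"/*/*/backups; do
  [ -d "$dir" ] || continue
  rsync -a "$dir/" "$REMOTE/incrementals/${dir#"$DATA_DIR/"}/"
done
rsync -a /backups/commitlog_archive/ "$REMOTE/commitlogs/"
```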

Put the pedal to the metal

I’ll leave you with a few things to look out for, so you don’t hit any tricky common snags. One thing that can cause problems is storing sstables from different nodes in the same folders. Sstables are all named in the same format, with no indication of the node they belong to in the file name. This means copying snapshots from more than one node into the same archive folder will usually result in data loss, as sstables overwrite one another if the table name and generation number are the same. Another thing that can be a bit puzzling is restoring or refreshing data from one node to another, or one cluster to another. The sstables are compatible with any node, but there’s no guarantee that the partition keys stored in the files are the same partition keys the target node thinks it owns. If it’s the node that created them, and the topology and tokens are the same, then it’s probably fine. But most of the time the ranges no longer match the source, and you’ll need a way around that; the sstableloader tool that ships with DSE is a great way to avoid that headache. And it’s worth mentioning that you can simplify things a great deal by using OpsCenter to manage backups and restores. The common perception seems to be that OpsCenter does something different from the options above, but it doesn’t. OpsCenter is usually simply working with the backup mechanisms I’ve described, and setting them up on your behalf. It does add some extras, like retention cleanup and remote destinations, but for backup and restore, OpsCenter leverages the same tooling you have access to via JMX, the command line, and nodetool or other built-in tools.
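
For the cross-topology case in particular, one streaming command is usually all it takes - the hosts and staging path are placeholders, and the source directory has to end in keyspace/table:

```bash
# Stream the sstables to a cluster that may have different nodes/tokens and
# let the receiving nodes keep only the ranges they own.
sstableloader -d 10.0.0.1,10.0.0.2 /backups/restore_staging/my_keyspace/my_table
```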

Well, that brings me to the end of the post. I hope this was helpful in giving you some different ideas about how to best use DSE tools to get the backup plan you need. I didn’t start out planning to write a post about “the right way to do backups in DSE,” because there’s really no “right” or “wrong” plan - the right plan is the one that lets you get the cluster and the app (and the users!!) back up and running again! But I did set out to write something that gives a little insight into the tools you have at your disposal with DSE, so you can take it from there. Thanks for reading!