If you are looking for a blog post with a happy ending, then this is not the post for you. Not only is there no happy ending, there is no happy beginning, and few if any happy events in the middle. If you are looking for a cheerful article on how to tune DSE so that it is a shining example of perfection, you should go find another article on another website.

It was a Saturday morning like any other. Keola’s Keurig was cheerfully gurgling along as it spit out the last few drops of less than perfectly heated water into her reusable Starbucks cup. Just a touch of half and half swirled about like a miniature galaxy far, far away. Today was the day that she told her two boys that they would finally get to go to LEGOLAND. It was supposed to be a day of ecstatic screaming, tantrums, laughter, and chaos. But for now, the boys were still sleeping. Keola sat, with coffee in hand, soaking in the incredible comfort of silence in the morning.

And then it happened…

Shrieking out of the silence came the horrible screaming of a klaxon at full volume. A klaxon, as you are probably already aware, here refers to the warning horn often used on submarines, a reference meant to imply an especially unpleasant sound. It was the PagerDuty app on her phone.

“Hmm,” Keola thought, “Challenge Accepted.”

Keola cleared the alert from PagerDuty and pondered why she was getting paged. She was not the primary engineer covering incidents today. That role belonged to her co-worker, Dunstan.

“What do we have today, Dunstan?” she thought to herself.

She decided to give Dunstan a call to figure out what was going on. The phone rang. And it rang. It rang for what felt like an eternity. He wasn't answering. Perhaps this was actually something of concern. Suddenly a voice broke through on the other end.

“Hey, Keola.”

It was Dunstan. There was a nervous tension in his voice when he answered.

“Hi Dunstan. I just got paged from PagerDuty. Anything that I can help out with?”

Dunstan breathed heavily for a moment, then paused. He finally broke his silence, speaking in a sheepish manner. “Yeah. I am… I… Uh, well I’m sorry that I didn’t clear out the alert before it rolled over to you, but… well, you see, we’ve had a problem.”

The hair on the back of Keola’s neck began to stand up. “What kind of problem?”

Dunstan began to spin a tale of events to Keola. The cluster had been running smoothly. Dunstan noticed a slight spike in activity in one datacenter on the OpsCenter UI, so he had logged into his workstation remotely. Through a series of unfortunate events, he mistakenly executed a script designed to decommission all of the nodes within a data center. He caught it at the halfway point, but the damage was done.

Keola’s heart sank. This was real. She kept Dunstan on the phone, grabbed her laptop, and logged into the system.

Half of the nodes were down in the US-SD data center…


The San Diego datacenter had a total of 60 nodes, and 32 of them were gone. Fortunately, this was not their active datacenter; it was used for failover, so things were still running smoothly for the end users. But this was bad. If a network outage or act of God were to render the active datacenter useless, this is where the failover would roll to.

“OK, so you decommissioned half of San Diego. This is not good. It’s not all of the nodes, so it could be worse. But this is not good.”

“I know, I am so sorry. I was just getting ready to re-bootstrap all of the nodes.”

There should be no need to re-bootstrap, Keola thought. Let’s just start them back up. She offered the suggestion to Dunstan.

“I’ll get on that right now,” Dunstan said with a hesitant confidence. He proceeded to restart the first node. Only moments in, his confidence sank. His voice cracked as he relayed the information over the phone to Keola.

“It’s not letting me do it. It’s failing. When I check the log, I’m getting an error about trying to re-bootstrap a decommissioned node.”

ERROR [main] 2017-09-18 22:59:08,153  CassandraDaemon.java:705 - Fatal configuration error
org.apache.cassandra.exceptions.ConfigurationException: This node was decommissioned and will not rejoin the ring unless cassandra.override_decommission=true has been set, or all existing data is removed and the node is bootstrapped again
…
Fatal configuration error; unable to start server.  See log for stacktrace.

Keola smiled with confidence. “Ah, that’s right. We just upgraded from DSE 4.8 to 5.1, didn’t we? My understanding is that there is now a check in place to prevent accidentally re-adding nodes that were previously decommissioned. We should be able to override that, though. There is a flag that we can add in. It should be right there.”

Dunstan interrupted. “I think I found what you are talking about; it’s in the log message. Set cassandra.override_decommission to true.”

Keola instructed Dunstan to add the following line to the cassandra-env.sh file:

JVM_OPTS="$JVM_OPTS -Dcassandra.override_decommission=true"
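
For a whole datacenter’s worth of machines, a change like this would normally be pushed out with a loop or a configuration management tool. As a minimal sketch (assuming passwordless SSH with sudo rights, a package-install layout where the file lives at /etc/dse/cassandra/cassandra-env.sh, and a hypothetical hosts.txt listing one IP per line), it might look like this:

#!/usr/bin/env bash
# Append the override flag to cassandra-env.sh on every node in hosts.txt
# (hosts.txt is a hypothetical list of node IPs, one per line).
# ssh -n keeps ssh from swallowing the rest of hosts.txt on stdin.
while read -r host; do
  ssh -n "$host" \
    "echo 'JVM_OPTS=\"\$JVM_OPTS -Dcassandra.override_decommission=true\"' | sudo tee -a /etc/dse/cassandra/cassandra-env.sh"
done < hosts.txt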

Dunstan used a script that one of his co-workers had created to modify the file on multiple machines at once. He fed it the list of IPs and fired it off. The stdout showed that the line was successfully added to the file. But then came more output:

system keyspace information removed                                    
Commit Log information removed  

No, no, no, no! It was the wrong script! Dunstan killed it as quickly as he could. Fortunately, he was able to stop it before it ran on the last two nodes. He used ssh to log in to those last two nodes, manually added the line to the cassandra-env.sh file, and started the nodes. Success!

"I got 2 nodes up and running, but unfortunately I ran the wrong script when trying to make the config change. We just lost the node information. We need to just re-bootstrap.” Dunstan was melancholy to say the least.

“Hang on,” Keola urged. “How do you keep running the wrong scripts? Are you executing the modified Rightscale scripts from the right_scripts directory?"

Dunstan did a quick double check.

$ pwd
/home/dunstan/scripts/wrong_scripts

"That would be a big fat no. Why would we even have a directory of wrong scripts anyways?" Dunstan inquired. Surely there had to be a reason for some rancorous person to create a directory of scripts that do nothing but harm. The meaning of rancorous here is to indicate that the person is full of bitterness or anger, it is not a reference to the Rancor that was kept as a pet in Jabba the Hut's palace on Tattooine.

Keola wanted to explain it to Dunstan, but she worried it would only fill him with anger. Nevertheless, she felt the need to share.

"Well Dunstan, the wrong_scripts directory is merely an illustrative tool used to move the narrative along as needed. We need to get through a series of over the top mistakes to put ourselves in a position of peril that we can then figure out a workaround for. These are over the top, unrealistic examples, but they are necessary for the audience to understand why we are doing what we are doing."

Dunstan shrieked. "Audience! What Audience?"

"We are characters in a story," she said in as soothing a tone as possible. "We just broke the fourth wall."

"OK, back to work. Let’s not make a hasty move on top of our mistake. Did you wipe all of the data on the nodes that had the wrong script ran?”

Dunstan paused, and looked over his script. “The script doesn’t wipe any user keyspace data, so it should all still be there. Let me look.” He listed out the contents of the data directories. All of the data was still there.
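
Checking this for yourself is quick: the data files live under whatever data_file_directories points to in cassandra.yaml. A rough sketch, assuming the default package-install path of /var/lib/cassandra/data:

$ du -sh /var/lib/cassandra/data/*

Each keyspace shows up as a directory, with a subdirectory per table, so a node that still has its data will show sizes in line with its pre-incident load.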

Keola checked the free space:

$ df -h
Filesystem      Size   Used  Avail Capacity iused      ifree %iused  Mounted on
/dev/disk1     2857Gi 2371Gi  485Gi    83% 2171840 4292795439    0%   /
devfs          183Ki  183Ki    0Bi   100%     632          0  100%   /dev
map -hosts       0Bi    0Bi    0Bi   100%       0          0  100%   /net
map auto_home    0Bi    0Bi    0Bi   100%       0          0  100%   /home

Crap, she thought. This was the datacenter that they were about to expand in order to free up resources on each of the nodes. “Don’t bootstrap again, Dunstan. These nodes are already at 83% capacity. If we just run a bootstrap, we will run out of space halfway through the process and the nodes will go down.”

“Okay, I’ll wipe the data off first before I bootstrap them back in,” Dunstan replied. “With our network as it is right now, we should hopefully get all of the data streamed to all of the nodes in a few days… as long as the network cooperates. Otherwise, it may be longer. 30 nodes will take some time.”

That was not what Keola wanted to hear. If this datacenter were not functional for a full 24 hours, there would be hell to pay. Then a thought occurred to Keola. Could it be that simple? The data was still there. The nodes had been killed less than an hour ago, so the amount of changed data in that time should be fairly minimal. It was worth a shot, especially since it all made sense logically.

“OK, Dunstan. I have an idea. Here is what you are going to do:”

Keola proceeded to list out the steps to run on a node-by-node basis.

Since the data was still on the nodes, wiping and then bootstrapping would just waste time and energy re-streaming what was already on disk. Instead, why not leave the data in place and add the nodes back as they are? She ran the idea past Dunstan. He lacked her enthusiasm for the idea.

“But we use vnodes. How are we supposed to get the data back in the right spots when it’s a random selection of tokens?” he asked.

Keola smiled to herself. “I think it should be pretty easy; we can walk each other through it. First, we get the old tokens from the node. I’ll go to the logging directory.”

$ cd /var/log/cassandra/

“If any of the nodes are configured differently for logging, just go to that directory instead. Next, I’ll grep for tokens in the output.log file.

keola@ecorp:/var/log/cassandra$ grep tokens output.log 
INFO  [main] 2017-08-03 17:10:16,498  Config.java:472 - Node
<…>
write_request_timeout_in_ms=2000]
INFO  [main] 2017-08-03 17:10:22,997  BootStrapper.java:234 - Generated random tokens. tokens are [5422636896530366812, 4103220956135545131, -2460197599942441048,
<…>
827552776520422692, 9129810946913651636, 4474097961387496786, -2210310737306466245, -2121684720519817848, -5061245349549466855, 3572774888127252977, -6921396761867507188, -3413040197558888945, -3868082019010980981, -1371684396921471537, -9086799514880988489, -326594930332734914]
INFO  [main] 2017-08-03 17:10:23,747  MigrationManager.java:171 - Create new Keyspace: <…>
replication_factor=1}}, tables=[dse_security.digest_tokens, dse_security.role_options, dse_security.spark_security], views=[], functions=[], types=[]}
INFO  [MigrationStage:1] 2017-08-03 17:10:23,847  ColumnFamilyStore.java:418 - Initializing dse_security.digest_tokens
INFO  [MigrationStage:1] 2017-08-03 17:10:23,852  SSTableReader.java:584 - openAll time for table digest_tokens using 2 threads: 0.033ms

“OK, so let me narrow that down a bit. I’ll grep for BootStrapper.java:234 instead.

$ grep BootStrapper.java:234 output.log 
INFO  [main] 2017-08-03 17:10:22,997  BootStrapper.java:234 - Generated random tokens. tokens are [5422636896530366812, 4103220956135545131, -2460197599942441048,
<…>
827552776520422692, 9129810946913651636, 4474097961387496786, -2210310737306466245, -2121684720519817848, -5061245349549466855, 3572774888127252977, -6921396761867507188, -3413040197558888945, -3868082019010980981, -1371684396921471537, -9086799514880988489, -326594930332734914]

“So, what we want to do is take those tokens and add them to the cassandra.yaml file as the value of the initial_token parameter.”

$ tokens=`grep BootStrapper.java:234 output.log | cut -d[ -f3 | awk '{sub(/.$/,"")}1' ` && sudo sed -i "s/# initial_token:/initial_token: $tokens/g" cassandra.yaml

 “Or we can copy and paste,” Dunstan muttered under his breath.
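
Either way works. One caveat worth noting: the one-liner assumes it is run from the directory that contains cassandra.yaml (on a DSE package install, that is typically /etc/dse/cassandra; adjust the path if your layout differs). A quick sanity check that exactly one initial_token line landed in the file:

$ grep -c "^initial_token:" cassandra.yaml
1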

Keola continued. “For good measure, I will comment out the num_tokens parameter for vnodes as well.”

# num_tokens: 128

“Lastly, to prevent the node from trying to stream its data from the other nodes in the cluster, I will add a parameter to turn off the bootstrapping mechanism.”

auto_bootstrap: false

 “The YAML file should look similar to what I have here:”

# Cassandra storage config YAML

auto_bootstrap: false

# NOTE:
#   See http://wiki.apache.org/cassandra/StorageConfiguration for
#   full explanations of configuration directives
# /NOTE

# The name of the cluster. This is mainly used to prevent machines in
# one logical cluster from joining another.
cluster_name: blog

# This defines the number of tokens randomly assigned to this node on the ring
# The more tokens, relative to other nodes, the larger the proportion of data
# that this node will store. You probably want all nodes to have the same number
# of tokens assuming they have equal hardware capability.
#
# If you leave this unspecified, Cassandra will use the default of 1 token for legacy compatibility,
# and will use the initial_token as described below.
#
# Specifying initial_token will override this setting on the node's initial start,
# on subsequent starts, this setting will apply even if initial token is set.
#
# If you already have a cluster with 1 token per node, and wish to migrate to 
# multiple tokens per node, see http://wiki.apache.org/cassandra/Operations
# num_tokens: 128

# Triggers automatic allocation of num_tokens tokens for this node. The allocation
# algorithm attempts to choose tokens in a way that optimizes replicated load over
# the nodes in the datacenter for the replication strategy used by the specified
# keyspace.
#
# The load assigned to each node will be close to proportional to its number of
# vnodes.
#
# Only supported with the Murmur3Partitioner.
# allocate_tokens_for_local_replication_factor: 3

# initial_token allows you to specify tokens manually.  While you can use it with
# vnodes (num_tokens > 1, above) -- in which case you should provide a 
# comma-separated list -- it's primarily used when adding nodes to legacy clusters 
# that do not have vnodes enabled.
initial_token: 1756335888490405836, -269849889890098979, 6254686962597222321, -5486860511774627266, 1653491942706629665, 5773688451981661964, 5942656450895962150, 3355287262230140666, -1084187087100786141, -1363524888809293740, -426009909889838589, -2765171034193255529, -7325717317621645666, 3009097463267012964, -8200884355387552484, 1429895752065603536, 7575209219965237997, 983670616120131686, 4643968115436569570, 7710230298545104002, -975941486067470755, 1530519378069905382, 7542647744343402966, 4569992133975137040, -5218071493536189424, -4220399130886565344, 2833665938244609917, -5237872238507756197, -1225976105574278950, 1990085368988688512, 6060729806992776082, 9080504036528505285, 3408313883186077751, -3716821111596921762, 5735844567943404542, 2975760709687588323, 655219711697430282, -231928384446951315, 4831493569082168374, 2313323611459398227, 9062965660434502937, -6602711846233174642, 6669626390691127227, -5284128364978636502, 8966071518870185346, 4391947854279323520, -7705167613506060107, 4401423424042220324, 2215605979559075814, -588735902640995341, -2744459549026673983, 5757586841023829127, 818129981052881431, 3225149774724253139, -4208997526263108765, 6891315666608843670, 7735954054193768420, 4972161225471537666, -9009325105769084502, 8208313726407557939, -1976032249427983083, 5320798240010824131, 5188217354901466846, 8050006963315579725, 8113730014317647798, 3269941282730859066, 486475202082365161, -6690716348006514829, 3746305314005369549, 807023539558427076, -19090868739822178, -6131168048289246935, 4135166324963012016, -7450010157903635424, 7130353306938668742, -4110623300549923334, 9045753748540474704, -5163742012859823175, -3510538691144421054, -3056623817596080678, 3413475098982951656, -5037172983897855510, 857406871321320266, 4833591660793997378, -7486297660371276305, 6726257657565880227, 6071476145879329786, 6343495330145851165, 3101450021358906669, -8434294751719658653, 4591460744181217105, 1155799819217604331, 6469692395861350820, -267082805787083419, -8516868963508322238, -8080684712296563290, 3043148099637358893, -5500898834210114475, 2518457198644970520, -6668707464701268417, 8647557388305606581, -3951507936056475714, 2295512326513330455, -3048643951574995284, -1018144770105860013, -647108431771654014, 6381303456196329036, -8478802260503193827, 1841557296962209283, -5796936156154213401, 9159768169446302884, -6135076760937139809, 8056076539487068605, -5182999974087379033, -7026040896801850797, 3502288488788118150, 3974204619528518771, -4959290805573551776, 1527874701679314385, 1171834212929219065, -5655818799752665743, 7662688140384324565, -850813093018330020, -6297899996485082194, 8483072275715034609, 4654838428741132158, 5652028717727410132, 6527601274945401898

“Oh wait, you’re on the phone. You can’t see that. But you get the idea. At this point, we just start up the node.”

$ sudo service dse start

“Once the node is up, we can verify that it ‘sees’ all of the data again by checking nodetool status. If it’s not picking it up, we can just run nodetool refresh against each of the keyspaces and tables.”

$ nodetool status
Datacenter: US-SD
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens       Owns    Host ID                               Rack
UN  10.200.178.44    74.8 KiB  128          ?       28b1d3a0-e1c5-4101-92ab-1e3047600f55  rack1
...
UN  10.202.165.34  1187.4 GiB  128          ?       fe0d6030-7183-42ac-9f5e-53bb4a94fc71  rack1
$ nodetool refresh first alpha
$ nodetool refresh first beta
…

$ nodetool status
Datacenter: US-SD
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens       Owns    Host ID                               Rack
UN  10.200.178.44  1174.8 GiB  128          ?       28b1d3a0-e1c5-4101-92ab-1e3047600f55  rack1
...
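
With more than a handful of tables, the refresh step is worth scripting. A rough sketch, assuming local cqlsh access (system_schema exists in DSE 5.1 / Cassandra 3.x):

# Refresh every non-system table on this node so Cassandra picks up
# the SSTables already sitting on disk. The keyspace filter is crude:
# it skips anything whose name starts with "system".
cqlsh -e "SELECT keyspace_name, table_name FROM system_schema.tables;" |
  awk -F'|' '/\|/ && !/keyspace_name/ {gsub(/ /,""); if ($1 !~ /^system/) print $1, $2}' |
  while read -r ks tbl; do
    nodetool refresh "$ks" "$tbl"
  done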

Keola smiled with a feeling of satisfaction. “There you go, Dunstan. Repeat that on the remaining nodes. Just do the local seed nodes first, and then the rest. For good measure, give it a minute or so between each to make sure everything looks good.”
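
That sequencing lends itself to a small wrapper. A loose sketch, where seeds.txt and others.txt are hypothetical files listing the node IPs, seeds first:

# Restart nodes one at a time: seeds first, then the rest, pausing
# so gossip can settle. seeds.txt and others.txt are hypothetical
# lists of node IPs, one per line.
for host in $(cat seeds.txt others.txt); do
  ssh "$host" "sudo service dse start"
  sleep 60
  ssh "$host" "nodetool statusbinary"   # confirm the node is serving clients
done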

Dunstan breathed a sigh of relief. “Nice! I’ll run through that, bring the nodes up, and then run repair against them.”

Keola was confused. “But the nodes have only been down for about an hour and a half. They should all be back well within the 3-hour hint window if all goes well.”
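
The 3-hour figure Keola is leaning on is the default hint window: a coordinator stores hints for a dead node only for as long as cassandra.yaml allows, via this setting (shown here at its default):

max_hint_window_in_ms: 10800000 # 3 hours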

Dunstan quizzed Keola on the merits of her statement. “How are you going to get hints streamed to the node when the node was marked as removed? Where are these hints coming from?”

Silence. Then came the comforting tone of understanding. “Of course, that makes sense. I’m so glad that we are a team. I think we balance each other well.”
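
Dunstan’s point is the catch: a decommissioned node is removed from the ring entirely, so no coordinator was storing hints for it, and the writes from the outage window have to be reconciled another way. Running a full, primary-range repair on each restored node, one at a time, covers that gap:

$ nodetool repair -full -pr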

“Yep, we learn from each other. See ya, Keola. Sorry again.” Dunstan disconnected from the line, and Keola closed the lid on her laptop and took the last sip of her coffee. She noticed the shuffling of feet getting louder and louder as her children came scurrying out from the hallway like squirrels begging for treats. “MOM!!! We are ready to go to LEGOLAND!” they shouted in unison. They clearly were not ready; disheveled and half awake, their eagerness left them unaware. Keola smiled brightly.

Challenge completed. Today is a great day.

So, this story began with an untruth. There is a happy ending. However, if you read through strictly hoping for an ending filled with sadness and despair, I am sure that you are sad, angry, or upset right now. So, as you can clearly see, I did not lie after all. There is no happy ending for you.