Cloud

DSE Version: 6.0

Video

When deploying DataStax Enterprise, one common platform that many users leverage is the cloud. In this unit, we will be going over some important considerations when using DataStax Enterprise in the cloud.

Transcript: 

When deploying DataStax Enterprise, one common platform that many users leverage is the cloud. Regardless of how much of the cloud you use in your infrastructure or the cloud provider you prefer, DataStax Enterprise makes it straightforward to create clusters using any mix of machines and environments.

In this section, we'll be going over some important considerations when using DataStax Enterprise in the cloud.

To start, one of the important questions when using DataStax Enterprise in the cloud is how exactly you get DSE installed and running. This can be as simple as using a point-and-click web interface on Microsoft Azure or Google Cloud to quickly set up a cluster. DataStax also provides and helps maintain templates and scripts for the major cloud providers that can be downloaded and customized for your own use. You can find these in our DataStax Partner Network repos on GitHub.

Another convenient way to manage provisioning is through OpsCenter and its Lifecycle Manager (LCM), which installs and configures your DSE cluster. It doesn't manage the actual cloud instance or VM, but once you pass in the appropriate credentials, LCM will take care of the installation and configuration of DataStax Enterprise.
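
If you want to drive LCM from scripts rather than the UI, OpsCenter exposes LCM over an HTTP API. Below is a very rough Python sketch of that idea using the requests library; the endpoint path, payload fields, and cluster ID are assumptions and placeholders, so check the LCM API documentation for your OpsCenter version before relying on anything like this.

# A rough sketch of kicking off an LCM install job over HTTP with the
# `requests` library. The endpoint path, payload fields, and cluster ID below
# are assumptions/placeholders -- consult the OpsCenter LCM API docs for the
# exact resources your version exposes.
import requests

OPSCENTER = "http://opscenter.example.com:8888"   # placeholder OpsCenter host
CLUSTER_ID = "REPLACE-WITH-CLUSTER-ID"            # placeholder LCM cluster id

# Assumed endpoint and body shape for triggering an install job.
resp = requests.post(
    f"{OPSCENTER}/api/v2/lcm/actions/install",
    json={
        "job-type": "install",
        "resource-class": "cluster",
        "resource-id": CLUSTER_ID,
    },
)
resp.raise_for_status()
print("job submitted:", resp.json())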

If you feel like you really need to control all of the nitty-gritty details of provisioning, installing, and configuring DSE, you'll probably be more interested in using some sort of configuration management or automation tool like Ansible, Chef, Puppet, or Terraform, many of which already have openly available scripts for DataStax Enterprise.
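
If you do go the do-it-yourself route, the general shape of a provisioning script is the same regardless of the tool. Here's a minimal Python sketch using boto3 to launch a few EC2 instances with a user-data hook where the DSE install would happen; the AMI ID, key pair, security group, and install commands are all placeholders, not a verified install script.

# A minimal provisioning sketch using boto3. The AMI ID, key pair, security
# group, and the commands in user_data are placeholders -- substitute your own
# values and follow the DSE install docs for the actual package setup.
import boto3

ec2 = boto3.resource("ec2", region_name="us-east-1")

# Cloud-init style user data: placeholder commands, not a verified install script.
user_data = """#!/bin/bash
# install Java and DataStax Enterprise here, e.g. via your configuration
# management tool or the package repository described in the DSE docs
"""

instances = ec2.create_instances(
    ImageId="ami-00000000000000000",   # placeholder AMI
    InstanceType="m5.2xlarge",         # size to your workload
    MinCount=3,                        # a small three-node cluster
    MaxCount=3,
    KeyName="my-keypair",              # placeholder key pair
    SecurityGroupIds=["sg-00000000"],  # placeholder security group
    UserData=user_data,
)

for instance in instances:
    print("launched", instance.id)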

When it comes to the type of disks you want to use with DataStax Enterprise, the fastest will always be locally attached SSDs, like those you see as ephemeral drives in Amazon EC2. However, there's definitely been a shift to network-based disk devices, some of which we'll take a look at in the next slide. If you're still using, or intend to use, ephemeral disks, remember that data on those disks does not survive stopping and starting instances. When you find yourself in situations where ephemeral drives fail, it's best to start fresh with a brand new instance after terminating the existing one. Some signs of impending failure of an ephemeral drive include bit rot, which most noticeably shows up as corrupted SSTables.
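
One practical way to check for that kind of corruption is nodetool verify, which validates SSTable checksums. Here's a small Python sketch that wraps it and flags a failure; the keyspace name is a placeholder and nodetool needs to be on the PATH.

# A small sketch that runs `nodetool verify` against a keyspace and flags a
# non-zero exit status. The keyspace name is a placeholder.
import subprocess
import sys

keyspace = "my_keyspace"  # placeholder

result = subprocess.run(
    ["nodetool", "verify", keyspace],
    capture_output=True,
    text=True,
)

print(result.stdout)
if result.returncode != 0:
    # Corruption (or another verify failure) was reported; consider replacing
    # the instance rather than trying to repair the ephemeral disk in place.
    print(result.stderr, file=sys.stderr)
    sys.exit(result.returncode)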

With AWS, the premium EBS disks work pretty well, with faster speeds seen on larger volume sizes. With the gp2 type, you'll want to start with volumes of at least 3 TB.
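
The reason size matters for gp2 is that its baseline performance scales with volume size, roughly 3 IOPS per GiB up to the AWS-documented cap, so a volume of around 3 TB gives you on the order of 9,000 baseline IOPS. As a quick illustration, here's a Python sketch using boto3 that creates a 3 TiB gp2 data volume; the region, availability zone, and size are placeholders.

# A sketch that provisions a 3 TiB gp2 data volume with boto3. The region,
# availability zone, and size are placeholders; gp2 baseline IOPS scale with
# size, which is why larger volumes tend to perform better.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

volume = ec2.create_volume(
    AvailabilityZone="us-east-1a",  # placeholder AZ -- must match your instance
    Size=3072,                      # GiB; ~3 TiB for a healthier IOPS baseline
    VolumeType="gp2",
)

print("created volume", volume["VolumeId"])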

Google disk drives also work well regardless of the disk type. In fact, there have been cases where customers have found persistent SSDs running faster than local SSDs. However, given the published specs, don't expect that to always be the case. Make your own observations and go from there!
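
If you want to make those observations yourself, even a crude throughput check can reveal big differences between disk types before you bring in a proper benchmarking tool like fio. Here's a simple, self-contained Python sketch that times a sequential write; the target path and sizes are placeholders.

# A very rough, pure-Python sequential write check you can use to compare disk
# types (persistent SSD vs. local SSD, etc.). Not a substitute for a real
# benchmark tool, but enough to spot large differences.
import os
import time

target = "/mnt/data/disk_probe.tmp"    # placeholder: a path on the disk under test
chunk = b"\0" * (4 * 1024 * 1024)      # 4 MiB writes
total_bytes = 2 * 1024 * 1024 * 1024   # write 2 GiB in total

start = time.monotonic()
with open(target, "wb") as f:
    written = 0
    while written < total_bytes:
        f.write(chunk)
        written += len(chunk)
    f.flush()
    os.fsync(f.fileno())               # force the data to the device
elapsed = time.monotonic() - start

print(f"wrote {written / 2**20:.0f} MiB in {elapsed:.1f}s "
      f"({written / 2**20 / elapsed:.0f} MiB/s)")
os.remove(target)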

Finally, with Azure, ephemeral is the commonly used disk type. Premium Storage is also available now, but there are no specific recommendations for those disk types at this point.

When it comes to CPUs in the cloud, just remember that what you think you're getting usually isn't what you actually get. This is especially the case with hyperthreading, which makes it look like you have more physical cores than you actually have. Noisy neighbors are also a potential problem, considering that you are sharing hardware with many other users, often without any idea of how the hardware is shared. Check the CPU steal time when you can, as that is often an indication of a noisy neighbor affecting your own CPU usage. If it's high, you can consider terminating your instance and finding a new, and hopefully better, neighbor.
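
For reference, on Linux the steal figure comes from /proc/stat (tools like top and vmstat report the same number). Here's a small Python sketch that samples it over a few seconds; the sampling interval is arbitrary.

# A small sketch that samples /proc/stat twice and reports the CPU steal
# percentage over the interval (Linux only). High steal is a hint that a noisy
# neighbor is eating into your CPU time.
import time

def read_cpu_times():
    # First line of /proc/stat: "cpu user nice system idle iowait irq softirq steal ..."
    with open("/proc/stat") as f:
        fields = f.readline().split()[1:]
    return [int(x) for x in fields]

before = read_cpu_times()
time.sleep(5)
after = read_cpu_times()

deltas = [b - a for a, b in zip(before, after)]
total = sum(deltas)
steal = deltas[7] if len(deltas) > 7 else 0  # 8th field is steal time

print(f"CPU steal over the last 5s: {100.0 * steal / total:.1f}%")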

In terms of networking, the only real consideration you need to think about for AWS is how to handle traffic across different regions, since internal IPs are only routable within the same region.
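
One common way to handle that is to have each node listen on its private IP but advertise its public IP to nodes in other regions, for example via broadcast_address and a multi-region snitch in cassandra.yaml. The Python sketch below is only an illustration of that idea; the file path, addresses, and snitch choice are placeholders, rewriting the file with PyYAML drops comments, and in practice you'd make this change through LCM or your configuration management tool.

# A hedged sketch that sets the addresses a node advertises for cross-region
# traffic in cassandra.yaml. The file path, IPs, and snitch below are
# placeholders; note that round-tripping the file through PyYAML loses comments.
import yaml  # pip install pyyaml

CONF = "/etc/dse/cassandra/cassandra.yaml"  # placeholder path
PRIVATE_IP = "10.0.1.23"                    # placeholder internal address
PUBLIC_IP = "203.0.113.10"                  # placeholder public address

with open(CONF) as f:
    conf = yaml.safe_load(f)

conf["listen_address"] = PRIVATE_IP     # bind on the internal interface
conf["broadcast_address"] = PUBLIC_IP   # advertise the public IP across regions
conf["endpoint_snitch"] = "Ec2MultiRegionSnitch"  # one common multi-region choice

with open(CONF, "w") as f:
    yaml.safe_dump(conf, f, default_flow_style=False)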

With Azure, avoid communication between different virtual networks, since it tends to be slow; if you must, it's suitable only for small workloads. Depending on the VM being used, your network bandwidth can also increase, which is nice if you intend to use a hybrid deployment with both Azure and a physical datacenter. Public IPs are typically the way you want your nodes to connect, especially across different regions. VPN gateways aren't recommended, as the bandwidth we have seen is not ideal. ExpressRoute is another possible way to connect datacenters in different physical locations, though a lot depends on the topology and where your circuit is located.

Finally, with Google Cloud, there's really not much to say here. Since it's a flat network, no additional configuration is necessary.

In addition, AWS has the option of using enhanced networking for certain instance types. If it's supported on the instance types you use, be sure to turn it on to get the benefits of enhanced networking performance, such as increased bandwidth, higher packet-per-second performance, and lower inter-instance latencies at no additional charge.
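
If you want to check this programmatically, here's a hedged Python sketch using boto3 that looks at the ENA flag on an instance and sets it if needed; the instance ID and region are placeholders, and the instance has to be stopped before the attribute can be changed.

# A sketch that checks whether ENA-based enhanced networking is enabled on an
# instance and enables it if not (only valid on supported instance types, and
# only while the instance is stopped). The instance ID is a placeholder.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
instance_id = "i-0123456789abcdef0"  # placeholder

resp = ec2.describe_instances(InstanceIds=[instance_id])
instance = resp["Reservations"][0]["Instances"][0]

if instance.get("EnaSupport"):
    print("enhanced networking (ENA) already enabled")
else:
    ec2.modify_instance_attribute(InstanceId=instance_id, EnaSupport={"Value": True})
    print("ENA support flag set; start the instance to pick it up")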

Finally, for security, AWS has volume encryption available for EBS, which is a simple way to encrypt your at-rest data and all associated files in the filesystem. With Google, everything is largely secure by default, and access to anything is done through multi-factor authentication.
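
As a small illustration of that, here's a Python sketch using boto3 that creates an encrypted EBS volume; with Encrypted=True and no KMS key specified, AWS uses the account's default EBS key. The availability zone and size are placeholders.

# A sketch that creates an encrypted EBS volume. With Encrypted=True and no
# KmsKeyId, AWS uses the default EBS KMS key; the AZ and size are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

volume = ec2.create_volume(
    AvailabilityZone="us-east-1a",  # placeholder
    Size=1024,                      # GiB, placeholder
    VolumeType="gp2",
    Encrypted=True,                 # data at rest and its snapshots are encrypted
)

print("created encrypted volume", volume["VolumeId"])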
