DataStax Developer Blog

Get the latest developer news and updates! The DataStax Developer Blog is a great resource for keeping up to date on what's new.

May 08, 2019 • By: Jeff Carpenter

After 2+ years of hard work from the community, Apache Cassandra 4.0 is in the final stages of testing for official release. As the biggest gathering of Cassandra professionals since the last major release (3.0), DataStax Accelerate represents a great opportunity to get caught up on what’s been happening in the project - straight from the committers and experts implementing these changes and the power users who are pushing the limits of this highly scalable database even further.

Apr 30, 2019 • By: Brian Hess

This is the third blog post about dsbulk.  The first two blog posts (here and here) covered some basic loading examples.  This post will delve into some of the common options for loading, unloading, and counting.

Apr 22, 2019 • By: Cristina Veale, Eric Zietlow

As developers, a lack of knowledge is often the main thing that holds us back.  The phrase "the more you know" has never been truer. From learning new languages and technologies to better understanding the theory around them, we must expand our knowledge base in order to stay relevant in today's industry.  There are many classes we can take and blogs we can read, but nothing ever comes close to real, hands-on experience working with new technologies. And given the overwhelming demand, bootcamps and hands-on instruction are in short supply.

Apr 17, 2019 • By: Adron Hall

If you’re interested in running data-intensive systems (think Apache Cassandra, DataStax Enterprise, Kafka, Spark, TensorFlow, Elasticsearch, Redis, etc.) in Kubernetes, this is a great talk. @Lenadroid covers what options are available in Kubernetes and how architectural features around pods, jobs, stateful sets, and replica sets work together to provide distributed systems capabilities. She also delves into custom resource definitions (CRDs), operators, and Helm charts, which offer current and emerging capabilities that can help you host various complex distributed systems. I’ve included references below the video here; enjoy.

Apr 17, 2019 • By: Amanda Moran

So you want to experiment with Apache Cassandra and Apache Spark to do some machine learning, awesome! But there is one downside: you need to create a cluster or ask to borrow someone else's to be able to do your experimentation… but what if I told you there is a way to install everything you need on one node, even on your laptop (if you are using Linux or a Mac)? The steps outlined below will install:

  • Apache Cassandra
  • Apache Spark
  • Apache Cassandra - Apache Spark Connector
  • PySpark
  • Jupyter Notebooks
  • Cassandra Python Driver

Note: As with any set of install instructions, these may not work in all cases. Each environment is different. Hopefully this works for you (as it did for me!), but if not, use this as a guide. Also, feel free to reach out and add comments on what worked for you!
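As a rough sketch of what the install steps above involve (package names are as published on PyPI and Maven Central, but the version numbers and paths shown are assumptions — adjust for your environment):

```
# Python-side components: PySpark, Jupyter, and the Cassandra Python driver
pip install pyspark jupyter cassandra-driver

# Apache Cassandra is installed separately: download the tarball from
# cassandra.apache.org, unpack it, and start a single local node, e.g.:
#   tar xzf apache-cassandra-3.11.4-bin.tar.gz   # version is an assumption
#   apache-cassandra-3.11.4/bin/cassandra -f

# The Spark Cassandra Connector can be pulled in at launch time, e.g.:
#   pyspark --packages com.datastax.spark:spark-cassandra-connector_2.11:2.4.1
```

The author's post walks through these steps in detail; treat the commands above as orientation, not a substitute for the instructions.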

Apr 09, 2019 • By: Brian Hess

In the last blog post, we introduced the dsbulk command, some basic loading examples, and dove into some mappings.  In this blog post, we are going to look into some additional elements for data loading.

Apr 04, 2019 • By: Jeff Carpenter

As a software industry veteran I’ve s̶e̶e̶n̶ / e̶x̶p̶e̶r̶i̶e̶n̶c̶e̶d̶ / i̶n̶f̶l̶i̶c̶t̶e̶d̶ / been victimized by any number of inventive approaches to integrating and testing distributed systems, so the title of this post is a bit tongue-in-cheek. I’ve been sharing my experience building a Python implementation of the KillrVideo microservice tier. In the previous posts, I shared how I got started on this project, building gRPC service stubs, and advertising the endpoints in etcd. This time, I’d like to elaborate on why I built this service scaffolding first, before implementing any business logic.

Apr 03, 2019 • By: Eric Zietlow

Lightweight transactions are extremely powerful when used correctly.  They not only enable you to use a highly durable distributed system in an ACID-like way, but they also allow you to do it with ease. In this blog, we'll explore lightweight transactions, show how DSE implements them, and call out a few pitfalls to keep in mind.
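To make the compare-and-set idea concrete, here is a minimal, purely illustrative Python sketch of the *semantics* a lightweight transaction gives you. This is not how DSE implements LWTs (the real thing runs Paxos across replicas, server-side); the toy class below only models the key property that the condition check and the write happen as one atomic step, and all names in it are made up:

```python
import threading

class ToyRow:
    """A toy in-memory 'row' modeling LWT-style conditional writes.

    Illustration only: DSE implements lightweight transactions with
    Paxos across replicas. The point here is just that the condition
    check and the write form a single atomic unit.
    """
    def __init__(self, value=None):
        self._value = value
        self._lock = threading.Lock()

    def insert_if_not_exists(self, value):
        # Mirrors: INSERT ... IF NOT EXISTS
        with self._lock:
            if self._value is None:
                self._value = value
                return True   # the "[applied]" column in CQL terms
            return False

    def update_if(self, expected, new_value):
        # Mirrors: UPDATE ... SET v = <new> ... IF v = <expected>
        with self._lock:
            if self._value == expected:
                self._value = new_value
                return True
            return False

row = ToyRow()
print(row.insert_if_not_exists("alice"))   # True: row was empty
print(row.insert_if_not_exists("bob"))     # False: row already exists
print(row.update_if("alice", "carol"))     # True: condition held
print(row.update_if("alice", "dave"))      # False: value is now "carol"
```

A lock on one machine is trivially cheap; getting the same guarantee across replicas is what makes LWTs both powerful and worth using carefully, as the post discusses.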

Mar 26, 2019 • By: Brian Hess

The DataStax Bulk Loader, dsbulk, is a new bulk loading utility introduced in DSE 6.  It solves the task of efficiently loading data into DataStax Enterprise, as well as efficiently unloading data from DSE and counting the data in DSE, all without having to write any custom code or use other components, such as Apache Spark.  In addition to the bulk load and bulk unload use cases, dsbulk aids in migrating data to a new schema and migrating data from other DSE systems or from other data systems. There is a good high-level blog post that discusses the benefits of dsbulk:

  • Easy to use.
  • Able to support common incoming data formats.
  • Able to export data to common outgoing data formats.
  • Able to support multiple field formats, such as dates and times.
  • Able to support all the DSE data types, including user-defined types.
  • Able to support advanced security configurations.
  • Able to gracefully handle badly parsed data and database insertion errors.
  • Able to report on the status and completion of loading tasks, including summary statistics (such as the load rate).
  • Efficient and fast.

Now, I’m a person who learns by example, so what I’m going to do in this series of blog posts is show some of the ways to use dsbulk to do some common tasks.  For the documentation on dsbulk, including all of the parameters and options, see the documentation pages for dsbulk.
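As a preview of the kinds of invocations this series covers, dsbulk has three basic modes: load, unload, and count (the keyspace, table, and file names below are placeholders):

```
# Load a CSV file with a header row into a table
dsbulk load -url export.csv -k myks -t mytable -header true

# Unload a table to CSV files in a directory
dsbulk unload -url /tmp/unload -k myks -t mytable

# Count the rows in a table
dsbulk count -k myks -t mytable
```

The posts in this series expand on these with mappings, delimiters, error handling, and the other options mentioned above.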

Mar 21, 2019 • By: Olivier Michallat

The Java driver team is pleased to announce the general availability of two new major versions: OSS driver 4.0.0 and DSE driver 2.0.0.

These are long-awaited versions that address longstanding issues with the 3.x line:

  • drop the dependency on Guava, and update the API to use Java 8 futures;
  • make the driver more pluggable and better expose the internals;
  • clean up the codebase and make it more modular.