The standard description of Apache Spark is that it’s ‘an open source data analytics cluster computing framework’. Another way to define Spark is as a VERY fast in-memory data-processing framework – like lightning fast, up to 100x faster than Hadoop MapReduce for some workloads. As the volume and velocity of data collected from web and mobile apps rapidly increase, it’s critical that the speed of data processing and analysis stay at least a step ahead in order to support today’s Big Data applications and end-user expectations.

Spark Streaming is an extension of the core Spark API that lets us easily integrate real-time data from disparate event streams (Akka Actors, Kafka, S3 directories, and Twitter, for instance) into event-driven, asynchronous, scalable, type-safe and fault-tolerant applications.
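To give a feel for the API, here is a minimal sketch of a Spark Streaming job that consumes a Kafka topic and counts words in each micro-batch. The ZooKeeper address, consumer group and topic name are placeholders, and it assumes the spark-streaming-kafka artifact is on the classpath.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object StreamingWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("streaming-word-count")
    // Micro-batches of 5 seconds.
    val ssc = new StreamingContext(conf, Seconds(5))

    // Receive a Kafka topic; the ZooKeeper address, group id and topic
    // name are placeholders for this sketch.
    val lines = KafkaUtils
      .createStream(ssc, "localhost:2181", "demo-group", Map("events" -> 1))
      .map(_._2) // keep only the message payload

    // The usual word count, applied to every micro-batch.
    lines.flatMap(_.split("\\s+"))
      .map((_, 1))
      .reduceByKey(_ + _)
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

The same DStream operations apply whichever source you plug in – Kafka, an Akka Actor receiver, S3 directories or Twitter – only the line that creates the input stream changes.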

A quick overview of the main components of Spark and how they work together in a distributed environment.
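For reference while watching, the sketch below shows those pieces from the application’s side: the driver program builds a SparkContext, which asks the cluster manager (the Spark Master in a standalone or DSE deployment) for executors on the worker nodes, and an action like count() is what ships tasks to those executors. The master URL is a placeholder.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ComponentsDemo {
  def main(args: Array[String]): Unit = {
    // The driver builds a SparkContext from this configuration; the
    // master URL below is a placeholder for your own cluster.
    val conf = new SparkConf()
      .setAppName("components-demo")
      .setMaster("spark://spark-master:7077")
    val sc = new SparkContext(conf)

    // Transformations are only recorded here ...
    val evens = sc.parallelize(1 to 1000000).filter(_ % 2 == 0)

    // ... the action sends tasks to the executors and brings the
    // result back to the driver.
    println(s"even numbers: ${evens.count()}")

    sc.stop()
  }
}
```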

A description of how DSE uses Apache Cassandra™ and Paxos for Spark Master High Availability.

A visual guide to how the DataStax Spark Cassandra Connector writes data from Spark to Cassandra.
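As a companion to the visuals, here is a minimal write-side sketch through the connector; the connection host, keyspace (demo_ks) and table (users) are placeholders, and the table is assumed to already exist.

```scala
import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

object WriteDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("write-demo")
      // Tell the connector where to find the cluster (placeholder host).
      .set("spark.cassandra.connection.host", "127.0.0.1")
    val sc = new SparkContext(conf)

    // An RDD of tuples; each tuple becomes one row.
    val users = sc.parallelize(Seq(
      (1, "alice", "alice@example.com"),
      (2, "bob", "bob@example.com")
    ))

    // The connector maps the tuple fields to the listed columns and
    // writes each Spark partition to Cassandra in batches.
    users.saveToCassandra("demo_ks", "users", SomeColumns("id", "name", "email"))

    sc.stop()
  }
}
```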

This demo details the process the Spark Cassandra Connector uses to create a CassandraRDD and how that RDD retrieves data from Cassandra.
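A matching read-side sketch, with the same placeholder host, keyspace and table:

```scala
import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

object ReadDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("read-demo")
      .set("spark.cassandra.connection.host", "127.0.0.1") // placeholder
    val sc = new SparkContext(conf)

    // cassandraTable builds a CassandraRDD; its Spark partitions map to
    // Cassandra token ranges, and rows are fetched with CQL only when a
    // partition is actually computed.
    val users = sc.cassandraTable("demo_ks", "users").select("id", "name")

    users.take(10).foreach(println)

    sc.stop()
  }
}
```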

This demo shows how you can access data in both Hadoop and Cassandra using Spark in DataStax Enterprise.
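A rough sketch of what that looks like in code: one RDD comes from a file in the Hadoop-compatible file system (CFS in DSE, or plain HDFS) and another from a Cassandra table, and the two are joined like any other RDDs. The path, keyspace, table and column names are all placeholders.

```scala
import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

object HadoopAndCassandraDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("hadoop-and-cassandra-demo")
      .set("spark.cassandra.connection.host", "127.0.0.1") // placeholder
    val sc = new SparkContext(conf)

    // Click counts per user id, read from a CSV file in CFS/HDFS
    // (placeholder path; the first column is assumed to be a numeric id).
    val clickCounts = sc.textFile("cfs:///data/clicks.csv")
      .map(_.split(","))
      .map(cols => (cols(0).toInt, 1))
      .reduceByKey(_ + _)

    // User names, read from a Cassandra table through the connector.
    val userNames = sc.cassandraTable("demo_ks", "users")
      .map(row => (row.getInt("id"), row.getString("name")))

    // Both are ordinary RDDs, so they can be combined freely.
    clickCounts.join(userNames).take(10).foreach(println)

    sc.stop()
  }
}
```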

This demo shows how to join tables in DataStax Enterprise with Apache Spark, allowing us to create a new table containing the top n values.
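A hedged sketch of that flow, assuming placeholder tables demo_ks.users (id, name) and demo_ks.scores (id, score), and a result table demo_ks.top_users (rank, id, name, score) created beforehand:

```scala
import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

object TopNDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("top-n-demo")
      .set("spark.cassandra.connection.host", "127.0.0.1") // placeholder
    val sc = new SparkContext(conf)

    // Two source tables keyed by the same id (placeholder schema).
    val users = sc.cassandraTable("demo_ks", "users")
      .map(row => (row.getInt("id"), row.getString("name")))
    val scores = sc.cassandraTable("demo_ks", "scores")
      .map(row => (row.getInt("id"), row.getLong("score")))

    // Join in Spark and keep the n highest scores (n = 10 here).
    val topUsers = users.join(scores)
      .map { case (id, (name, score)) => (id, name, score) }
      .top(10)(Ordering.by(_._3))

    // Add a rank and write the result into the new table, which is
    // assumed to have been created beforehand.
    sc.parallelize(topUsers.zipWithIndex.map {
      case ((id, name, score), i) => (i + 1, id, name, score)
    }).saveToCassandra("demo_ks", "top_users",
      SomeColumns("rank", "id", "name", "score"))

    sc.stop()
  }
}
```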

This demo shows how to create and populate a new index table from an existing table in Cassandra with Apache Spark.
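A minimal sketch of the idea, with placeholder names throughout: create a users_by_email lookup table, then read demo_ks.users, re-key it by email and write it into the new table.

```scala
import com.datastax.spark.connector._
import com.datastax.spark.connector.cql.CassandraConnector
import org.apache.spark.{SparkConf, SparkContext}

object IndexTableDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("index-table-demo")
      .set("spark.cassandra.connection.host", "127.0.0.1") // placeholder
    val sc = new SparkContext(conf)

    // Create the new lookup ("index") table if it is not there yet.
    CassandraConnector(conf).withSessionDo { session =>
      session.execute(
        """CREATE TABLE IF NOT EXISTS demo_ks.users_by_email (
          |  email text PRIMARY KEY,
          |  id int
          |)""".stripMargin)
    }

    // Re-key the existing table by email and populate the index table.
    sc.cassandraTable("demo_ks", "users")
      .map(row => (row.getString("email"), row.getInt("id")))
      .saveToCassandra("demo_ks", "users_by_email", SomeColumns("email", "id"))

    sc.stop()
  }
}
```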