Before Getting Started

To make the most of this course, please download the recommended assets.


Training materials:
DS320 Virtual Machine Download (Includes Exercises)
DS320 Course Slides

About this Course

Course duration: 12 hours

In this course, you will learn how to solve analytical problems effectively and efficiently with Apache Spark™, Apache Cassandra™, and DataStax Enterprise. You will learn about the Spark API, the Spark-Cassandra Connector, Spark SQL, Spark Streaming, and crucial performance optimization techniques. You will also learn the basics of Scala, a productive and robust programming language, for data analysis and processing in Apache Spark™.


Units 1 - 47 of 47

Introduction: Data Analytics
Introduction: Spark Architecture
Introduction: Spark Shell
Introduction: Web UI
Essentials: Hello World
Essentials: Resilient Distributed Datasets
Essentials: Three Ways to Create an RDD
Essentials: RDD Transformations
Essentials: RDD Actions
Connecting Spark: Reading Data From Cassandra
Connecting Spark: Processing Cassandra Data
Connecting Spark: Converting Cassandra Data
Connecting Spark: Saving Data Back to Cassandra
Optimization: Broadcast Variables
Optimization: Accumulator Variables
Optimization: RDD Persistence
Key/Value Pairs: Introduction to Pair RDDs
Key/Value Pairs: Aggregation
Key/Value Pairs: Grouping and Sorting
Key/Value Pairs: Joins
Key/Value Pairs: Set Operations
Tuning Partitioning: Understanding Partitioning
Tuning Partitioning: Partitioning Rules
Tuning Partitioning: Controlling Partitioning
Tuning Partitioning: Data Shuffling
Spark/Cassandra Connector: Count
Spark/Cassandra Connector: Group By Key
Spark/Cassandra Connector: Joining Tables
Spark/Cassandra Connector: Cassandra-Aware Partitioning
Spark Streaming: Discretized Stream
Spark Streaming: Architecture
Spark Streaming: First App
Spark Streaming: Stateless Transformations
Spark Streaming: Stateful Transformations
Spark Streaming: Window Transformation
Spark Streaming: Output Operations
Spark Streaming: Checkpointing and Recovery
Spark Streaming: Persistence
Spark Streaming: Controlling Parallelism
Spark SQL: Spark SQL Basics
Spark SQL: Creating DataFrames
Spark SQL: Accessing DataFrame Schema and Rows
Spark SQL: RDD Operations on DataFrames
Spark SQL: Language-Integrated Queries
Spark SQL: Saving DataFrames to Cassandra
Spark SQL: Querying Cassandra with SQL
Spark SQL: Writing Efficient SQL Queries