Apache Spark Online Training

Apache Spark & Scala Online Training Course Content

Introduction to Spark

  • Understanding Apache Spark
  • Limitations of Hadoop
  • Stream processing on Spark
  • Spark vs Hadoop
  • Installing Spark using binaries
  • Python installation
  • Installing development environment tools

Programming model with Spark

  • Understanding programming with Spark
  • Overview of Spark RDD
  • MapReduce and Join programming with Spark
  • Creating RDDs
  • Operations and methods on RDDs
  • Spark library stack
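
As a rough illustration of the MapReduce and join topics in this module, the sketch below mimics key-value processing in plain Python. It does not use Spark itself: the helpers `reduce_by_key` and `join` are hypothetical stand-ins, named to echo the pair-RDD operations `reduceByKey` and `join`, and the sample sentences are made up for the example.

```python
from collections import defaultdict
from functools import reduce


def reduce_by_key(pairs, func):
    """Group (key, value) pairs by key, then fold each group's values with func."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: reduce(func, values) for key, values in groups.items()}


def join(left, right):
    """Inner join of two lists of (key, value) pairs on their keys."""
    right_index = defaultdict(list)
    for key, value in right:
        right_index[key].append(value)
    return [(key, (lv, rv)) for key, lv in left for rv in right_index.get(key, [])]


# Made-up input data (an assumption for illustration only).
lines = ["spark makes rdds", "rdds are immutable", "spark is fast"]

# Map phase: emit one (word, 1) pair per word.
pairs = [(word, 1) for line in lines for word in line.split()]

# Reduce phase: sum the counts for each word.
word_counts = reduce_by_key(pairs, lambda a, b: a + b)

# Join phase: combine each word's count with its length by key.
word_lengths = [(word, len(word)) for word in word_counts]
joined = join(sorted(word_counts.items()), word_lengths)
```

In Spark the same steps would run as distributed transformations over partitions; this single-process version only shows the data flow.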

Introduction to Scala

  • Understanding Scala
  • Features and Benefits of Scala
  • Classes and Objects
  • Basic Types and Operations
  • Packages and Imports
  • Working with Lists
  • Stateful Objects

Working with Spark SQL

  • Why Spark SQL?
  • Spark SQL concepts
  • DataFrame API
  • Aggregation in Spark SQL
  • Joining multiple data sources
  • Data Catalogs

Spark Data with Python

  • Introduction to Python
  • Setting up a dataset
  • Data analysis use cases
  • Bar chart
  • Pie chart
  • Scatter plot

Spark Data with Scala

  • Use cases with Scala
  • Writing programs with Scala

Working with Spark Stream Processing

  • Overview of Spark Stream Processing
  • Micro batch data processing
  • Windowed data processing
  • Spark stream processing
  • Pie chart
  • Scatter plot
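
To make the micro-batch and windowed processing topics concrete, here is a minimal plain-Python sketch. It is not Spark code. Spark forms micro-batches by time interval, whereas this sketch assumes a fixed batch size so that the result is deterministic. The function names `micro_batches` and `windowed_counts` and the sample events are hypothetical.

```python
from collections import Counter, deque


def micro_batches(events, batch_size):
    """Split an event stream into consecutive batches of at most batch_size items."""
    batch = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final partial batch, if any
        yield batch


def windowed_counts(batches, window_length):
    """After each batch, count events across the most recent window_length batches."""
    window = deque(maxlen=window_length)  # old batches fall out automatically
    for batch in batches:
        window.append(Counter(batch))
        total = Counter()
        for counts in window:
            total.update(counts)
        yield dict(total)


# Made-up event stream (an assumption for illustration only).
events = ["a", "b", "a", "c", "a", "b", "c", "c"]
results = list(windowed_counts(micro_batches(events, 2), window_length=2))
```

Each element of `results` is the count over a sliding window of two batches that advances by one batch at a time.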

Spark Machine Learning

  • Overview of Machine Learning
  • Spark Machine Learning concepts
  • Wine classification and prediction
  • Spam filtering
  • ML algorithms
  • Model persistence

Spark Machine Learning using MLlib

  • Introduction to MLlib
  • Creating vectors
  • Creating matrices
  • Regression with MLlib
  • Classification with MLlib

Working with Spark Graph Processing

  • Understanding graphs
  • Spark GraphX programming concepts
  • Spark GraphX library
  • Understanding GraphFrames
  • Graph optimizations

Optimizations and Performance Tuning

  • Overview of Optimization
  • Memory optimization
  • Garbage Collection
  • Optimizing the level of parallelism
  • Serialization to improve performance