Apache Spark & Scala Online Training Course Content
Introduction to Spark
- Understanding Apache Spark
- Limitations of Hadoop
- Stream processing on Spark
- Spark vs Hadoop
- Installing Spark using binaries
- Python installation
- Installing development environment tools
Programming model with Spark
- Understanding programming with Spark
- Overview of Spark RDD
- MapReduce and Join programming with Spark
- Creating RDDs
- Operations and methods on RDDs
- Spark library stack
Introduction to Scala
- Understanding Scala
- Features and benefits of Scala
- Classes and Objects
- Basic Types and Operations
- Packages and Imports
- Working with Lists
- Stateful Objects
Working with Spark SQL
- Why Spark SQL?
- Spark SQL concepts
- DataFrame API
- Aggregation in Spark SQL
- Joining multiple data sources
- Data Catalogs
Spark Data with Python
- Introduction to Python
- Setting up the dataset
- Data analysis use cases
- Bar chart
- Pie chart
- Scatter plot
Spark Data with Scala
- Use cases with Scala
- Writing programs with Scala
Working with Spark Stream Processing
- Overview of Spark Stream Processing
- Micro batch data processing
- Windowed data processing
- Spark stream processing
Spark Machine Learning
- Overview of Machine Learning
- Spark Machine Learning concepts
- Wine classification and prediction
- Spam filtering
- ML algorithms
- Model persistence
Spark Machine Learning using MLlib
- Introduction to MLlib
- Creating vectors
- Creating matrices
- Regression with MLlib
- Classification with MLlib
Working with Spark Graph Processing
- Understanding graphs
- Spark GraphX programming concepts
- Spark GraphX library
- Understanding GraphFrames
- Graph optimizations
Optimizations and Performance Tuning
- Overview of Optimization
- Memory optimization
- Garbage Collection
- Optimizing the level of parallelism
- Serialization to improve performance