7878 Reviews (4.5 out of 5 stars)

Spark and Machine Learning at Scale

$2,495 USD

Course Code WA3290

Duration 4 days

Available Formats Classroom, Virtual

This Spark and Machine Learning training teaches participants how to build, deploy, and maintain data-driven solutions using Spark and its associated technologies. The course begins with an introduction to Spark, its architecture, and how it fits into the Hadoop and cloud-based ecosystems. Participants learn to set up Spark environments using Databricks Cloud, AWS EMR clusters, and SageMaker Studio, and then work with Spark's core functionality, including RDDs, DataFrames, transformations, and actions.

Skills Gained

Work with Spark's machine learning (ML) libraries, focusing on data preprocessing, feature engineering, model training, and evaluation
Perform stream processing and graph analysis with GraphX and GraphFrames
Deploy Spark ML artifacts
Understand machine learning at scale
Implement distributed training, hyperparameter tuning, model selection, and performance optimization for machine learning pipelines

Who Can Benefit

This course targets data scientists, machine learning engineers, big data engineers, and other professionals with experience in data analysis who wish to leverage Spark for scalable machine learning solutions. It is also suitable for those who want to enhance their large-scale data processing and machine learning knowledge.

Prerequisites

Basic understanding of Python programming
Familiarity with data processing and analysis concepts
Familiarity with Python Pandas
Familiarity with basic machine learning concepts and algorithms is recommended

Course Details

Outline

Chapter 1 - Introduction to Spark. Overview of Spark and its Architecture

Big Data and the Analytics Process
What is Big Data?
Volume
Velocity
Variety
Veracity
Too large to fit into memory
Big Data and the Analytics Process
Scaling and Distributed Computing
How to Actually Scale?
Bring the Data to the Compute
Bring the Compute to the Data
Introduction to the Spark Platform
History of Spark and Hadoop
Spark vs. Hadoop MapReduce
Supported Languages
Pandas API on Spark
Spark Architecture: Cluster Manager
Standalone cluster manager
Apache Hadoop YARN
Apache Mesos
Spark Architecture: Driver Process
Spark Architecture: Executor Process and Workers
Spark Building Blocks
Spark SQL and the Catalyst

Chapter 2 - Introduction to Spark - Setting up a Spark Environment

Set Up On-Premise Spark Environment (Ubuntu 20.04, Docker)
Set Up Databricks Community Cloud and Compute Cluster
Set Up EMR Cluster and Attach Notebook

Chapter 3 - Basic Spark Operations and Transformations

Spark Session and Context
Loading Data
Actions and Transformations
More on Actions in Spark
More on Transformations in Spark
Persistence and Caching
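
The split between transformations and actions in the Chapter 3 topics can be previewed without a cluster. The following is a minimal sketch in plain Python, not Spark code: the `LazyDataset` class and its methods are hypothetical names chosen for illustration. It assumes the usual lazy-evaluation model, in which transformations only record work and an action triggers it.

```python
class LazyDataset:
    """Toy stand-in for a lazily evaluated dataset (hypothetical; not Spark's API)."""

    def __init__(self, source, ops=()):
        self._source = list(source)
        self._ops = tuple(ops)  # recorded transformations, not yet executed

    # Transformations: return a new dataset and only record the work.
    def map(self, fn):
        return LazyDataset(self._source, self._ops + (("map", fn),))

    def filter(self, pred):
        return LazyDataset(self._source, self._ops + (("filter", pred),))

    def _run(self):
        # Apply every recorded step to each item, in order.
        for item in self._source:
            keep = True
            for kind, fn in self._ops:
                if kind == "map":
                    item = fn(item)
                elif not fn(item):
                    keep = False
                    break
            if keep:
                yield item

    # Actions: trigger execution and return a result to the caller.
    def collect(self):
        return list(self._run())

    def count(self):
        return sum(1 for _ in self._run())


calls = []  # records when the mapped function actually runs


def square(x):
    calls.append(x)
    return x * x


ds = LazyDataset(range(6)).map(square).filter(lambda v: v % 2 == 0)
calls_before_action = len(calls)  # still zero: no action has been called
result = ds.collect()             # the action runs the recorded pipeline
```

Because nothing is cached in this sketch, calling `ds.count()` afterwards recomputes the whole pipeline, which is the situation that the "Persistence and Caching" topic addresses.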

Chapter 4 - Introduction to Spark SQL

What is Spark SQL?
Uniform Data Access with Spark SQL
Integration with cloud storage
Using JDBC Sources
Hive Integration
What is a DataFrame?
Creating a DataFrame in PySpark
Commonly Used DataFrame Methods and Properties in PySpark
Grouping and Aggregation in PySpark
The "DataFrame to RDD" Bridge in PySpark
The SQLContext Object
Examples of Spark SQL / DataFrame (PySpark Example)
Converting an RDD to a DataFrame Example
Example of Reading / Writing a JSON File
Performance, Scalability, and Fault-tolerance of Spark SQL

Chapter 5 - Spark's ML libraries - Lecture: Introduction to Spark's ML libraries

Spark MLlib
Algorithms
Classification
Binary Classification
Multi-Class Classification
Multi-Label Classification
Imbalanced Classification
Regression
Linear Regression
Simple Linear Regression
Multiple Linear Regression
Polynomial Regression
Support Vector Regression
Decision Tree Regression
Random Forest Regression
Feature Engineering
TF-IDF - PySpark example
Word2Vec - PySpark example
Count Vectorizer - PySpark example
Feature Transformers of Spark MLlib
Tokenizer - PySpark example
Stopwords Remover
Stopwords Remover - PySpark example
N-gram - PySpark example
Binarizer - PySpark example
Principal Component Analysis
What is PCA used for?
Advantages and disadvantages of PCA
PCA - PySpark example
String Indexing - PySpark example
Why is One-Hot Encoding used for nominal data?
One-Hot Encoding - PySpark Example
Bucketizer - PySpark example
Standardization and Normalization
Difference between Standardization and Normalization
Standard Scaler
Robust Scaler
Min Max Scaler
Max Abs Scaler
Imputer
Feature Selectors in Spark MLlib
Vector Slicer - PySpark example
Chi-Squared selection - PySpark example
Univariate Feature Selector
Variance Threshold Selector
Locality Sensitive Hashing
Locality Sensitive Hashing in Spark MLlib
LSH Operations
Locality Sensitive Hashing in Spark MLlib
Bucketed Random Projection for Euclidean Distance
MinHash for Jaccard Distance
Pipeline
Transformer
Estimator
Persistence
Introduction to Hyperparameter Tuning
Hyperparameter tuning methods
Random Search
Grid Search
Bayesian Optimization
Hyperparameter Tuning with Spark
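
For the "Difference between Standardization and Normalization" topic in the Chapter 5 outline, a small worked example may help. This is a minimal sketch in plain Python with made-up sample values; the function names are illustrative and are not Spark MLlib calls, and Spark's own scalers may differ in details such as which standard deviation estimator they use.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def min_max_normalize(values):
    """Rescale linearly so the smallest value maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


ages = [20.0, 30.0, 40.0, 50.0, 60.0]  # made-up sample feature
z_scores = standardize(ages)           # centered on 0, unbounded range
scaled = min_max_normalize(ages)       # bounded to [0, 1]
```

Standardization keeps the shape of the distribution while removing its scale, whereas min-max normalization fixes the range and is therefore sensitive to a single extreme value.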

Chapter 6 - Streaming and Graphs

Stream Analytics
Tools for Stream Analytics: Kafka, Storm, Flink, Spark
Timestamps in stream analytics
Windowing Operations

Chapter 7 - Deploying Spark ML Artifacts - Introduction to deploying Spark ML Artifacts

How the Spark system works
What is Deployment?
Spark Deployment Artifacts
Packaging Spark (ML) for Production
Deploy Spark ML to EMR
Deploy Spark (ML) with SageMaker
Serving and Updating Spark ML Models
Model Versioning with AWS Model Registry

Chapter 8 - Machine learning at Scale - Introduction to Machine Learning at Scale

Introduction to Scalability
Common Reasons for Scaling Up ML Systems
How to Avoid Scaling Infrastructure?
Benefits of ML at Scale
Challenges in ML Scalability
Data Complexities - Challenges
ML System Engineering - Challenges
Integration Risks - Challenges
Collaboration Issues - Challenges

Chapter 9 - Machine learning at Scale - Distributed Training of Machine Learning models

Introduction to Distributed Training
Data Parallelism
Steps of Data Parallelism
Data Parallelism vs. Random Forest
Model Parallelism
Frameworks for Implementing Distributed ML
Introduction to Distributed Training vs. Distributed Inference
Introduction to Training
Introduction to Inference
Key components of Inference
Inference Challenges
Training vs. Inference
Introduction to GPUs
Inference - Hardware
AWS Inferentia Chip vs GPU
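
The "Steps of Data Parallelism" topic in Chapter 9 follows a common pattern: split the data into shards, compute a gradient on each shard, average the gradients, and update the shared model. The sketch below runs those steps sequentially in plain Python for a one-parameter linear model; the data, learning rate, and function names are assumptions, and a real system would run the per-shard step on separate workers.

```python
def shard(data, num_workers):
    """Split the dataset into one slice per worker (round-robin)."""
    return [data[i::num_workers] for i in range(num_workers)]


def local_gradient(w, shard_data):
    """Gradient of mean squared error for y ~ w * x on one shard."""
    return sum(2 * (w * x - y) * x for x, y in shard_data) / len(shard_data)


def train(data, num_workers=2, lr=0.01, steps=200):
    w = 0.0
    shards = shard(data, num_workers)
    for _ in range(steps):
        # Each worker would compute this on its own shard in parallel.
        grads = [local_gradient(w, s) for s in shards]
        # Averaging the gradients plays the role of the synchronization step.
        w -= lr * sum(grads) / len(grads)
    return w


data = [(x, 3.0 * x) for x in range(1, 9)]  # made-up data with true slope 3
w = train(data)
```

With equally sized shards, the averaged gradient equals the gradient over the full dataset, so the result matches single-machine training; only the work is divided.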

Chapter 10 - Machine learning at Scale - Hyperparameter tuning and model selection at scale

Hyperparameter Tuning at Scale
Hyperparameter Tuning Challenges
Distributed Hyperparameter Tuning
Bayesian Optimization
Distributed Hyperparameter Tuning
Spark Based Tools
TensorFlowOnSpark
Advantages of TensorFlowOnSpark
BigDL
Advantages of BigDL
Horovod
Advantages of Horovod
H2O Sparkling Water
Advantages of Sparkling Water over H2O
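
To make the grid search and random search methods named in the tuning topics concrete, here is a minimal sketch in plain Python. The `validation_loss` function is a hypothetical stand-in for training and scoring a model, and the parameter names `lr` and `depth` are assumptions chosen for illustration.

```python
import itertools
import random


def validation_loss(lr, depth):
    """Hypothetical stand-in for training a model and scoring it."""
    return (lr - 0.1) ** 2 + 0.01 * (depth - 5) ** 2


def grid_search(grid):
    """Evaluate every combination of the listed values."""
    candidates = [
        dict(zip(grid, combo)) for combo in itertools.product(*grid.values())
    ]
    return min(candidates, key=lambda p: validation_loss(**p))


def random_search(ranges, trials, seed=0):
    """Evaluate a fixed number of randomly sampled configurations."""
    rng = random.Random(seed)
    candidates = [
        {"lr": rng.uniform(*ranges["lr"]), "depth": rng.randint(*ranges["depth"])}
        for _ in range(trials)
    ]
    return min(candidates, key=lambda p: validation_loss(**p))


best_grid = grid_search({"lr": [0.01, 0.1, 1.0], "depth": [3, 5, 7]})
best_random = random_search({"lr": (0.01, 1.0), "depth": (3, 7)}, trials=20)
```

Each candidate is evaluated independently of the others, which is why both methods distribute naturally across a cluster: every worker can train and score a different configuration.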

Lab Exercises

Lab 1. Spark Introduction Lab
Lab 2. Spark Setup Lab
Lab 3. Installing GraphFrames in DCC

Schedule

4 options available

Dates	Delivery	Language	Time
May 6, 2024 - May 9, 2024 (4 days)	Virtual	English	10:00 AM – 6:00 PM EST
Jun 17, 2024 - Jun 20, 2024 (4 days)	Virtual	English	10:00 AM – 6:00 PM EST
Aug 5, 2024 - Aug 8, 2024 (4 days)	Virtual	English	10:00 AM – 6:00 PM EST
Sep 23, 2024 - Sep 26, 2024 (4 days)	Virtual	English	10:00 AM – 6:00 PM EST

When does class start/end?

Classes begin promptly at 9:00 am, and typically end at 5:00 pm.

Does the course schedule include a lunch break?

Lunch is normally an hour long and begins at noon. Coffee, tea, hot chocolate and juice are available all day in the kitchen. Fruit, muffins and bagels are served each morning. There are numerous restaurants near each of our centers, and some popular ones are indicated on the Area Map in the Student Welcome Handbooks, which can be picked up in the lobby or requested from a member of the ExitCertified staff.

How can someone reach me during class?

If someone should need to contact you while you are in class, please have them call the center telephone number and leave a message with the receptionist.

What languages are used to deliver training?

Most courses are conducted in English, unless otherwise specified. Some courses will have the word "FRENCH" marked in red beside the scheduled date(s) indicating the language of instruction.

What does GTR stand for?

GTR stands for Guaranteed to Run; if you see a course with this status, it means this event is confirmed to run. View our GTR page to see our full list of Guaranteed to Run courses.

How do I find an ExitCertified training location?

We have training locations across the United States and Canada. View a full list of classroom training locations.

Which delivery formats are available?

At ExitCertified we offer training that is Instructor-Led, Online, Virtual and Self-Paced.

Does ExitCertified deliver group training?

Yes, we provide training for groups and individuals, as well as private on-site sessions. View our group training page for more information.

What does vendor-authorized training mean?

As a vendor-authorized training partner, we offer a curriculum that our partners have vetted. We use the same course materials and facilitate the same labs as our vendor-delivered training. These courses are considered the gold standard and, as such, are priced accordingly.

Is the training too basic, or will you go deep into technology?

It depends on your requirements, your role in your company, and your depth of knowledge. The good news is that many of our learning paths let you progress from the fundamentals to highly specialized training.

How up-to-date are your courses and support materials?

We continuously work with our vendors to evaluate and refresh course material to reflect the latest training courses and best practices.

Are your instructors seasoned trainers who have deep knowledge of the training topic?

ExitCertified instructors have an average of 27 years of practical IT experience. They have also served as consultants for an average of 15 years. To stay up to date, instructors spend at least 25 percent of their time learning new and emerging technologies and courses.

Do you provide hands-on training and exercises in an actual lab environment?

Lab access is dependent on the vendor and the type of training you sign up for. However, many of our top vendors will provide lab access to students to test and practice. The course description will specify lab access.

Will you customize the training for our company’s specific needs and goals?

We will work with you to identify training needs and areas of growth. We offer a variety of training methods, such as private group training, on-site training at a location of your choice, and virtual delivery. We provide courses and certifications that are aligned with your business goals.

How do I get started with certification?

Getting started on a certification pathway depends on your goals and the vendor you choose to get certified with. Many vendors offer IT certifications, ranging from entry-level to advanced, that can boost your career. To get access to certification vouchers and discounts, please contact customerexp@exitcertified.com.

Will I get access to content after I complete a course?

You will get access to the PDF of course books and guides, but access to the recording and slides will depend on the vendor and type of training you receive.

How do I request a W9 for ExitCertified LLC?

View our filing status and how to request a W9.

Easy to work with. Learning material pdfs were able to be printed out in color which was very nice to write on.

Tim

ExitCertified

Great class. I learned a great deal from the material. There would seem to be a large amount that I need to learn about.

William Driver

ExitCertified

Concise and good to follow along. Although it is a lot to take in under a short period of time.

ExitCertified Student

ExitCertified

The class covered the concepts needed for the AWS Cloud Practitioner Certification.

Ruchir

ExitCertified

The tool provided to practice the course teachings is very functional and easy to use.

ExitCertified Student

ExitCertified
