7878 Reviews (4.5 out of 5 stars)

Advanced Data Analytics with PySpark

By Request

$1,460 USD

Course Code: WA2936

Duration: 2 days

Available Formats: Classroom

When you feel constrained by the computing power of a single computer, you can leverage the massively parallel processing capabilities of the Apache Spark platform through PySpark, Spark's Python API. Along with introducing PySpark, this course covers the Spark Shell for interactively exploring and manipulating data. Spark SQL is introduced as a uniform programming API for working with structured data. The course ends by covering pandas for data manipulation and analysis, and data visualization with seaborn.

Skills Gained

Learn PySpark Shell Environment
Understand Spark DataFrames
Process Data with the PySpark DataFrame API
Work with Pivot Tables in PySpark
Perform Data Visualization and Exploratory Data Analysis (EDA) in PySpark

Who Can Benefit

Business Analysts who want a scalable platform for solving SQL-centric problems

Prerequisites

Knowledge of SQL and familiarity with Python (or the ability to learn the basics of a new language)

Course Details

Outline

Chapter 1. Introduction to Apache Spark

What is Apache Spark
The Spark Platform
Spark vs Hadoop's MapReduce (MR)
Common Spark Use Cases
Languages Supported by Spark
Running Spark on a Cluster
The Spark Application Architecture
The Driver Process
The Executor and Worker Processes
Spark Shell
Jupyter Notebook Shell Environment
Spark Applications
The spark-submit Tool
The spark-submit Tool Configuration
Interfaces with Data Storage Systems
Project Tungsten
The Resilient Distributed Dataset (RDD)
Datasets and DataFrames
Spark SQL, DataFrames, and Catalyst Optimizer
Spark Machine Learning Library
GraphX
Extending Spark Environment with Custom Modules and Files
Summary

Chapter 2. The Spark Shell

The Spark Shell
The Spark v2+ Command-Line Shells
The Spark Shell UI
Spark Shell Options
Getting Help
Jupyter Notebook Shell Environment
Example of a Jupyter Notebook Web UI (Databricks Cloud)
The Spark Context (sc) and Spark Session (spark)
Creating a Spark Session Object in Spark Applications
The Shell Spark Context Object (sc)
The Shell Spark Session Object (spark)
Loading Files
Saving Files
Summary

Chapter 3. Introduction to Spark SQL

What is Spark SQL?
Uniform Data Access with Spark SQL
Hive Integration
Hive Interface
Integration with BI Tools
What is a DataFrame?
Creating a DataFrame in PySpark
Commonly Used DataFrame Methods and Properties in PySpark
Grouping and Aggregation in PySpark
The "DataFrame to RDD" Bridge in PySpark
The SQLContext Object
Examples of Spark SQL / DataFrame (PySpark Example)
Converting an RDD to a DataFrame Example
Example of Reading / Writing a JSON File
Using JDBC Sources
JDBC Connection Example
Performance, Scalability, and Fault-tolerance of Spark SQL
Summary

Chapter 4. Practical Introduction to Pandas

What is pandas?
The Series Object
Accessing Values and Indexes in Series
Setting Up Your Own Index
Using the Series Index as a Lookup Key
Can I Pack a Python Dictionary into a Series?
The DataFrame Object
The DataFrame's Value Proposition
Creating a pandas DataFrame
Getting DataFrame Metrics
Accessing DataFrame Columns
Accessing DataFrame Rows
Accessing DataFrame Cells
Using iloc
Using loc
Examples of Using loc
DataFrames are Mutable via Object Reference!
Deleting Rows and Columns
Adding a New Column to a DataFrame
Appending / Concatenating DataFrame and Series Objects
Example of Appending / Concatenating DataFrames
Re-indexing Series and DataFrames
Getting Descriptive Statistics of DataFrame Columns
Getting Descriptive Statistics of DataFrames
Applying a Function
Sorting DataFrames
Reading From CSV Files
Writing to the System Clipboard
Writing to a CSV File
Fine-Tuning the Column Data Types
Changing the Type of a Column
What May Go Wrong with Type Conversion
Summary
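
As an illustration of several Chapter 4 topics (loc and iloc, type conversion pitfalls, adding a column, and mutation via object reference), here is a minimal pandas sketch. It is not taken from the course materials, and the city and temperature values are made-up sample data.

```python
import pandas as pd

# Hypothetical sample data with a custom index and a string-typed numeric column.
df = pd.DataFrame(
    {"city": ["Austin", "Boston", "Chicago"], "temp_f": ["71", "58", "n/a"]},
    index=["a", "b", "c"],
)

# loc selects by label, iloc selects by integer position.
by_label = df.loc["b", "city"]
by_position = df.iloc[0, 0]

# Type conversion can go wrong: astype(int) would raise on "n/a".
# pd.to_numeric with errors="coerce" turns unparseable values into NaN instead.
df["temp_f"] = pd.to_numeric(df["temp_f"], errors="coerce")

# Adding a new column derived from an existing one.
df["temp_c"] = (df["temp_f"] - 32) * 5 / 9

# Plain assignment copies the reference, not the data:
# a change made through `alias` is visible through `df`.
alias = df
alias.loc["a", "city"] = "Dallas"

# An explicit copy is independent of the original.
independent = df.copy()
independent.loc["a", "city"] = "Austin"
```

After running this, `df.loc["a", "city"]` is "Dallas" because `alias` and `df` refer to the same object, while the change made to `independent` leaves `df` untouched.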

Chapter 5. Data Visualization with seaborn in Python

Data Visualization
Data Visualization in Python
Matplotlib
Getting Started with matplotlib
Figures
Saving Figures to a File
Seaborn
Getting Started with seaborn
Histograms and KDE
Plotting Bivariate Distributions
Scatter plots in seaborn
Pair plots in seaborn
Heatmaps
Summary

Chapter 6. (Optional) Quick Introduction to Python for Data Engineers

What is Python?
Additional Documentation
Which version of Python am I running?
Python Dev Tools and REPLs
IPython
Jupyter
Jupyter Operation Modes
Jupyter Common Commands
Anaconda
Python Variables and Basic Syntax
Variable Scopes
PEP8
The Python Programs
Getting Help
Variable Types
Assigning Multiple Values to Multiple Variables
Null (None)
Strings
Finding Index of a Substring
String Splitting
Triple-Delimited String Literals
Raw String Literals
String Formatting and Interpolation
Boolean
Boolean Operators
Numbers
Looking Up the Runtime Type of a Variable
Divisions
Assignment-with-Operation
Comments
Relational Operators
The if-elif-else Triad
An if-elif-else Example
Conditional Expressions (a.k.a. Ternary Operator)
The While-Break-Continue Triad
The for Loop
try-except-finally
Lists
Main List Methods
Dictionaries
Working with Dictionaries
Sets
Common Set Operations
Set Operations Examples
Finding Unique Elements in a List
Enumerate
Tuples
Unpacking Tuples
Functions
Dealing with Arbitrary Number of Parameters
Keyword Function Parameters
The range Object
Random Numbers
Python Modules
Importing Modules
Installing Modules
Listing Methods in a Module
Creating Your Own Modules
Creating a Runnable Application
List Comprehension
Zipping Lists
Working with Files
Reading and Writing Files
Reading Command-Line Parameters
Accessing Environment Variables
What is Functional Programming (FP)?
Terminology: Higher-Order Functions
Lambda Functions in Python
Example: Lambdas in the Sorted Function
Other Examples of Using Lambdas
Regular Expressions
Using Regular Expressions Examples
Python Data Science-Centric Libraries
Summary
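
A few of the Chapter 6 topics (sets for finding unique elements, zipping lists, lambdas in the sorted function, and list comprehensions) can be sketched in a handful of lines of standard-library Python. The library names and scores below are made-up sample values, not course content.

```python
# Hypothetical sample data.
names = ["spark", "pandas", "seaborn", "spark"]
scores = [3, 9, 7, 3]

# Finding unique elements in a list via a set (sorted for a stable order).
unique_names = sorted(set(names))

# Zipping two lists into a list of (name, score) tuples.
pairs = list(zip(names, scores))

# A lambda as the sort key: order the pairs by score, highest first.
by_score_desc = sorted(pairs, key=lambda pair: pair[1], reverse=True)

# A list comprehension with a filter: upper-case names longer than five characters.
long_upper = [name.upper() for name in unique_names if len(name) > 5]
```

Because `sorted` is stable, the two identical `("spark", 3)` pairs keep their relative order at the end of `by_score_desc`.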

Lab Exercises

Lab 1. Learning the Databricks Community Cloud Lab Environment
Lab 2. Learning PySpark Shell Environment
Lab 3. Understanding Spark DataFrames
Lab 4. Learning the PySpark DataFrame API
Lab 5. Processing Data in PySpark using the DataFrame API (Project)
Lab 6. Working with Pivot Tables in PySpark (Project)
Lab 7. Data Visualization and EDA in PySpark
Lab 8. Data Visualization and EDA in PySpark (Project)

There are currently no scheduled dates for this course. If you are interested in this course, request a course date with the links above. We can also contact you when the course is scheduled in your area.

Request Other Date | Request On-site Course

When does class start/end?

Classes begin promptly at 9:00 am, and typically end at 5:00 pm.

Does the course schedule include a lunch break?

Lunch is normally an hour long and begins at noon. Coffee, tea, hot chocolate and juice are available all day in the kitchen. Fruit, muffins and bagels are served each morning. There are numerous restaurants near each of our centers, and some popular ones are indicated on the Area Map in the Student Welcome Handbooks, which can be picked up in the lobby or requested from one of our ExitCertified staff.

How can someone reach me during class?

If someone should need to contact you while you are in class, please have them call the center telephone number and leave a message with the receptionist.

What languages are used to deliver training?

Most courses are conducted in English, unless otherwise specified. Some courses will have the word "FRENCH" marked in red beside the scheduled date(s) indicating the language of instruction.

What does GTR stand for?

GTR stands for Guaranteed to Run; if you see a course with this status, it means this event is confirmed to run. View our GTR page to see our full list of Guaranteed to Run courses.

How do I find an ExitCertified training location?

We have training locations across the United States and Canada. View a full list of classroom training locations.

Which delivery formats are available?

At ExitCertified we offer training that is Instructor-Led, Online, Virtual and Self-Paced.

Does ExitCertified deliver group training?

Yes, we provide training for groups and individuals, as well as private on-site sessions. View our group training page for more information.

What does vendor-authorized training mean?

As a vendor-authorized training partner, we offer a curriculum that our partners have vetted. We use the same course materials and facilitate the same labs as our vendor-delivered training. These courses are considered the gold standard and, as such, are priced accordingly.

Is the training too basic, or will you go deep into technology?

It depends on your requirements, your role in your company, and your depth of knowledge. The good news is that many of our learning paths let you progress from the fundamentals to highly specialized training.

How up-to-date are your courses and support materials?

We continuously work with our vendors to evaluate and refresh course material to reflect the latest training courses and best practices.

Are your instructors seasoned trainers who have deep knowledge of the training topic?

ExitCertified instructors have an average of 27 years of practical IT experience. They have also served as consultants for an average of 15 years. To stay up to date, instructors spend at least 25 percent of their time learning emerging technologies and new courses.

Do you provide hands-on training and exercises in an actual lab environment?

Lab access is dependent on the vendor and the type of training you sign up for. However, many of our top vendors will provide lab access to students to test and practice. The course description will specify lab access.

Will you customize the training for our company’s specific needs and goals?

We will work with you to identify training needs and areas of growth. We offer a variety of training methods, such as private group training, on-site training at a location of your choice, and virtual training. We provide courses and certifications that are aligned with your business goals.

How do I get started with certification?

Getting started on a certification pathway depends on your goals and the vendor you choose to get certified with. Many vendors offer certifications ranging from entry-level to advanced IT certification that can boost your career. To get access to certification vouchers and discounts, please contact customerexp@exitcertified.com.

Will I get access to content after I complete a course?

You will get access to the PDF of course books and guides, but access to the recording and slides will depend on the vendor and type of training you receive.

How do I request a W9 for ExitCertified LLC?

View our filing status and how to request a W9.

The labs and study materials provided for the Architecting on AWS course are very easy to understand and explain all the topics required to pass the Associate certification.

Sai

ExitCertified

Fantastic and great training. Tons of hands-on labs to really make you understand the material being taught.

ExitCertified Student

ExitCertified

I registered a day before class and am happy that I received all the materials and links in time for the class. Thanks.

ExitCertified Student

ExitCertified

Thank you, Tech Data, for sponsoring this course. You really take care of your partners.

ExitCertified Student

ExitCertified

You get detailed labs to guide you through the technical material, giving you a hands-on method of learning otherwise difficult material.

ExitCertified Student

ExitCertified

Prerequisites

Introduction to Python 3 Programming (Practical Programming for Beginners)
