Episode

Data Analysis and Preparation for Logistic Regression (Part 15 of 17) | Machine Learning for Beginners

with Bea Stollnitz

Join Bea Stollnitz, a Principal Cloud Advocate at Microsoft, as she demonstrates how to analyze and prepare data for building a logistic regression model. In this video, we'll be working with the pumpkin dataset used in previous videos, with the goal of predicting if a pumpkin is orange or white based on its features 🎃.

What you will learn:

How to explore and clean the dataset
How to visualize the data with Seaborn
How to transform categorical features using ordinal and one-hot encoding
How to use label encoders
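
As a rough illustration of the explore-and-clean step, here is a minimal pandas sketch. The rows are made-up stand-in data, and the column names ("Variety", "Item Size", "Color", "Unused Note") are assumptions for illustration; the notebook used in the video may select different columns.

```python
import pandas as pd

# Hypothetical stand-in rows; the lesson itself loads the pumpkin data from a file.
pumpkins = pd.DataFrame({
    "Variety": ["PIE TYPE", "MINIATURE", "PIE TYPE", "HOWDEN TYPE"],
    "Item Size": ["med", "sml", None, "lge"],
    "Color": ["ORANGE", "WHITE", "ORANGE", None],
    "Unused Note": ["a", "b", "c", "d"],
})

# Explore: column types and the number of missing values per column.
pumpkins.info()
print(pumpkins.isnull().sum())

# Clean: keep only the columns of interest, then drop rows with missing values.
columns_to_keep = ["Variety", "Item Size", "Color"]
cleaned = pumpkins[columns_to_keep].dropna().reset_index(drop=True)
print(cleaned)
```

Dropping rows is only one possible way to handle missing values; it is used here because it keeps the sketch short.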

In this video, you'll learn how to analyze the data, perform the necessary cleanup, and transform categorical features into a format suitable for logistic regression. We'll use Seaborn for visualization and show how to create bar plots and swarm plots to understand the relationships between pumpkin features.
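
The three kinds of encoding mentioned above can be sketched with scikit-learn as follows. This is an illustrative example under stated assumptions, not the notebook's code: the sample rows, the column names, and the size ordering (`sml` < `med` < `lge`) are made up for the sketch.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, OrdinalEncoder

# Hypothetical sample rows standing in for the cleaned pumpkin data.
pumpkins = pd.DataFrame({
    "Item Size": ["sml", "med", "lge", "med"],
    "Variety": ["MINIATURE", "PIE TYPE", "HOWDEN TYPE", "PIE TYPE"],
    "Color": ["WHITE", "ORANGE", "ORANGE", "ORANGE"],
})

# Ordinal encoding: sizes have a natural order, so map them to 0, 1, 2.
# The ordering below is an assumption made for this example.
size_order = [["sml", "med", "lge"]]
ordinal_encoder = OrdinalEncoder(categories=size_order)
size_encoded = ordinal_encoder.fit_transform(pumpkins[["Item Size"]]).ravel()

# One-hot encoding: varieties have no order, so each gets its own 0/1 column.
onehot_encoder = OneHotEncoder()
variety_encoded = onehot_encoder.fit_transform(pumpkins[["Variety"]]).toarray()

# Label encoding: turn the target (color) into integers.
# LabelEncoder sorts the classes, so ORANGE becomes 0 and WHITE becomes 1.
label_encoder = LabelEncoder()
labels = label_encoder.fit_transform(pumpkins["Color"])

print(size_encoded)
print(onehot_encoder.categories_[0], variety_encoded, sep="\n")
print(label_encoder.classes_, labels)
```

The ordinal encoder preserves the ordering information in the size feature, while one-hot encoding avoids implying an order among varieties that does not exist.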

Stay tuned for the next video in this series, where we'll use this prepared data to build a predictive model. See you there!

Chapters

00:00 - Introduction
00:28 - The notebook we are using
00:57 - Investigate the pumpkin dataset
01:08 - Data cleanup on the pumpkin dataset using pandas
01:20 - Visualize data using Seaborn
02:23 - Data transformation for categorical features
03:05 - Transforming pumpkin size using an ordinal encoder
03:27 - Transforming categorical features using one-hot encoding
04:03 - Transforming labels using a label encoder
04:25 - Using a Seaborn catplot and swarm plot

Recommended resources

This course is based on the free, open-source, 26-lesson ML For Beginners curriculum from Microsoft.
The Jupyter Notebook to follow along with this lesson is available!

Connect

Bea Stollnitz | Blog
Bea Stollnitz | Twitter: @beastollnitz
Bea Stollnitz | LinkedIn: in/beatrizstollnitz/

Azure Machine Learning

Python