
Data Analysis and Preparation for Logistic Regression (Part 15 of 17) | Machine Learning for Beginners

with Bea Stollnitz

Join Bea Stollnitz, a Principal Cloud Advocate at Microsoft, as she demonstrates how to analyze and prepare data for building a logistic regression model. In this video, we'll work with the pumpkin dataset from previous videos, with the goal of predicting whether a pumpkin is orange or white based on its features 🎃.

What you will learn:

  • How to explore and clean the dataset
  • How to visualize the data with Seaborn
  • How to transform categorical features using ordinal and one-hot encoding
  • How to transform labels using a label encoder

In this video, you'll learn how to analyze the data, perform the necessary cleanup, and transform categorical features into a format suitable for logistic regression. We'll use Seaborn for visualization and show how to create bar plots and swarm plots to explore the relationships between pumpkin features.
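The two plot types mentioned here can be sketched with Seaborn's `catplot` (with `kind="count"`, which draws a bar plot of counts) and `swarmplot`. This is a hedged sketch, not the video's code: the rows are made up, the color palette is an arbitrary choice, and the hand-built `Item Size Code` column is a hypothetical stand-in for the ordinal-encoded size feature.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the script also runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up rows standing in for the cleaned pumpkin data (not real records).
pumpkins = pd.DataFrame({
    "Variety": ["HOWDEN TYPE", "HOWDEN TYPE", "PIE TYPE", "PIE TYPE", "MINIATURE", "MINIATURE"],
    "Item Size": ["lge", "xlge", "med", "sml", "sml", "med"],
    "Color": ["ORANGE", "ORANGE", "ORANGE", "WHITE", "WHITE", "WHITE"],
})

# Map each label value to a plot color (arbitrary choice for readability).
palette = {"ORANGE": "orange", "WHITE": "wheat"}

# Bar plot of counts: how many pumpkins of each variety are orange vs. white.
g = sns.catplot(data=pumpkins, y="Variety", hue="Color", kind="count", palette=palette)

# Hypothetical ordinal codes for Item Size (0 = smallest), built by hand here.
size_order = ["sml", "med", "med-lge", "lge", "xlge", "jbo", "exjbo"]
pumpkins["Item Size Code"] = pumpkins["Item Size"].map({s: i for i, s in enumerate(size_order)})

# Swarm plot on a fresh figure: one dot per pumpkin, showing where each
# color falls on the ordered size scale.
plt.figure()
ax = sns.swarmplot(data=pumpkins, x="Color", y="Item Size Code", hue="Color", palette=palette)
```

In a Jupyter notebook the figures render inline; in a plain script you could call `plt.savefig(...)` or switch to an interactive backend to view them.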

Stay tuned for the next video in this series, where we'll use this prepared data to build a predictive model. See you there!

Chapters

  • 00:00 - Introduction
  • 00:28 - The notebook we are using
  • 00:57 - Investigate the pumpkin dataset
  • 01:08 - Data cleanup on the pumpkin dataset using pandas
  • 01:20 - Visualize data using Seaborn
  • 02:23 - Data transformation for categorical features
  • 03:05 - Transforming pumpkin size using an ordinal encoder
  • 03:27 - Transforming categorical features using one-hot encoding
  • 04:03 - Transforming labels using a label encoder
  • 04:25 - Using a Seaborn cat plot and swarm plot

Resources

  • This course is based on the free, open-source, 26-lesson ML For Beginners curriculum from Microsoft.
  • The Jupyter Notebook to follow along with this lesson is available!