Classification is a form of machine learning where you train a model to predict which category an item belongs to. For example, a health clinic might use patient diagnostic data such as height, weight, blood pressure, and blood glucose level to predict whether the patient is diabetic.

Illustration of medical diagnostic features predicting diabetes.
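
As a rough sketch of the clinic example, the snippet below trains a binary classifier on a handful of rows and predicts a class for one new patient. Everything specific here is an assumption made for illustration: the feature values are invented (not real patient data), blood pressure is simplified to a single systolic reading, and logistic regression from scikit-learn stands in for whichever algorithm you might actually choose.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Invented illustrative values, NOT real patient data.
# Columns: height (cm), weight (kg), systolic blood pressure (mmHg), blood glucose (mg/dL)
X_train = np.array([
    [170, 65, 118, 85],
    [165, 90, 140, 160],
    [180, 72, 121, 90],
    [158, 85, 135, 150],
    [175, 68, 115, 88],
    [162, 95, 145, 170],
])
# Known labels for the training rows: 1 = diabetic, 0 = not diabetic
y_train = np.array([0, 1, 0, 1, 0, 1])

# Fit a classifier that learns to map features to a class label
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Predict the class, and the per-class probabilities, for a hypothetical new patient
new_patient = np.array([[168, 88, 138, 155]])
predicted_class = model.predict(new_patient)[0]
class_probabilities = model.predict_proba(new_patient)[0]
print(predicted_class, class_probabilities)
```

The model returns a category (0 or 1) rather than a numeric quantity, which is what distinguishes classification from regression. A dataset this small is only useful for showing the shape of the workflow.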

Categorical data has classes rather than numeric values. Some kinds of data can be either numeric or categorical. For example, the time to run a race could be a numeric time in seconds, or a categorical class of fast, medium, or slow. Other kinds of data can only be categorical, such as a type of shape: circle, triangle, or square.
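
To make the race-time example concrete, here is a minimal sketch that converts numeric times into categorical classes. The function name speed_class and the threshold values are hypothetical, chosen only for illustration.

```python
def speed_class(seconds, fast_below=600.0, slow_from=900.0):
    """Map a numeric race time in seconds to a categorical class.

    The thresholds are hypothetical cut-offs, not standard values.
    """
    if seconds < fast_below:
        return "fast"
    if seconds < slow_from:
        return "medium"
    return "slow"


# The same measurements, first as numeric values, then as categories
race_times = [540.0, 725.5, 1010.2]
race_classes = [speed_class(t) for t in race_times]
print(race_classes)  # ['fast', 'medium', 'slow']
```

A shape type such as circle or triangle has no underlying number to cut into ranges, which is why that kind of data can only be categorical.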


Prerequisites

  • Knowledge of basic mathematics
  • Some experience programming in Python

Learning objectives

In this module, you'll learn:

  • When to use classification
  • How to train and evaluate a classification model using the scikit-learn framework