Introduction

Clustering is the process of grouping similar objects together. For example, the following image shows a collection of 2D coordinates that have been clustered into three categories: top left (yellow), bottom (red), and top right (blue).

Illustration showing coordinates that have been clustered into three categories.

Unlike classification, clustering is an unsupervised method of grouping: the model is trained without labels. Instead, clustering models identify examples that have similar feature values. In the preceding image, examples that are in similar locations are grouped together.
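To make the idea of grouping without labels concrete, here is a minimal sketch that clusters nine 2D points into three groups. It assumes k-means as the algorithm, which this introduction does not name, and the `kmeans` helper, the sample coordinates, and the starting centroids are all hypothetical values chosen for illustration.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Tiny k-means sketch: alternate between assigning points and moving centroids."""
    centroids = list(centroids)
    assignments = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its current members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return assignments, centroids

# Hypothetical 2D coordinates; no labels are given to the algorithm.
points = [
    (1.0, 8.0), (1.5, 9.0), (2.0, 8.5),   # roughly top left
    (5.0, 1.0), (5.5, 2.0), (4.5, 1.5),   # roughly bottom
    (8.0, 8.0), (9.0, 9.0), (8.5, 8.5),   # roughly top right
]

# Starting guesses picked by hand so the run is reproducible (an assumption;
# real implementations usually choose starting centroids randomly).
initial = [points[0], points[3], points[6]]

labels, centers = kmeans(points, initial)
print(labels)
print(centers)
```

The only inputs are the coordinates and a requested number of clusters (one per starting centroid). The cluster numbers in `labels` are arbitrary identifiers found from location alone, not categories supplied in advance, which is the key difference from classification.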

Clustering is commonly used to explore new data, where patterns among data points, such as high-level categories, are not yet known. It's used in many fields that need to label complex data automatically, including social network analysis, brain connectivity research, and spam filtering.

Produced in partnership with Eric Wanjau - Microsoft Learn Student Ambassador and Researcher/Data Scientist: Leeds Institute for Data Analytics, University of Leeds