Clustering modules

Important

Support for Machine Learning Studio (classic) will end on 31 August 2024. We recommend you transition to Azure Machine Learning by that date.

Beginning 1 December 2021, you will not be able to create new Machine Learning Studio (classic) resources. Through 31 August 2024, you can continue to use the existing Machine Learning Studio (classic) resources.

See information on moving machine learning projects from ML Studio (classic) to Azure Machine Learning.
Learn more about Azure Machine Learning.

ML Studio (classic) documentation is being retired and may not be updated in the future.

This article describes the modules in Machine Learning Studio (classic) that support creation of clustering models.

Note

Applies to: Machine Learning Studio (classic) only

Similar drag-and-drop modules are available in Azure Machine Learning designer.

What is clustering?

In machine learning, clustering is a method of grouping similar data points into clusters. It is also called segmentation.

Over the years, many clustering algorithms have been developed. Almost all clustering algorithms use the features of individual items to find similar items. For example, you might apply clustering to find similar people by demographics. You might use clustering with text analysis to group sentences with similar topics or sentiment.
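Many clustering algorithms measure how similar two items are as a distance between their feature vectors. The following minimal sketch assumes Euclidean distance and two hypothetical numeric demographic features. In practice, features are usually scaled to comparable ranges first so that one large-valued feature does not dominate the distance.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def most_similar(query, items):
    """Return the name of the item whose features are closest to `query`."""
    return min(items, key=lambda name: euclidean(query, items[name]))

# Hypothetical demographic features: (age, annual income in thousands)
people = {
    "alice": (25, 40),
    "bob": (52, 95),
    "carol": (31, 48),
}

print(most_similar((29, 45), people))  # carol
```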

Clustering is called an unsupervised learning technique because it can be used on unlabeled data. Because it requires little prior knowledge about how the data might be structured or how items are related, clustering is a useful first step for discovering new patterns. It is often used to explore data before analysis with other, more predictive algorithms.

How to create a clustering model

In Machine Learning Studio (classic), you can use clustering with either labeled or unlabeled data.

With unlabeled data, the clustering algorithm determines which data points are closest together and creates clusters around a central point, or centroid. You can then use the cluster ID as a temporary label for that group of data.
If the data has labels, you can use the label to drive the number of clusters, or use the label as just another feature.
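To illustrate the centroid-based approach described above, here is a minimal sketch of the standard K-means loop (Lloyd's algorithm) in plain Python. It is not the Studio module's implementation. It assumes numeric feature tuples and random initialization from the data; the point values and the `known_labels` list are hypothetical. The returned cluster IDs act as temporary labels, and when labels are available, one option is to set the number of clusters to the number of distinct label values.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centroids):
    """Label each point with the ID (index) of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
            for p in points]

def kmeans(points, k, max_iters=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen data points
    for _ in range(max_iters):
        labels = assign(points, centroids)
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Move the centroid to the mean of its members; keep it if the cluster is empty.
            new_centroids.append(
                tuple(sum(v) / len(members) for v in zip(*members)) if members else centroids[c]
            )
        if new_centroids == centroids:  # centroids stopped moving, so assignments are stable
            break
        centroids = new_centroids
    return centroids, assign(points, centroids)

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]

# Unlabeled data: choose k yourself; the returned IDs act as temporary labels.
centroids, cluster_ids = kmeans(points, k=2)

# Labeled data: one option is to let the number of distinct labels set k.
known_labels = ["small", "small", "small", "large", "large", "large"]
centroids, cluster_ids = kmeans(points, k=len(set(known_labels)))
print(cluster_ids)
```

Using the label as just another feature instead would mean appending an encoded form of it (for example, one-hot columns) to each point before clustering.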

After you configure the clustering algorithm, train it on data by using either the Train Clustering Model or the Sweep Clustering module.
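A parameter sweep trains several candidate models and keeps the one that scores best on a quality metric. The sketch below assumes within-cluster sum of squares (inertia) as that metric, which may differ from the metrics the Sweep Clustering module offers, and uses hypothetical candidate centroid sets, for example from runs with different random seeds. Inertia is only a fair comparison between candidates with the same number of clusters, because it tends to shrink as k grows.

```python
def inertia(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(
        min(sum((x - y) ** 2 for x, y in zip(p, c)) for c in centroids)
        for p in points
    )

def pick_best(points, candidates):
    """Return the candidate centroid set with the lowest inertia."""
    return min(candidates, key=lambda cs: inertia(points, cs))

points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]

# Hypothetical results of two training runs with k = 2.
candidates = [
    [(0.0, 0.0), (5.0, 5.0)],
    [(0.0, 0.5), (10.0, 10.5)],
]

best = pick_best(points, candidates)
print(best, inertia(points, best))  # [(0.0, 0.5), (10.0, 10.5)] 1.0
```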

After the model is trained, use it to predict cluster membership for new data points. For example, if you have used clustering to group customers by purchasing behavior, you can use the model to assign new customers to the group whose purchasing behavior they most resemble.
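For a centroid-based model such as K-means, predicting cluster membership amounts to finding the nearest centroid. This minimal sketch assumes squared Euclidean distance; the centroids and the two customer features are hypothetical values chosen for illustration.

```python
def predict_cluster(centroids, point):
    """Return the ID of the centroid nearest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((x - y) ** 2 for x, y in zip(point, centroids[i])),
    )

# Hypothetical centroids for two customer groups:
# (store visits per month, average basket size in dollars)
centroids = [(2.0, 20.0), (12.0, 85.0)]

new_customers = [(3.0, 25.0), (10.0, 70.0)]
print([predict_cluster(centroids, c) for c in new_customers])  # [0, 1]
```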

List of modules

The clustering category includes this module:

K-Means Clustering: Configures and initializes a K-means clustering model.
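How the initial centroids are chosen affects which clusters K-means converges to. As an illustration, the following sketch shows k-means++, a widely used initialization method: the first centroid is a random data point, and each later centroid is drawn with probability proportional to its squared distance from the nearest centroid already chosen. The function name is hypothetical, the module's own initialization options may differ, and the sketch assumes the data contains at least k distinct points.

```python
import random

def kmeans_pp_init(points, k, seed=0):
    """Pick k initial centroids from `points` using the k-means++ rule."""
    rng = random.Random(seed)
    centroids = [rng.choice(points)]  # first centroid: uniform at random
    while len(centroids) < k:
        # Squared distance from each point to its nearest already-chosen centroid.
        d2 = [min(sum((x - y) ** 2 for x, y in zip(p, c)) for c in centroids)
              for p in points]
        # Far-away points are more likely to become the next centroid;
        # points that coincide with a chosen centroid have weight 0.
        centroids.append(rng.choices(points, weights=d2, k=1)[0])
    return centroids

pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
print(kmeans_pp_init(pts, 2, seed=1))
```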

To use a different clustering algorithm, or create a custom clustering model by using R, see these topics:

Examples

For examples of clustering in action, see the Azure AI Gallery.

See these articles for help choosing an algorithm:

Machine learning algorithm cheat sheet for Machine Learning Studio (classic)

Provides a graphical decision chart to guide you through the selection process.
How to choose Machine Learning algorithms for clustering, classification, or regression

Explains in greater detail the different types of machine learning algorithms, and how they're used.
