Product recommendations for retail using Azure

Blob Storage
Event Hubs
HDInsight
Stream Analytics
Power BI

Solution ideas

This article is a solution idea. If you'd like us to expand the content with more information, such as potential use cases, alternative services, implementation considerations, or pricing guidance, let us know by providing GitHub feedback.

A deep understanding of the relationship between customer interests and purchasing patterns is a critical component of any retail business intelligence operation. This article presents a solution for aggregating customer data into a complete profile. Advanced machine learning models, backed by the reliability and processing power of Azure, provide predictive insights on simulated customers.

Potential use cases

This solution is typically employed by retailers.

Architecture

Architecture diagram that shows the flow of data between an event generator and a dashboard. Other stages include analytics and machine learning.

Dataflow

  1. A data generator pipes simulated customer events to Azure Event Hubs.
  2. An Azure Stream Analytics job reads from Event Hubs and performs aggregations.
  3. Stream Analytics persists time-grouped data to Azure Blob Storage.
  4. A Spark job that runs in Azure HDInsight merges the latest customer browsing data with historical purchase and demographic data, to build a combined user profile.
  5. A second Spark job scores each customer profile against a machine learning model. This process predicts future purchasing patterns. These predictions suggest whether a given customer is likely to make a purchase in the next 30 days. If so, the system determines the likely product category of the purchase.
  6. Predictions and other profile data are visualized and shared as charts and tables in the Power BI service.
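
Steps 1 through 4 of the dataflow can be sketched in plain Python to make the data shapes concrete. This is a minimal local stand-in, not the Event Hubs, Stream Analytics, or Spark APIs. The event fields (customer_id, category, timestamp), the 60-second window, and the function names are assumptions for illustration; the source does not specify them.

```python
import random
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Hypothetical product categories; the source does not name any.
CATEGORIES = ["electronics", "apparel", "grocery"]


def generate_events(n, start, seed=0):
    """Step 1 stand-in: simulated browsing events, one every 10 seconds."""
    rng = random.Random(seed)
    return [
        {
            "customer_id": rng.randint(1, 3),
            "category": rng.choice(CATEGORIES),
            "timestamp": start + timedelta(seconds=10 * i),
        }
        for i in range(n)
    ]


def tumbling_window_counts(events, window_seconds=60):
    """Steps 2-3 stand-in: count views per (window start, customer, category)."""
    counts = defaultdict(int)
    for event in events:
        epoch = int(event["timestamp"].timestamp())
        window_start = epoch - epoch % window_seconds  # align to window boundary
        counts[(window_start, event["customer_id"], event["category"])] += 1
    return dict(counts)


def build_profiles(window_counts, history):
    """Step 4 stand-in: merge recent browsing with historical data per customer."""
    profiles = {}
    for (_, customer_id, category), views in window_counts.items():
        profile = profiles.setdefault(
            customer_id,
            {**history.get(customer_id, {}), "recent_views": defaultdict(int)},
        )
        profile["recent_views"][category] += views
    return profiles


if __name__ == "__main__":
    start = datetime(2024, 1, 1, tzinfo=timezone.utc)
    counts = tumbling_window_counts(generate_events(12, start))
    # Hypothetical historical data keyed by customer ID.
    history = {cid: {"past_purchases": cid * 2} for cid in (1, 2, 3)}
    for cid, profile in sorted(build_profiles(counts, history).items()):
        print(cid, dict(profile["recent_views"]), profile["past_purchases"])
```

In the solution itself, the aggregation runs as a Stream Analytics job and the merge runs as a Spark job in HDInsight; the sketch only mirrors the grouping and joining logic.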

Components

  • Blob Storage is a service that's part of Azure Storage. Blob Storage offers optimized cloud object storage for large amounts of unstructured data.
  • Event Hubs is a fully managed streaming platform.
  • Azure Machine Learning is a cloud-based environment that you can use to train, deploy, automate, manage, and track machine learning models.
  • Azure SQL Database is a fully managed platform as a service (PaaS) database engine. SQL Database runs on the latest stable version of SQL Server and a patched operating system.
  • Stream Analytics offers real-time serverless stream processing. This service provides a way to run queries in the cloud and on edge devices.
  • Power BI is a business analytics service that provides interactive visualizations and business intelligence capabilities. Its easy-to-use interface makes it possible for you to create your own reports and dashboards.
  • HDInsight is a managed, full-spectrum, open-source, cloud-based analytics service for enterprises.

Deploy this scenario

For more details on how this solution is built, see the solution guide in GitHub.

A typical retail business collects customer data through various channels. These channels include web-browsing patterns, purchase behaviors, demographics, and other session-based web data. Some of the data originates from core business operations. However, other data must be pulled and joined from external sources, such as partners, manufacturers, the public domain, and so on.

Many businesses use only a small portion of the available data, but to maximize ROI, a business must integrate relevant data from all sources. Traditionally, integrating external, heterogeneous data sources into a shared data processing engine requires significant effort and resources. This solution describes a simple, scalable approach to integrating analytics and machine learning to predict customer purchasing activity.

Solution features

This solution addresses the problems that the previous section pointed out:

  • By uniformly accessing data from multiple data sources, while minimizing data movement and system complexity, which boosts performance.
  • By performing extract-transform-load (ETL) operations and the feature engineering that's needed to use a predictive machine learning model.
  • By creating a comprehensive customer 360 profile, which is enriched by predictive analytics that run across a distributed system. This analysis is backed by Microsoft R Server and HDInsight.
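
The feature engineering and scoring described in the list above could look roughly like the following. This is a minimal sketch in plain Python, not the solution's Spark or R Server code. The profile fields, the weights, the bias, and the use of the most-viewed category as the predicted category are all hypothetical; a real deployment would load a trained model instead.

```python
import math

# Hypothetical, untrained weights for illustration only.
WEIGHTS = {"recent_views": 0.3, "past_purchases": 0.5}
BIAS = -2.0


def engineer_features(profile):
    """Flatten a customer profile into the numeric features the model expects."""
    views = profile.get("recent_views", {})
    return {
        "recent_views": float(sum(views.values())),
        "past_purchases": float(profile.get("past_purchases", 0)),
    }


def score(profile, threshold=0.5):
    """Estimate whether a customer is likely to purchase, and in which category."""
    features = engineer_features(profile)
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    probability = 1.0 / (1.0 + math.exp(-z))  # logistic function
    likely = probability >= threshold
    views = profile.get("recent_views", {})
    # Assumption: the most-viewed category approximates the likely purchase category.
    category = max(views, key=views.get) if likely and views else None
    return {
        "purchase_probability": probability,
        "likely_to_purchase": likely,
        "likely_category": category,
    }


if __name__ == "__main__":
    active = {"recent_views": {"electronics": 5, "apparel": 1}, "past_purchases": 4}
    idle = {"recent_views": {}, "past_purchases": 0}
    print(score(active))
    print(score(idle))
```

Keeping feature engineering separate from scoring means that the same feature function can be reused when a model is trained and when it is applied, which helps avoid a mismatch between the two.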

Next steps