This article presents a solution for using Azure Kubernetes Service (AKS) to quickly process and analyze a large volume of streaming data from devices.
Apache®, Apache Kafka, and Apache Spark are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks.
Download a Visio file of this architecture.
- Sensors generate data and stream it to Azure API Management.
- An AKS cluster runs microservices that are deployed as containers behind a service mesh. The containers are built by using a DevOps process and are stored in Azure Container Registry.
- An ingest service stores data in Azure Cosmos DB.
- Asynchronously, an analysis service receives the data and streams it to Apache Kafka and Azure HDInsight.
- Data scientists use machine learning models and the Splunk platform to analyze the data.
- A processing service processes the data and stores the result in Azure Database for PostgreSQL. The service also caches the data in Azure Cache for Redis.
- A web app that runs in Azure App Service creates visualizations of the results.
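The dataflow above can be sketched as a minimal, in-memory pipeline. The dictionaries and function names here are hypothetical stand-ins for the managed services, not SDK calls: `ingest_store` plays the role of Azure Cosmos DB, `results_db` stands in for Azure Database for PostgreSQL, and `cache` for Azure Cache for Redis.

```python
# Minimal, in-memory sketch of the dataflow. Each dict stands in for a
# managed service in the architecture.
ingest_store = {}   # raw sensor readings, keyed by reading id (~ Azure Cosmos DB)
results_db = {}     # processed results, keyed by sensor id (~ PostgreSQL)
cache = {}          # hot results for the web app (~ Azure Cache for Redis)

def ingest_service(reading: dict) -> None:
    """Store a raw reading, as the ingest service does in Azure Cosmos DB."""
    ingest_store[reading["id"]] = reading

def processing_service(sensor_id: str) -> dict:
    """Aggregate a sensor's readings, persist the result, and cache it."""
    values = [r["value"] for r in ingest_store.values()
              if r["sensor_id"] == sensor_id]
    result = {"sensor_id": sensor_id, "avg": sum(values) / len(values)}
    results_db[sensor_id] = result   # durable store
    cache[sensor_id] = result        # fast reads for visualization
    return result

def web_app(sensor_id: str) -> dict:
    """Serve visualization data, preferring the cache over the database."""
    return cache.get(sensor_id) or results_db[sensor_id]

# Example: two readings from one sensor flow through the pipeline.
ingest_service({"id": "r1", "sensor_id": "s1", "value": 10.0})
ingest_service({"id": "r2", "sensor_id": "s1", "value": 20.0})
processing_service("s1")
print(web_app("s1")["avg"])  # → 15.0
```

In the real architecture, each function would be a separate containerized microservice in AKS, and the handoff between ingestion and processing would happen asynchronously through Kafka.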
The solution uses the following key technologies:
- API Management
- App Service
- Azure Cache for Redis
- Container Registry
- Azure Cosmos DB
- Azure Database for PostgreSQL
- Azure HDInsight
- Azure Kubernetes Service (AKS)
- Azure Pipelines
This solution is a good fit for a scenario that involves millions of data points, where data sources include Internet of Things (IoT) devices, sensors, and vehicles. In such a situation, processing the large volume of data is one challenge. Quickly analyzing the data is another demanding task, as organizations seek to gain insight into complex scenarios.
Containerized microservices in AKS form a key part of the solution. These self-contained services ingest and process the real-time data stream, and they scale as needed. The containers' portability makes it possible for the services to run in different environments and to process data from multiple sources. The microservices are developed and deployed by using DevOps practices and continuous integration/continuous delivery (CI/CD), which shorten the development cycle.
To store the ingested data, the solution uses Azure Cosmos DB. This database elastically scales throughput and storage, which makes it a good choice for large volumes of data.
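Azure Cosmos DB achieves that elastic scaling by distributing data and throughput across partitions selected by a partition key. The following pure-Python sketch illustrates the idea only; it uses no SDK, and the choice of `sensor_id` as the partition key and a fixed partition count are illustrative assumptions (Cosmos DB manages physical partitions itself).

```python
from hashlib import md5

NUM_PARTITIONS = 4  # illustrative; Cosmos DB manages physical partitions itself

def partition_for(partition_key: str) -> int:
    """Map a partition key to a partition by hashing, which is how a
    document store spreads storage and load across partitions."""
    digest = md5(partition_key.encode()).hexdigest()
    return int(digest, 16) % NUM_PARTITIONS

partitions = [dict() for _ in range(NUM_PARTITIONS)]

def upsert(item: dict) -> None:
    """Route an item to its partition, using sensor_id as the partition key."""
    p = partition_for(item["sensor_id"])
    partitions[p][item["id"]] = item

upsert({"id": "r1", "sensor_id": "s1", "value": 10.0})
upsert({"id": "r2", "sensor_id": "s1", "value": 20.0})

# Items that share a partition key always land in the same partition,
# so queries scoped to one sensor never fan out across partitions.
assert partition_for("s1") == partition_for("s1")
```

Choosing a high-cardinality partition key, such as a device or sensor ID, keeps the write load evenly spread while keeping each device's data co-located for queries.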
The solution also uses Kafka. This low-latency streaming platform handles real-time data feeds at extremely high speeds.
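Kafka achieves that decoupling by having producers append records to a log that each consumer reads at its own offset. The `MiniLog` class below is a hypothetical stand-in for a single topic partition, written in plain Python with no Kafka client, to illustrate the append-and-offset model.

```python
class MiniLog:
    """A tiny stand-in for one Kafka topic partition: producers append
    records, and each consumer tracks its own read offset."""

    def __init__(self):
        self.records = []

    def produce(self, record) -> None:
        """Append a record to the end of the log."""
        self.records.append(record)

    def consume(self, offset: int, max_records: int = 10):
        """Return (records, next_offset) starting at the given offset."""
        batch = self.records[offset:offset + max_records]
        return batch, offset + len(batch)

topic = MiniLog()
topic.produce({"sensor_id": "s1", "value": 10.0})
topic.produce({"sensor_id": "s1", "value": 20.0})

# Two independent consumers read the same records at their own pace.
batch_a, next_a = topic.consume(0)
batch_b, next_b = topic.consume(0, max_records=1)
print(len(batch_a), len(batch_b))  # → 2 1
```

Because the log retains records rather than deleting them on delivery, the analysis service and other consumers can each process the full stream independently, which is what lets ingestion and analysis run asynchronously.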
Another key solution component is HDInsight, which is a managed, open-source cloud analytics service. HDInsight makes it easier to run big data frameworks, such as Apache Spark, at high volume and velocity in Azure. Splunk helps with data analysis by creating visualizations from real-time data and providing business intelligence.
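The kind of aggregation that Spark on HDInsight performs over the stream can be illustrated with a map/reduce-style computation. This is a pure-Python sketch of the pattern, not Spark code; a real job would use Spark's DataFrame or RDD APIs, and the per-sensor average shown here is just an example metric.

```python
from collections import defaultdict

readings = [
    {"sensor_id": "s1", "value": 10.0},
    {"sensor_id": "s2", "value": 5.0},
    {"sensor_id": "s1", "value": 20.0},
]

# "Map" step: key each reading by sensor, carrying (value, count) pairs,
# analogous to rdd.map in Spark.
pairs = [(r["sensor_id"], (r["value"], 1)) for r in readings]

# "Reduce" step: combine pairs per key, analogous to reduceByKey.
totals = defaultdict(lambda: (0.0, 0))
for key, (value, count) in pairs:
    running_sum, running_count = totals[key]
    totals[key] = (running_sum + value, running_count + count)

# Final per-sensor averages, ready to persist or visualize.
averages = {key: s / c for key, (s, c) in totals.items()}
print(averages)  # → {'s1': 15.0, 's2': 5.0}
```

Carrying `(sum, count)` pairs through the reduce step, rather than averaging early, is the standard way to keep the aggregation associative so it can be distributed across workers.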
Potential use cases
This solution benefits the following areas:
- Vehicle safety, especially in the automotive industry
- Customer service in retail and other industries
- Healthcare cloud solutions
- Financial technology solutions in the finance industry
- About Azure Cache for Redis
- What is Azure API Management?
- App Service overview
- Azure Kubernetes Service
- Introduction to private Docker container registries in Azure
- Welcome to Azure Cosmos DB
- What is Azure Database for PostgreSQL?
- What is Azure HDInsight?
- What is Azure Pipelines?
Microsoft training modules:
- Build and store container images with Azure Container Registry
- Configure Azure App Service plans
- Work with Azure Cosmos DB
- Create and connect to an Azure Database for PostgreSQL
- Develop for Azure Cache for Redis
- Explore API Management
- Manage infrastructure as code using Azure and DSC
- Introduction to Azure HDInsight