Streaming using HDInsight

Solution ideas

This article is a solution idea. If you'd like us to expand the content with more information, such as potential use cases, alternative services, implementation considerations, or pricing guidance, let us know by providing GitHub feedback.

Ingest and process millions of streaming events per second with Apache Kafka, Apache Storm, and Apache Spark Streaming.

Potential use cases

Companies can use this solution to ingest data from multiple sources and make real-time business decisions. Scenarios include:

  • Analysis of IoT sensor data for quality detection, fault analysis, and maintenance event prediction
  • Integration of weather feeds or sensor data into business processes (agriculture, retail)
  • Analysis of real-time stock market data (financial)
  • Analysis of current market conditions (insurance and finance)
  • Trend analysis over real-time sales (retail)

Architecture

Figure: Architecture diagram showing the flow of data through the different processes.

Dataflow

  • Streaming data is ingested, processed, and stored by the following components:
    • Apache Kafka for data ingestion
    • Apache Spark Streaming or Apache Storm for processing
    • Apache HBase, a NoSQL database, for storing the analyzed results
  • Users consume the data in related apps.
  • The data is visualized in Power BI.
  • The data used by Azure HDInsight is stored in Azure Data Lake Storage for secure and scalable processing in the cloud.

Components

Key technologies used to implement this architecture:

Contributors

This article is maintained by Microsoft. It was originally written by the following contributors.

Principal authors:

Next steps

To learn more about these services, see the following articles: