Choose a stream processing technology in Azure
This article compares technology choices for real-time stream processing in Azure.
Real-time stream processing consumes messages from either a queue or file-based storage, processes the messages, and forwards the result to another message queue, file store, or database. Processing may include querying, filtering, and aggregating messages. Stream processing engines must be able to consume endless streams of data and produce results with minimal latency. For more information, see Real-time processing.
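As a rough illustration of the consume, filter, aggregate, and forward flow described above, the following sketch chains Python generators over an in-memory list. The `temp` field, the threshold, and the function names are hypothetical stand-ins; a real pipeline would read from a queue such as Event Hubs and write to a sink rather than a list.

```python
from typing import Dict, Iterable, Iterator


def filter_hot(readings: Iterable[Dict[str, float]], threshold: float) -> Iterator[Dict[str, float]]:
    """Filtering step: pass through only readings above the threshold."""
    for reading in readings:
        if reading["temp"] > threshold:
            yield reading


def running_average(readings: Iterable[Dict[str, float]]) -> Iterator[float]:
    """Aggregating step: emit an updated average after every message."""
    total, count = 0.0, 0
    for reading in readings:
        total += reading["temp"]
        count += 1
        yield total / count


# Stand-in for an unbounded source; each stage pulls one message at a time.
source = iter([{"temp": 20.0}, {"temp": 31.0}, {"temp": 35.0}, {"temp": 18.0}])

# Stand-in for forwarding results to a sink.
results = list(running_average(filter_hot(source, threshold=30.0)))
print(results)
```

Because each stage is a generator, a result is produced as soon as a message arrives, which is the property that lets a stream processor work on input that never ends.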
What are your options when choosing a technology for real-time processing?
In Azure, all of the following technologies meet the core requirements for real-time processing:
- Azure Stream Analytics
- HDInsight with Spark Streaming
- Apache Spark in Azure Databricks
- Azure Functions
- Azure App Service WebJobs
- Apache Kafka Streams API
Key selection criteria
For real-time processing scenarios, begin choosing the appropriate service for your needs by answering these questions:
- Do you prefer a declarative or imperative approach to authoring stream processing logic?
- Do you need built-in support for temporal processing or windowing?
- Does your data arrive in formats besides Avro, JSON, or CSV? If yes, consider options that support any format using custom code.
- Do you need to scale your processing beyond 1 GB/s? If yes, consider the options that scale with the cluster size.
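To make the "any format using custom code" question more concrete, here is a minimal sketch of decoding a binary payload that is not Avro, JSON, or CSV. The record layout (a 4-byte device ID, an 8-byte epoch timestamp in milliseconds, and a 4-byte float temperature) and the `Reading` and `decode` names are assumptions invented for this example, not a format used by any Azure service.

```python
import struct
from typing import List, NamedTuple


class Reading(NamedTuple):
    device_id: int
    timestamp_ms: int
    temperature: float


# Hypothetical wire layout: little-endian uint32, uint64, float32 (16 bytes per record).
RECORD = struct.Struct("<IQf")


def decode(payload: bytes) -> List[Reading]:
    """Split a message body into fixed-size records and unpack each one."""
    if len(payload) % RECORD.size != 0:
        raise ValueError("payload length is not a multiple of the record size")
    return [Reading(*fields) for fields in RECORD.iter_unpack(payload)]


# Build a two-record payload and decode it again.
payload = RECORD.pack(7, 1700000000000, 21.5) + RECORD.pack(8, 1700000000500, 22.25)
for reading in decode(payload):
    print(reading)
```

A decoder like this could run inside any of the code-first options (Spark, Azure Functions, or WebJobs), whereas a service limited to built-in formats would need the data converted upstream.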
Capability matrix
The following tables summarize the key differences in capabilities.
General capabilities
Capability | Azure Stream Analytics | HDInsight with Spark Streaming | Apache Spark in Azure Databricks | Azure Functions | Azure App Service WebJobs |
---|---|---|---|---|---|
Programmability | SQL, JavaScript | C#/F#, Java, Python, Scala | C#/F#, Java, Python, R, Scala | C#, F#, Java, Node.js, Python | C#, Java, Node.js, PHP, Python |
Programming paradigm | Declarative | Mixture of declarative and imperative | Mixture of declarative and imperative | Imperative | Imperative |
Pricing model | Streaming units | Node cost per minute | Databricks units | Per function execution and resource consumption | Per App Service plan hour |
Integration capabilities
Capability | Azure Stream Analytics | HDInsight with Spark Streaming | Apache Spark in Azure Databricks | Azure Functions | Azure App Service WebJobs |
---|---|---|---|---|---|
Inputs | Azure Event Hubs, Azure IoT Hub, Azure Blob storage/Data Lake Storage Gen2 | Event Hubs, IoT Hub, Kafka, HDFS, Storage Blobs, Azure Data Lake Store | Event Hubs, IoT Hub, Kafka, HDFS, Storage Blobs, Azure Data Lake Store | Supported bindings | Service Bus, Storage Queues, Storage Blobs, Event Hubs, WebHooks, Azure Cosmos DB, Files |
Sinks | Azure Data Lake Storage Gen1, Azure Data Explorer, Azure Database for PostgreSQL, Azure SQL Database, Azure Synapse Analytics, Blob storage and Azure Data Lake Storage Gen2, Azure Event Hubs, Power BI, Azure Table storage, Azure Service Bus queues, Azure Service Bus topics, Azure Cosmos DB, Azure Functions | HDFS, Kafka, Storage Blobs, Azure Data Lake Store, Azure Cosmos DB | HDFS, Kafka, Storage Blobs, Azure Data Lake Store, Azure Cosmos DB | Supported bindings | Service Bus, Storage Queues, Storage Blobs, Event Hubs, WebHooks, Azure Cosmos DB, Files |
Processing capabilities
Capability | Azure Stream Analytics | HDInsight with Spark Streaming | Apache Spark in Azure Databricks | Azure Functions | Azure App Service WebJobs |
---|---|---|---|---|---|
Built-in temporal/windowing support | Yes | Yes | Yes | No | No |
Input data formats | Avro, JSON or CSV, UTF-8 encoded | Any format using custom code | Any format using custom code | Any format using custom code | Any format using custom code |
Scalability | Query partitions | Bounded by cluster size | Bounded by Databricks cluster scale configuration | Up to 200 function app instances processing in parallel | Bounded by App Service plan capacity |
Late arrival and out of order event handling support | Yes | Yes | Yes | No | No |
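For the options marked "No" for windowing and late arrival handling, that logic has to be written by hand. The sketch below shows one simplified way to count events in tumbling windows while tolerating a bounded amount of out-of-order arrival. It is an illustration only, not how Stream Analytics or Spark implements these features; the `tumbling_counts` name, the watermark rule (highest event time seen minus an allowed lateness), and the sample events are all assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Optional, Tuple


def tumbling_counts(
    events: Iterable[Tuple[int, str]],
    window_s: int,
    allowed_lateness_s: int,
) -> Iterator[Tuple[int, int]]:
    """Yield (window_start, count) once a window can no longer receive events.

    Each event is (event_time_in_seconds, payload). A window closes when the
    watermark (highest event time seen minus allowed lateness) reaches its end.
    Events that arrive for an already closed window are dropped.
    """
    counts: Dict[int, int] = defaultdict(int)
    max_seen: Optional[int] = None
    for event_time, _payload in events:
        max_seen = event_time if max_seen is None else max(max_seen, event_time)
        watermark = max_seen - allowed_lateness_s
        start = event_time - event_time % window_s
        if start + window_s <= watermark:
            continue  # too late: the window for this event was already emitted
        counts[start] += 1
        # Emit every open window whose end is at or before the watermark.
        for closed in sorted(s for s in counts if s + window_s <= watermark):
            yield closed, counts.pop(closed)
    # End of input: flush whatever is still open.
    for remaining in sorted(counts):
        yield remaining, counts[remaining]


events = [(1, "a"), (4, "b"), (12, "c"), (8, "d"), (16, "e"), (3, "f"), (25, "g")]
window_counts = list(tumbling_counts(events, window_s=10, allowed_lateness_s=5))
print(window_counts)
```

In this run, the event at time 8 arrives after the event at time 12 but is still counted, because its window has not closed yet. The event at time 3 arrives after the watermark has passed the end of its window and is dropped. Engines with built-in support handle this bookkeeping, plus state recovery and scaling, without custom code.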
Contributors
This article is maintained by Microsoft. It was originally written by the following contributors.
Principal author:
- Zoiner Tejada | CEO and Architect
Next steps
- App Service overview
- Explore Azure Functions
- Get started with Azure Stream Analytics
- Perform advanced streaming data transformations
- Set up clusters in HDInsight
- Use Apache Spark in Azure Databricks