Choose a batch processing technology in Azure

Big data solutions often consist of discrete batch processing tasks that contribute to the overall data processing solution. You can use batch processing for workloads that don't require immediate access to insights. Batch processing can complement real-time processing requirements. You can also use batch processing to balance complexity and reduce cost for your overall implementation.

The fundamental requirement of batch processing engines is to scale out computations to handle a large volume of data. Unlike real-time processing, batch processing has latencies of minutes or hours, where latency is the time between data ingestion and the computation of a result.
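
To make the scale-out idea concrete, the following minimal sketch splits a batch into partitions, computes partial totals for each partition in parallel, and then merges the partial results. It's an illustration only: the function names and the sample records are hypothetical, and a thread pool stands in for the worker nodes that an engine such as Spark would use.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(records, num_partitions):
    """Split the batch into roughly equal partitions (round-robin)."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def partial_totals(chunk):
    """Map step: total the amount per key within one partition."""
    totals = Counter()
    for key, amount in chunk:
        totals[key] += amount
    return totals


def run_batch(records, num_partitions=4):
    """Process the partitions in parallel, then merge the partial results."""
    chunks = partition(records, num_partitions)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(partial_totals, chunks))
    merged = Counter()
    for part in partials:
        merged.update(part)  # Reduce step: add the per-key totals together.
    return dict(merged)


# Hypothetical sample batch of (region, sales_amount) records.
sample = [("east", 10), ("west", 5), ("east", 7), ("north", 3), ("west", 1)]
totals = run_batch(sample, num_partitions=2)
```

Because each partition is processed independently, adding partitions and workers is what lets this pattern handle a larger volume of data. The merge step is the only point where results are combined.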

Choose a technology for batch processing

Microsoft offers several services that you can use for batch processing.

Microsoft Fabric

Microsoft Fabric is an all-in-one analytics and data platform for organizations. It's a software as a service offering that simplifies how you provision, manage, and govern an end-to-end analytics solution. Fabric handles data movement, processing, ingestion, transformation, and reporting. Fabric features that you use for batch processing include data engineering, data warehouses, lakehouses, and Apache Spark processing. Azure Data Factory in Fabric also supports lakehouses. To simplify and accelerate development, you can enable AI-driven Copilot.

Languages: R, Python, Java, Scala, and SQL
Security: Managed virtual network and OneLake role-based access control (RBAC)
Primary storage: OneLake, which has shortcuts and mirroring options
Spark: A prehydrated starter pool and a custom Spark pool with predefined node sizes

Azure Databricks

Azure Databricks is a Spark-based analytics platform. It provides premium Spark capabilities that are built on top of open-source Spark. Azure Databricks is a Microsoft service that integrates with other Azure services, and it offers extra configuration options for Spark cluster deployments. Unity Catalog helps simplify the governance of Azure Databricks Spark objects.

Languages: R, Python, Java, Scala, and Spark SQL.
Security: User authentication with Microsoft Entra ID.
Primary storage: Built-in integration with Azure Blob Storage, Data Lake Storage, Fabric OneLake, and other services. For more information, see Data sources.

Other benefits include:

Web-based notebooks for collaboration and data exploration.
Fast cluster start times, automatic termination, and autoscaling.
Support for GPU-enabled clusters.

Key selection criteria

To choose your technology for batch processing, consider the following questions:

Do you want a managed service, or do you want to manage your own servers?
Do you want to author batch processing logic declaratively or imperatively?
Do you perform batch processing in bursts? If yes, consider options that can automatically terminate a cluster or that have pricing models for each batch job.
Do you need to query relational data stores along with your batch processing, for example, to look up reference data? If yes, consider options that can query external relational stores.
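
As a rough illustration of two of these questions, the following sketch computes the same result declaratively and imperatively, and both versions look up reference data in a relational store. It's a minimal example under stated assumptions: an in-memory SQLite database stands in for an external relational store, and the table names, column names, and sample rows are all hypothetical.

```python
import sqlite3

# Hypothetical in-memory relational store that holds reference data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE regions (code TEXT PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO regions VALUES (?, ?)", [("E", "East"), ("W", "West")]
)
conn.execute("CREATE TABLE sales (region_code TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)", [("E", 10), ("W", 5), ("E", 7)]
)


def totals_declarative(connection):
    """Declarative: describe the result; the engine plans the join and grouping."""
    rows = connection.execute(
        """
        SELECT r.name, SUM(s.amount)
        FROM sales AS s
        JOIN regions AS r ON r.code = s.region_code
        GROUP BY r.name
        """
    ).fetchall()
    return dict(rows)


def totals_imperative(connection):
    """Imperative: spell out the lookup and the accumulation step by step."""
    names = dict(connection.execute("SELECT code, name FROM regions"))
    totals = {}
    for code, amount in connection.execute(
        "SELECT region_code, amount FROM sales"
    ):
        name = names[code]  # Reference-data lookup.
        totals[name] = totals.get(name, 0) + amount
    return totals


declarative = totals_declarative(conn)
imperative = totals_imperative(conn)
```

Both functions return the same totals per region. The declarative version leaves the execution strategy to the engine, while the imperative version gives you explicit control over each step.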

Capability matrix

The following tables summarize key differences in capabilities between services.

General capabilities

Capability	Fabric	Azure Databricks
Software as a service	Yes¹	No
Managed service	No	Yes
Relational data store	Yes	Yes
Pricing model	Capacity units	Azure Databricks unit² and cluster hour

¹ Assigned Fabric capacity.

² An Azure Databricks unit (DBU) is a unit of processing capability per hour.
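
The following sketch shows how a pricing model that's based on Azure Databricks units and cluster hours might translate into an estimate for a bursty workload. It assumes that the hourly cost is the sum of a DBU charge and a per-hour cluster (virtual machine) charge. Every rate in the example is a hypothetical placeholder, not a published price, and `cluster_cost` is an illustrative helper rather than part of any Azure API.

```python
def cluster_cost(hours, dbu_per_hour, price_per_dbu, vm_price_per_hour):
    """Estimate cost as hours * (DBU charge per hour + VM charge per hour)."""
    return hours * (dbu_per_hour * price_per_dbu + vm_price_per_hour)


# All rates below are hypothetical placeholders, not published prices.
DBU_PER_HOUR = 4.0        # DBUs that the cluster consumes each hour.
PRICE_PER_DBU = 0.25      # Price for each DBU.
VM_PRICE_PER_HOUR = 1.0   # Price for the cluster's virtual machines each hour.

# A cluster that runs all day versus one that automatically terminates
# after three hours of burst processing.
always_on = cluster_cost(24, DBU_PER_HOUR, PRICE_PER_DBU, VM_PRICE_PER_HOUR)
bursty = cluster_cost(3, DBU_PER_HOUR, PRICE_PER_DBU, VM_PRICE_PER_HOUR)
savings = always_on - bursty
```

With these placeholder rates, the always-on cluster costs 48.0 units per day and the automatically terminated cluster costs 6.0. This difference is why automatic termination matters for workloads that run in bursts.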

Other capabilities

Capability	Fabric	Azure Databricks
Autoscaling	No	Yes
Scale-out granularity	Per Fabric SKU	Per cluster
In-memory caching of data	No	Yes
Query from external relational stores	Yes	Yes
Authentication	Microsoft Entra ID	Microsoft Entra ID
Auditing	Yes	Yes
Row-level security	Yes	Yes
Supports firewalls	Yes	Yes
Dynamic data masking	Yes	Yes

Contributors

This article is maintained by Microsoft. It was originally written by the following contributors.

Principal authors:

Zoiner Tejada | CEO and Architect
Pratima Valavala | Principal Solutions Architect

Last updated on 2025-12-11