Compare Fabric Data Engineering and Azure Synapse Spark
This comparison between Fabric Data Engineering and Azure Synapse Spark provides a summary of key features and an in-depth analysis across various categories, which include Spark pools, configuration, libraries, notebooks, and Spark job definitions.
The following table compares Azure Synapse Spark and Fabric Spark across different categories:
Category | Azure Synapse Spark | Fabric Spark |
---|---|---|
Spark pools | Spark pool<br>-<br>- | Starter pool / Custom pool<br>V-Order<br>High concurrency |
Spark configurations | Pool level<br>Notebook or Spark job definition level | Environment level<br>Notebook or Spark job definition level |
Spark libraries | Workspace level packages<br>Pool level packages<br>Inline packages | -<br>Environment libraries<br>Inline libraries |
Resources | Notebook (Python, Scala, Spark SQL, R, .NET)<br>Spark job definition (Python, Scala, .NET)<br>Synapse data pipelines<br>Pipeline activities (notebook, SJD) | Notebook (Python, Scala, Spark SQL, R)<br>Spark job definition (Python, Scala, R)<br>Data Factory data pipelines<br>Pipeline activities (notebook, SJD) |
Data | Primary storage (ADLS Gen2)<br>Data residency (cluster/region based) | Primary storage (OneLake)<br>Data residency (capacity/region based) |
Metadata | Internal Hive Metastore (HMS)<br>External HMS (using Azure SQL DB) | Internal HMS (lakehouse)<br>- |
Connections | Connector type (linked services)<br>Data sources<br>Data source connection with workspace identity | Connector type (DMTS)<br>Data sources<br>- |
Security | RBAC and access control<br>Storage ACLs (ADLS Gen2)<br>Private Links<br>Managed VNet (network isolation)<br>Synapse workspace identity<br>Data Exfiltration Protection (DEP)<br>Service tags<br>Key Vault (via mssparkutils/linked service) | RBAC and access control<br>OneLake RBAC<br>Private Links<br>Managed VNet<br>Workspace identity<br>-<br>Service tags<br>Key Vault (via mssparkutils) |
DevOps | Azure DevOps integration<br>CI/CD (no built-in support) | Azure DevOps integration<br>Deployment pipelines |
Developer experience | IDE integration (IntelliJ)<br>Synapse Studio UI<br>Collaboration (workspaces)<br>Livy API<br>API/SDK<br>mssparkutils | IDE integration (VS Code)<br>Fabric UI<br>Collaboration (workspaces and sharing)<br>-<br>API/SDK<br>mssparkutils |
Logging and monitoring | Spark Advisor<br>Built-in monitoring of pools and jobs (through Synapse Studio)<br>Spark history server<br>Prometheus/Grafana<br>Log Analytics<br>Storage account<br>Event Hubs | Spark Advisor<br>Built-in monitoring of pools and jobs (through Monitoring hub)<br>Spark history server<br>-<br>-<br>-<br>- |
Business continuity and disaster recovery (BCDR) | BCDR (data): ADLS Gen2 | BCDR (data): OneLake |
Considerations and limitations:
- DMTS integration: You can't use the DMTS via notebooks and Spark job definitions.
- Workspace-level RBAC: Fabric supports four different workspace roles. For more information, see Roles in workspaces in Microsoft Fabric.
- Managed identity: Currently, Fabric doesn't support running notebooks and Spark job definitions with the workspace identity, or using a managed identity for Azure Key Vault in notebooks.
- CI/CD: You can use the Fabric API/SDK and deployment pipelines.
- Livy API (to submit and manage Spark jobs): The Livy API is on the roadmap but isn't exposed yet in Fabric. You must create notebooks and Spark job definitions with the Fabric UI.
- Spark logs and metrics: In Azure Synapse, you can emit Spark logs and metrics to your own destinations, such as Log Analytics, Blob Storage, and Event Hubs. You can also get a list of Spark applications for the workspace from the API. Currently, neither of these capabilities is available in Fabric.
Other considerations:
- JDBC: JDBC connection support isn't currently available in Fabric.
Spark pool comparison
The following table compares Azure Synapse Spark and Fabric Spark pools.
Spark setting | Azure Synapse Spark | Fabric Spark |
---|---|---|
Live pool (pre-warm instances) | - | Yes, Starter pools |
Custom pool | Yes | Yes |
Spark versions (runtime) | 2.4, 3.1, 3.2, 3.3, 3.4 | 3.3, 3.4, 3.5 |
Autoscale | Yes | Yes |
Dynamic allocation of executors | Yes, up to 200 | Yes, based on capacity |
Adjustable node sizes | Yes, 3-200 | Yes, 1-based on capacity |
Minimum node configuration | 3 nodes | 1 node |
Node size family | Memory Optimized, GPU accelerated | Memory Optimized |
Node size | Small-XXXLarge | Small-XXLarge |
Autopause | Yes, customizable minimum 5 minutes | Yes, noncustomizable 2 minutes |
High concurrency | No | Yes |
V-Order | No | Yes |
Spark autotune | No | Yes |
Native Execution Engine | No | Yes |
Concurrency limits | Fixed | Variable based on capacity |
Multiple Spark pools | Yes | Yes (environments) |
Intelligent cache | Yes | Yes |
API/SDK support | Yes | Yes |
Runtime: Fabric doesn't support Spark versions 2.4, 3.1, and 3.2. Fabric Spark supports Spark 3.3 with Delta 2.2 within Runtime 1.1, Spark 3.4 with Delta 2.4 within Runtime 1.2, and Spark 3.5 with Delta 3.1 within Runtime 1.3.
Autoscale: In Azure Synapse Spark, the pool can scale up to 200 nodes regardless of the node size. In Fabric, the maximum number of nodes depends on the node size and the provisioned capacity. See the following example for the F64 SKU.
Spark pool size | Azure Synapse Spark | Fabric Spark (Custom Pool, SKU F64) |
---|---|---|
Small | Min: 3, Max: 200 | Min: 1, Max: 32 |
Medium | Min: 3, Max: 200 | Min: 1, Max: 16 |
Large | Min: 3, Max: 200 | Min: 1, Max: 8 |
X-Large | Min: 3, Max: 200 | Min: 1, Max: 4 |
XX-Large | Min: 3, Max: 200 | Min: 1, Max: 2 |
Adjustable node sizes: In Azure Synapse Spark, you can go up to 200 nodes. In Fabric, the number of nodes you can have in your custom Spark pool depends on your node size and Fabric capacity. Capacity is a measure of how much computing power you can use in Azure; two Spark vCores (a unit of computing power for Spark) equal one capacity unit. For example, a Fabric capacity SKU F64 has 64 capacity units, which is equivalent to 128 Spark vCores. So, if you choose a small node size, you can have up to 32 nodes in your pool (128/4 = 32). In general, the total vCores in the capacity divided by the vCores per node size gives the total number of nodes available. For more information, see Spark compute.
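To make the arithmetic concrete, here's a minimal sketch (an illustration only, not an official API; the per-node vCore counts are the standard node sizes implied by the table above):

```python
# Compute the maximum node count for a Fabric custom pool, assuming
# 1 capacity unit = 2 Spark vCores and these per-node vCore counts.
VCORES_PER_NODE = {"Small": 4, "Medium": 8, "Large": 16, "X-Large": 32, "XX-Large": 64}

def max_nodes(capacity_units: int, node_size: str) -> int:
    spark_vcores = capacity_units * 2  # e.g., F64 -> 128 Spark vCores
    return spark_vcores // VCORES_PER_NODE[node_size]

print(max_nodes(64, "Small"))  # 32, matching the F64 row in the table above
```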
Node size family: Fabric Spark pools only support the Memory Optimized node size family for now. GPU-accelerated Spark pools in Azure Synapse aren't available in Fabric.
Node size: The xx-large node size comes with 432 GB of memory in Azure Synapse, while the same node size has 512 GB of memory and 64 vCores in Fabric. The rest of the node sizes (small through x-large) have the same vCores and memory in both Azure Synapse and Fabric.
Automatic pausing: If you enable it in Azure Synapse Spark, the Apache Spark pool automatically pauses after a specified amount of idle time. This setting is configurable in Azure Synapse (minimum 5 minutes). In Fabric, custom pools have a noncustomizable autopause duration of 2 minutes after the session expires, and the default session expiration is set to 20 minutes.
High concurrency: Fabric supports high concurrency in notebooks. For more information, see High concurrency mode in Fabric Spark.
Concurrency limits: In terms of concurrency, Azure Synapse Spark has a limit of 50 simultaneous running jobs per Spark pool and 200 queued jobs per Spark pool. The maximum active jobs are 250 per Spark pool and 1000 per workspace. In Microsoft Fabric Spark, capacity SKUs define the concurrency limits. SKUs have varying limits on max concurrent jobs that range from 1 to 512. Also, Fabric Spark has a dynamic reserve-based throttling system to manage concurrency and ensure smooth operation even during peak usage times. For more information, see Concurrency limits and queueing in Microsoft Fabric Spark and Fabric capacities.
Multiple Spark pools: If you want to have multiple Spark pools, use Fabric environments to select a pool by notebook or Spark job definition. For more information, see Create, configure, and use an environment in Microsoft Fabric.
Note
Learn how to migrate Azure Synapse Spark pools to Fabric.
Spark configurations comparison
Spark configurations can be applied at different levels:
- Environment level: These configurations are used as the default configuration for all Spark jobs in the environment.
- Inline level: Set Spark configurations inline using notebooks and Spark job definitions.
While both options are supported in Azure Synapse Spark and Fabric, there are some considerations:
Spark configuration | Azure Synapse Spark | Fabric Spark |
---|---|---|
Environment level | Yes, pools | Yes, environments |
Inline | Yes | Yes |
Import/export | Yes | Yes (.yml from environments) |
API/SDK support | Yes | Yes |
Environment level: In Azure Synapse, you can define multiple Spark configurations and assign them to different Spark pools. You can do this in Fabric by using environments.
Inline: In Azure Synapse, both notebooks and Spark jobs support attaching different Spark configurations. In Fabric, session-level configurations are customized with the `spark.conf.set(<conf_name>, <conf_value>)` setting, as shown in the sketch below. For batch jobs, you can also apply configurations via SparkConf.
Import/export: This option for Spark configurations is available in Fabric environments.
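For illustration, a minimal session-level example (the configuration name and value are placeholders chosen for the sketch):

```python
# Minimal sketch: set and read back a session-level Spark configuration
# inline in a Fabric notebook. The configuration name and value are
# illustrative, not a recommendation.
spark.conf.set("spark.sql.shuffle.partitions", "200")
print(spark.conf.get("spark.sql.shuffle.partitions"))  # "200"
```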
Other considerations:
- Immutable Spark configurations: Some Spark configurations are immutable. If you get the message `AnalysisException: Can't modify the value of a Spark config: <config_name>`, the property in question is immutable.
- FAIR scheduler: The FAIR scheduler is used in high concurrency mode.
- V-Order: V-Order is a write-time optimization applied to parquet files, enabled by default in Fabric Spark pools.
- Optimized Write: Optimized Write is disabled by default in Azure Synapse but enabled by default for Fabric Spark.
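A hedged sketch of the last two points, assuming the configuration names `spark.sql.parquet.vorder.enabled` and `spark.microsoft.delta.optimizeWrite.enabled` documented for Fabric runtimes (verify the names for your runtime version):

```python
# Sketch, under the assumption that these configuration names apply to
# your Fabric runtime: toggle V-Order and Optimized Write per session.
spark.conf.set("spark.sql.parquet.vorder.enabled", "false")
spark.conf.set("spark.microsoft.delta.optimizeWrite.enabled", "true")

# Setting an immutable property raises the AnalysisException mentioned above.
try:
    spark.conf.set("spark.sql.warehouse.dir", "/tmp/warehouse")  # static config
except Exception as e:
    print(e)  # e.g., "Can't modify the value of a Spark config: ..."
```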
Note
Learn how to Migrate Spark configurations from Azure Synapse to Fabric.
Spark libraries comparison
You can apply Spark libraries at different levels:
- Workspace level: In Azure Synapse, you can upload/install these libraries to your workspace and later assign them to a specific Spark pool.
- Environment level: You can upload/install libraries to an environment. Environment-level libraries are available to all notebooks and Spark job definitions running in the environment.
- Inline: In addition to environment-level libraries, you can also specify inline libraries, for example at the beginning of a notebook session.
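For illustration, a session-scoped inline installation at the start of a Fabric notebook might look like the following (the package is an arbitrary example; the library is available only for the current session):

```python
# Inline, session-scoped library installation in a notebook cell.
# The package name is only an example.
%pip install imblearn
```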
Considerations:
Spark library | Azure Synapse Spark | Fabric Spark |
---|---|---|
Workspace level | Yes | No |
Environment level | Yes, Pools | Yes, environments |
Inline | Yes | Yes |
Import/export | Yes | Yes |
API/SDK support | Yes | Yes |
Other considerations:
- Built-in libraries: Fabric and Azure Synapse share a common Spark core, but their runtime libraries can differ slightly. Code is typically compatible, with some exceptions; in those cases, you might need to recompile, add custom libraries, or adjust syntax. See built-in Fabric Spark runtime libraries here.
Note
Learn how to migrate Azure Synapse Spark libraries to Fabric.
Notebook comparison
Notebooks and Spark job definitions are primary code items for developing Apache Spark jobs in Fabric. There are some differences between Azure Synapse Spark notebooks and Fabric Spark notebooks:
Notebook capability | Azure Synapse Spark | Fabric Spark |
---|---|---|
Import/export | Yes | Yes |
Session configuration | Yes, UI and inline | Yes, UI (environment) and inline |
IntelliSense | Yes | Yes |
mssparkutils | Yes | Yes |
Notebook resources | No | Yes |
Collaborate | No | Yes |
High concurrency | No | Yes |
.NET for Spark C# | Yes | No |
Pipeline activity support | Yes | Yes |
Built-in scheduled run support | No | Yes |
API/SDK support | Yes | Yes |
mssparkutils: Because DMTS connections aren't supported in Fabric yet, only `getToken` and `getSecret` are supported for now in Fabric for `mssparkutils.credentials` (see the sketch after these considerations).
Notebook resources: Fabric notebooks provide a Unix-like file system to help you manage your folders and files. For more information, see How to use Microsoft Fabric notebooks.
Collaborate: The Fabric notebook is a collaborative item that supports multiple users editing the same notebook. For more information, see How to use Microsoft Fabric notebooks.
High concurrency: In Fabric, you can attach notebooks to a high concurrency session. This option is an alternative for users using ThreadPoolExecutor in Azure Synapse. For more information, see Configure high concurrency mode for Fabric notebooks.
.NET for Spark C#: Fabric doesn't support .NET Spark (C#). However, we recommend that users with existing workloads written in C# or F# migrate to Python or Scala.
Built-in scheduled run support: Fabric supports scheduled runs for notebooks.
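For illustration, a hedged sketch of the two supported `mssparkutils.credentials` calls; the audience key, Key Vault URL, and secret name are placeholders, so verify the exact signatures for your runtime:

```python
from notebookutils import mssparkutils  # built into Fabric notebooks

# Acquire a Microsoft Entra token for a resource audience (key is an example).
token = mssparkutils.credentials.getToken("storage")

# Read a secret from Azure Key Vault; URL and secret name are placeholders.
secret = mssparkutils.credentials.getSecret(
    "https://<your-key-vault>.vault.azure.net/", "<secret-name>"
)
```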
Other considerations:
- You can use features inside a notebook that are only supported in a specific version of Spark. Remember that Spark 2.4 and 3.1 aren't supported in Fabric.
- If your notebook or Spark job is using a linked service with different data source connections or mount points, you should modify your Spark jobs to use alternative methods for handling connections to external data sources and sinks. Use Spark code to connect to data sources using available Spark libraries.
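For example, here's a hedged sketch that replaces a linked-service connection with direct Spark access to ADLS Gen2 using a service principal. The `fs.azure.account.*` settings are standard Hadoop ABFS options; the account, container, tenant, and secret values are placeholders:

```python
# Sketch: connect Spark directly to ADLS Gen2 via OAuth instead of a
# linked service. All <...> values are placeholders.
account = "<storage-account>"
suffix = f"{account}.dfs.core.windows.net"

spark.conf.set(f"fs.azure.account.auth.type.{suffix}", "OAuth")
spark.conf.set(
    f"fs.azure.account.oauth.provider.type.{suffix}",
    "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
)
spark.conf.set(f"fs.azure.account.oauth2.client.id.{suffix}", "<client-id>")
# The secret could come from Key Vault via mssparkutils.credentials.getSecret.
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{suffix}", "<client-secret>")
spark.conf.set(
    f"fs.azure.account.oauth2.client.endpoint.{suffix}",
    "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
)

df = spark.read.parquet(f"abfss://<container>@{suffix}/<path>")
```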
Note
Learn how to Migrate notebooks from Azure Synapse to Fabric.
Spark job definition comparison
Important Spark job definition considerations:
Spark job capability | Azure Synapse Spark | Fabric Spark |
---|---|---|
PySpark | Yes | Yes |
Scala | Yes | Yes |
.NET for Spark C# | Yes | No |
SparkR | No | Yes |
Import/export | Yes (UI) | No |
Pipeline activity support | Yes | Yes |
Built-in scheduled run support | No | Yes |
Retry policies | No | Yes |
API/SDK support | Yes | Yes |
Spark jobs: You can bring your own .py/.R/.jar files. Fabric supports SparkR. A Spark job definition supports reference files, command line arguments, Spark configurations, and lakehouse references (see the sketch after these considerations).
Import/export: In Azure Synapse, you can import/export json-based Spark job definitions from the UI. This feature isn't available yet in Fabric.
.NET for Spark C#: Fabric doesn't support .NET Spark (C#). However, the recommendation is that users with existing workloads written in C# or F# migrate to Python or Scala.
Built-in scheduled run support: Fabric supports scheduled runs for a Spark job definition.
Retry policies: This option enables users to run Spark-structured streaming jobs indefinitely.
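For illustration, a minimal PySpark entry-point file that a Spark job definition could reference, reading an input path from a command-line argument and writing to a table in the attached lakehouse (all names and paths are placeholders):

```python
import sys
from pyspark.sql import SparkSession

# Minimal main file for a Spark job definition. The input path arrives as a
# command-line argument configured on the job definition.
if __name__ == "__main__":
    spark = SparkSession.builder.appName("SampleJob").getOrCreate()

    input_path = sys.argv[1]  # e.g., an abfss:// path passed as an argument
    df = spark.read.parquet(input_path)

    # Write to a table in the default lakehouse attached to the job definition.
    df.write.mode("overwrite").saveAsTable("sample_table")

    spark.stop()
```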
Note
Learn how to Migrate Spark job definitions from Azure Synapse to Fabric.
Hive Metastore (HMS) comparison
Hive MetaStore (HMS) differences and considerations:
HMS type | Azure Synapse Spark | Fabric Spark |
---|---|---|
Internal HMS | Yes | Yes (lakehouse) |
External HMS | Yes | No |
- External HMS: Fabric currently doesn't support a Catalog API or access to an external Hive Metastore (HMS).
Note
Learn how to migrate Azure Synapse Spark catalog HMS metadata to Fabric.
Related content
- Learn more about migration options for Spark pools, configurations, libraries, notebooks, and Spark job definitions
- Migrate data and pipelines
- Migrate Hive Metastore metadata