Compare Fabric Data Engineering and Azure Synapse Spark
This comparison between Fabric Data Engineering and Azure Synapse Spark provides a summary of key features and an in-depth analysis across various categories, which include Spark pools, configuration, libraries, notebooks, and Spark job definitions.
The following table compares Azure Synapse Spark and Fabric Spark across different categories:
Category | Azure Synapse Spark | Fabric Spark |
---|---|---|
Spark pools | Spark pool<br>-<br>- | Starter pool / Custom pool<br>V-Order<br>High concurrency |
Spark configurations | Pool level<br>Notebook or Spark job definition level | Environment level<br>Notebook or Spark job definition level |
Spark libraries | Workspace level packages<br>Pool level packages<br>Inline packages | -<br>Environment libraries<br>Inline libraries |
Resources | Notebook (Python, Scala, Spark SQL, R, .NET)<br>Spark job definition (Python, Scala, .NET)<br>Synapse data pipelines<br>Pipeline activities (notebook, SJD) | Notebook (Python, Scala, Spark SQL, R)<br>Spark job definition (Python, Scala, R)<br>Data Factory data pipelines<br>Pipeline activities (notebook, SJD) |
Data | Primary storage (ADLS Gen2)<br>Data residency (cluster/region based) | Primary storage (OneLake)<br>Data residency (capacity/region based) |
Metadata | Internal Hive Metastore (HMS)<br>External HMS (using Azure SQL DB) | Internal HMS (lakehouse)<br>- |
Connections | Connector type (linked services)<br>Data sources<br>Data source connection with workspace identity | Connector type (DMTS)<br>Data sources<br>- |
Security | RBAC and access control<br>Storage ACLs (ADLS Gen2)<br>Private Links<br>Managed VNet (network isolation)<br>Synapse workspace identity<br>Data Exfiltration Protection (DEP)<br>Service tags<br>Key Vault (via mssparkutils/linked service) | RBAC and access control<br>OneLake RBAC<br>Private Links<br>Managed VNet<br>Workspace identity<br>-<br>Service tags<br>Key Vault (via mssparkutils) |
DevOps | Azure DevOps integration<br>CI/CD (no built-in support) | Azure DevOps integration<br>Deployment pipelines |
Developer experience | IDE integration (IntelliJ)<br>Synapse Studio UI<br>Collaboration (workspaces)<br>Livy API<br>API/SDK<br>mssparkutils | IDE integration (VS Code)<br>Fabric UI<br>Collaboration (workspaces and sharing)<br>-<br>API/SDK<br>mssparkutils |
Logging and monitoring | Spark Advisor<br>Built-in monitoring of pools and jobs (through Synapse Studio)<br>Spark history server<br>Prometheus/Grafana<br>Log Analytics<br>Storage account<br>Event Hubs | Spark Advisor<br>Built-in monitoring of pools and jobs (through Monitoring hub)<br>Spark history server<br>-<br>-<br>-<br>- |
Business continuity and disaster recovery (BCDR) | BCDR (data): ADLS Gen2 | BCDR (data): OneLake |
Considerations and limitations:
- DMTS integration: You can't use the DMTS via notebooks and Spark job definitions.
- Workspace-level RBAC: Fabric supports four different workspace roles. For more information, see Roles in workspaces in Microsoft Fabric.
- Managed identity: Currently, Fabric doesn't support running notebooks and Spark job definitions with the workspace identity, or using a managed identity for Azure Key Vault in notebooks.
- CI/CD: You can use the Fabric API/SDK and deployment pipelines.
- Livy API (to submit and manage Spark jobs): The Livy API is on the roadmap but isn't exposed yet in Fabric. You must create notebooks and Spark job definitions with the Fabric UI.
- Spark logs and metrics: In Azure Synapse, you can emit Spark logs and metrics to your own destinations, such as Log Analytics, Blob Storage, and Event Hubs. You can also get a list of Spark applications for the workspace from the API. Currently, neither of these capabilities is available in Fabric.
Other considerations:
- JDBC: JDBC connection support isn't currently available in Fabric.
Spark pool comparison
The following table compares Azure Synapse Spark and Fabric Spark pools.
Spark setting | Azure Synapse Spark | Fabric Spark |
---|---|---|
Live pool (pre-warm instances) | - | Yes, Starter pools |
Custom pool | Yes | Yes |
Spark versions (runtime) | 2.4, 3.1, 3.2, 3.3, 3.4 | 3.3, 3.4, 3.5 |
Autoscale | Yes | Yes |
Dynamic allocation of executors | Yes, up to 200 | Yes, based on capacity |
Adjustable node sizes | Yes, 3-200 | Yes, 1-based on capacity |
Minimum node configuration | 3 nodes | 1 node |
Node size family | Memory Optimized, GPU accelerated | Memory Optimized |
Node size | Small-XXXLarge | Small-XXLarge |
Autopause | Yes, customizable minimum 5 minutes | Yes, noncustomizable 2 minutes |
High concurrency | No | Yes |
V-Order | No | Yes |
Spark autotune | No | Yes |
Native Execution Engine | No | Yes |
Concurrency limits | Fixed | Variable based on capacity |
Multiple Spark pools | Yes | Yes (environments) |
Intelligent cache | Yes | Yes |
API/SDK support | Yes | Yes |
Runtime: Fabric doesn't support Spark versions 2.4, 3.1, and 3.2. Fabric Spark supports Spark 3.3 with Delta 2.2 within Runtime 1.1, Spark 3.4 with Delta 2.4 within Runtime 1.2, and Spark 3.5 with Delta 3.1 within Runtime 1.3.
Autoscale: In Azure Synapse Spark, the pool can scale up to 200 nodes regardless of the node size. In Fabric, the maximum number of nodes depends on the node size and the provisioned capacity. See the following example for the F64 SKU.
Spark pool size | Azure Synapse Spark | Fabric Spark (Custom Pool, SKU F64) |
---|---|---|
Small | Min: 3, Max: 200 | Min: 1, Max: 32 |
Medium | Min: 3, Max: 200 | Min: 1, Max: 16 |
Large | Min: 3, Max: 200 | Min: 1, Max: 8 |
X-Large | Min: 3, Max: 200 | Min: 1, Max: 4 |
XX-Large | Min: 3, Max: 200 | Min: 1, Max: 2 |
Adjustable node sizes: In Azure Synapse Spark, you can go up to 200 nodes. In Fabric, the number of nodes you can have in your custom Spark pool depends on your node size and Fabric capacity. Capacity is a measure of how much computing power you can use in Azure; two Spark vCores (a unit of computing power for Spark) equal one capacity unit. For example, a Fabric capacity SKU F64 has 64 capacity units, which is equivalent to 128 Spark vCores. So, if you choose a small node size, you can have up to 32 nodes in your pool (128/4 = 32). In general, the total vCores in the capacity divided by the vCores per node size gives the total number of nodes available. For more information, see Spark compute.
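To make the arithmetic concrete, here's a minimal sketch (an illustration only, not an official API; the per-node vCore counts are the standard node sizes implied by the table above):

```python
# Compute the maximum node count for a Fabric custom pool, assuming
# 1 capacity unit = 2 Spark vCores and these per-node vCore counts.
VCORES_PER_NODE = {"Small": 4, "Medium": 8, "Large": 16, "X-Large": 32, "XX-Large": 64}

def max_nodes(capacity_units: int, node_size: str) -> int:
    spark_vcores = capacity_units * 2  # e.g., F64 -> 128 Spark vCores
    return spark_vcores // VCORES_PER_NODE[node_size]

print(max_nodes(64, "Small"))  # 32, matching the F64 row in the table above
```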
Node size family: Fabric Spark pools only support the Memory Optimized node size family for now. GPU-accelerated Spark pools in Azure Synapse aren't available in Fabric.
Node size: The xx-large node size comes with 432 GB of memory in Azure Synapse, while the same node size has 512 GB of memory and 64 vCores in Fabric. The rest of the node sizes (small through x-large) have the same vCores and memory in both Azure Synapse and Fabric.
Automatic pausing: If you enable it in Azure Synapse Spark, the Apache Spark pool automatically pauses after a specified amount of idle time. This setting is configurable in Azure Synapse (minimum 5 minutes). In Fabric, custom pools have a noncustomizable autopause duration of 2 minutes after the session expires, and the default session expiration is set to 20 minutes.
High concurrency: Fabric supports high concurrency in notebooks. For more information, see High concurrency mode in Fabric Spark.
Concurrency limits: In terms of concurrency, Azure Synapse Spark has a limit of 50 simultaneous running jobs per Spark pool and 200 queued jobs per Spark pool. The maximum active jobs are 250 per Spark pool and 1000 per workspace. In Microsoft Fabric Spark, capacity SKUs define the concurrency limits. SKUs have varying limits on max concurrent jobs that range from 1 to 512. Also, Fabric Spark has a dynamic reserve-based throttling system to manage concurrency and ensure smooth operation even during peak usage times. For more information, see Concurrency limits and queueing in Microsoft Fabric Spark and Fabric capacities.
Multiple Spark pools: If you want to have multiple Spark pools, use Fabric environments to select a pool by notebook or Spark job definition. For more information, see Create, configure, and use an environment in Microsoft Fabric.
Note
Learn how to migrate Azure Synapse Spark pools to Fabric.
Spark configurations comparison
Spark configurations can be applied at different levels:
- Environment level: These configurations are used as the default configuration for all Spark jobs in the environment.
- Inline level: Set Spark configurations inline using notebooks and Spark job definitions.
While both options are supported in Azure Synapse Spark and Fabric, there are some considerations:
Spark configuration | Azure Synapse Spark | Fabric Spark |
---|---|---|
Environment level | Yes, pools | Yes, environments |
Inline | Yes | Yes |
Import/export | Yes | Yes (.yml from environments) |
API/SDK support | Yes | Yes |
Environment level: In Azure Synapse, you can define multiple Spark configurations and assign them to different Spark pools. You can do this in Fabric by using environments.
Inline: In Azure Synapse, both notebooks and Spark jobs support attaching different Spark configurations. In Fabric, session-level configurations are customized with the `spark.conf.set(<conf_name>, <conf_value>)` setting, as shown in the sketch below. For batch jobs, you can also apply configurations via SparkConf.
Import/export: This option for Spark configurations is available in Fabric environments.
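For illustration, a minimal session-level example (the configuration name and value are placeholders chosen for the sketch):

```python
# Minimal sketch: set and read back a session-level Spark configuration
# inline in a Fabric notebook. The configuration name and value are
# illustrative, not a recommendation.
spark.conf.set("spark.sql.shuffle.partitions", "200")
print(spark.conf.get("spark.sql.shuffle.partitions"))  # "200"
```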
Other considerations:
- Immutable Spark configurations: Some Spark configurations are immutable. If you get the message `AnalysisException: Can't modify the value of a Spark config: <config_name>`, the property in question is immutable.
- FAIR scheduler: The FAIR scheduler is used in high concurrency mode.
- V-Order: V-Order is a write-time optimization applied to parquet files, enabled by default in Fabric Spark pools.
- Optimized Write: Optimized Write is disabled by default in Azure Synapse but enabled by default for Fabric Spark.
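A hedged sketch of the last two points, assuming the configuration names `spark.sql.parquet.vorder.enabled` and `spark.microsoft.delta.optimizeWrite.enabled` documented for Fabric runtimes (verify the names for your runtime version):

```python
# Sketch, under the assumption that these configuration names apply to
# your Fabric runtime: toggle V-Order and Optimized Write per session.
spark.conf.set("spark.sql.parquet.vorder.enabled", "false")
spark.conf.set("spark.microsoft.delta.optimizeWrite.enabled", "true")

# Setting an immutable property raises the AnalysisException mentioned above.
try:
    spark.conf.set("spark.sql.warehouse.dir", "/tmp/warehouse")  # static config
except Exception as e:
    print(e)  # e.g., "Can't modify the value of a Spark config: ..."
```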
Note
Learn how to Migrate Spark configurations from Azure Synapse to Fabric.
Spark libraries comparison
You can apply Spark libraries at different levels:
- Workspace level: In Azure Synapse, you can upload/install these libraries to your workspace and later assign them to a specific Spark pool.
- Environment level: You can upload/install libraries to an environment. Environment-level libraries are available to all notebooks and Spark job definitions running in the environment.
- Inline: In addition to environment-level libraries, you can also specify inline libraries, for example at the beginning of a notebook session.
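For illustration, a session-scoped inline installation at the start of a Fabric notebook might look like the following (the package is an arbitrary example; the library is available only for the current session):

```python
# Inline, session-scoped library installation in a notebook cell.
# The package name is only an example.
%pip install imblearn
```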
Considerations:
Spark library | Azure Synapse Spark | Fabric Spark |
---|---|---|
Workspace level | Yes | No |
Environment level | Yes, Pools | Yes, environments |
Inline | Yes | Yes |
Import/export | Yes | Yes |
API/SDK support | Yes | Yes |
Other considerations:
- Built-in libraries: Fabric and Azure Synapse share a common Spark core, but their runtime libraries can differ slightly. Code is typically compatible, with some exceptions; in those cases, you might need to recompile, add custom libraries, or adjust syntax. See built-in Fabric Spark runtime libraries here.
Note
Learn how to migrate Azure Synapse Spark libraries to Fabric.
Notebook comparison
Notebooks and Spark job definitions are primary code items for developing Apache Spark jobs in Fabric. There are some differences between Azure Synapse Spark notebooks and Fabric Spark notebooks:
Notebook capability | Azure Synapse Spark | Fabric Spark |
---|---|---|
Import/export | Yes | Yes |
Session configuration | Yes, UI and inline | Yes, UI (environment) and inline |
IntelliSense | Yes | Yes |
mssparkutils | Yes | Yes |
Notebook resources | No | Yes |
Collaborate | No | Yes |
High concurrency | No | Yes |
.NET for Spark C# | Yes | No |
Pipeline activity support | Yes | Yes |
Built-in scheduled run support | No | Yes |
API/SDK support | Yes | Yes |
mssparkutils: Because DMTS connections aren't supported in Fabric yet, only `getToken` and `getSecret` are supported for now in Fabric for `mssparkutils.credentials` (see the sketch after these considerations).
Notebook resources: Fabric notebooks provide a Unix-like file system to help you manage your folders and files. For more information, see How to use Microsoft Fabric notebooks.
Collaborate: The Fabric notebook is a collaborative item that supports multiple users editing the same notebook. For more information, see How to use Microsoft Fabric notebooks.
High concurrency: In Fabric, you can attach notebooks to a high concurrency session. This option is an alternative for users using ThreadPoolExecutor in Azure Synapse. For more information, see Configure high concurrency mode for Fabric notebooks.
.NET for Spark C#: Fabric doesn't support .NET Spark (C#). However, we recommend that users with existing workloads written in C# or F# migrate to Python or Scala.
Built-in scheduled run support: Fabric supports scheduled runs for notebooks.
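For illustration, a hedged sketch of the two supported `mssparkutils.credentials` calls; the audience key, Key Vault URL, and secret name are placeholders, so verify the exact signatures for your runtime:

```python
from notebookutils import mssparkutils  # built into Fabric notebooks

# Acquire a Microsoft Entra token for a resource audience (key is an example).
token = mssparkutils.credentials.getToken("storage")

# Read a secret from Azure Key Vault; URL and secret name are placeholders.
secret = mssparkutils.credentials.getSecret(
    "https://<your-key-vault>.vault.azure.net/", "<secret-name>"
)
```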
Other considerations:
- You can use features inside a notebook that are only supported in a specific version of Spark. Remember that Spark 2.4 and 3.1 aren't supported in Fabric.
- If your notebook or Spark job is using a linked service with different data source connections or mount points, you should modify your Spark jobs to use alternative methods for handling connections to external data sources and sinks. Use Spark code to connect to data sources using available Spark libraries.
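For example, here's a hedged sketch that replaces a linked-service connection with direct Spark access to ADLS Gen2 using a service principal. The `fs.azure.account.*` settings are standard Hadoop ABFS options; the account, container, tenant, and secret values are placeholders:

```python
# Sketch: connect Spark directly to ADLS Gen2 via OAuth instead of a
# linked service. All <...> values are placeholders.
account = "<storage-account>"
suffix = f"{account}.dfs.core.windows.net"

spark.conf.set(f"fs.azure.account.auth.type.{suffix}", "OAuth")
spark.conf.set(
    f"fs.azure.account.oauth.provider.type.{suffix}",
    "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
)
spark.conf.set(f"fs.azure.account.oauth2.client.id.{suffix}", "<client-id>")
# The secret could come from Key Vault via mssparkutils.credentials.getSecret.
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{suffix}", "<client-secret>")
spark.conf.set(
    f"fs.azure.account.oauth2.client.endpoint.{suffix}",
    "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
)

df = spark.read.parquet(f"abfss://<container>@{suffix}/<path>")
```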
Note
Learn how to Migrate notebooks from Azure Synapse to Fabric.
Spark job definition comparison
Important Spark job definition considerations:
Spark job capability | Azure Synapse Spark | Fabric Spark |
---|---|---|
PySpark | Yes | Yes |
Scala | Yes | Yes |
.NET for Spark C# | Yes | No |
SparkR | No | Yes |
Import/export | Yes (UI) | No |
Pipeline activity support | Yes | Yes |
Built-in scheduled run support | No | Yes |
Retry policies | No | Yes |
API/SDK support | Yes | Yes |
Spark jobs: You can bring your own .py/.R/.jar files. Fabric supports SparkR. A Spark job definition supports reference files, command line arguments, Spark configurations, and lakehouse references (see the sketch after these considerations).
Import/export: In Azure Synapse, you can import/export json-based Spark job definitions from the UI. This feature isn't available yet in Fabric.
.NET for Spark C#: Fabric doesn't support .NET Spark (C#). However, the recommendation is that users with existing workloads written in C# or F# migrate to Python or Scala.
Built-in scheduled run support: Fabric supports scheduled runs for a Spark job definition.
Retry policies: This option enables users to run Spark-structured streaming jobs indefinitely.
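For illustration, a minimal PySpark entry-point file that a Spark job definition could reference, reading an input path from a command-line argument and writing to a table in the attached lakehouse (all names and paths are placeholders):

```python
import sys
from pyspark.sql import SparkSession

# Minimal main file for a Spark job definition. The input path arrives as a
# command-line argument configured on the job definition.
if __name__ == "__main__":
    spark = SparkSession.builder.appName("SampleJob").getOrCreate()

    input_path = sys.argv[1]  # e.g., an abfss:// path passed as an argument
    df = spark.read.parquet(input_path)

    # Write to a table in the default lakehouse attached to the job definition.
    df.write.mode("overwrite").saveAsTable("sample_table")

    spark.stop()
```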
Note
Learn how to Migrate Spark job definitions from Azure Synapse to Fabric.
Hive Metastore (HMS) comparison
Hive MetaStore (HMS) differences and considerations:
HMS type | Azure Synapse Spark | Fabric Spark |
---|---|---|
Internal HMS | Yes | Yes (lakehouse) |
External HMS | Yes | No |
- External HMS: Fabric currently doesn't support a Catalog API or access to an external Hive Metastore (HMS).
Note
Learn how to migrate Azure Synapse Spark catalog HMS metadata to Fabric.
Related content
- Learn more about migration options for Spark pools, configurations, libraries, notebooks, and Spark job definitions
- Migrate data and pipelines
- Migrate Hive Metastore metadata