What's new and planned for Synapse Data Engineering in Microsoft Fabric

Important

The release plans describe functionality that may or may not have been released yet. The delivery timelines and projected functionality may change or may not ship. Refer to Microsoft policy for more information.

Synapse Data Engineering empowers data engineers to transform their data at scale using Spark and build out their lakehouse architecture.

Lakehouse for all your organizational data: The lakehouse combines the best of the data lake and the data warehouse in a single experience. It enables users to ingest, prepare, and share organizational data in an open format in the lake. Later you can access it through multiple engines such as Spark, T-SQL, and Power BI. It provides various data integration options such as dataflows and pipelines, shortcuts to external data sources, and data product sharing capabilities.

Performant Spark engine & runtime: Synapse Data Engineering provides customers with an optimized Spark runtime with the latest versions of Spark, Delta, and Python. It uses Delta Lake as the common table format for all engines, enabling easy data sharing and reporting with no data movement. The runtime comes with Spark optimizations, enhancing your query performance without any configurations. It also offers starter pools and high-concurrency mode to speed up and reuse your Spark sessions, saving you time and cost.

Spark Admin & configurations: Workspace admins with appropriate permissions can create and configure custom pools to optimize the performance and cost of their Spark workloads. Creators can configure environments to install libraries, select the runtime version, and set Spark properties for their notebooks and Spark jobs.

Developer Experience: Developers can use notebooks, Spark jobs, or their preferred IDE to author and execute Spark code in Fabric. They can natively access the lakehouse data, collaborate with others, install libraries, track history, do in-line monitoring, and get recommendations from the Spark advisor. They can also use Data Wrangler to easily prepare data with a low-code UI.

Platform Integration: All Synapse Data Engineering items, including notebooks, Spark jobs, environments, and lakehouses, are integrated deeply into the Fabric platform (enterprise information management capabilities, lineage, sensitivity labels, and endorsements).

Investment areas

Feature | Estimated release timeline
High concurrency in pipelines | Q3 2024
User Data Functions in Fabric | Q3 2024
VSCode Core Extension for Fabric | Q3 2024
VSCode Satellite Extension for User Data Functions in Fabric | Q3 2024
VS Code for the Web - debugging support | Q3 2024
Ability to sort and filter tables and folders in Lakehouse | Q3 2024
Lakehouse data security | Q4 2024
Public monitoring APIs | Q4 2024
Schema support and workspace in namespace in Lakehouse | Shipped (Q3 2024)
Spark Connector for Fabric Data Warehouse | Shipped (Q2 2024)
Spark Native Execution Engine | Shipped (Q2 2024)
Microsoft Fabric API for GraphQL | Shipped (Q2 2024)
Create and attach environments | Shipped (Q2 2024)
Job Queueing for Notebook Jobs | Shipped (Q2 2024)
Optimistic Job Admission for Fabric Spark | Shipped (Q2 2024)
Spark autotune | Shipped (Q1 2024)

High concurrency in pipelines

Estimated release timeline: Q3 2024

Release Type: General availability

In addition to high concurrency in notebooks, we'll also enable high concurrency in pipelines. This capability will allow you to run multiple notebooks in a pipeline with a single session.

User Data Functions in Fabric

Estimated release timeline: Q3 2024

Release Type: Public preview

User Data Functions will provide a powerful mechanism for implementing and reusing custom, specialized business logic in Fabric data science and data engineering workflows, increasing efficiency and flexibility.

VSCode Core Extension for Fabric

Estimated release timeline: Q3 2024

Release Type: Public preview

The VSCode Core Extension for Fabric will provide common developer support for Fabric services.

VSCode Satellite Extension for User Data Functions in Fabric

Estimated release timeline: Q3 2024

Release Type: Public preview

The VSCode Satellite Extension for User Data Functions will provide developer support (editing, building, debugging, publishing) for User Data Functions in Fabric.

VS Code for the Web - debugging support

Estimated release timeline: Q3 2024

Release Type: Public preview

Visual Studio Code for the Web is currently supported in Preview for authoring and execution scenarios. We're adding the ability to debug notebook code using this extension.

Ability to sort and filter tables and folders in Lakehouse

Estimated release timeline: Q3 2024

Release Type: General availability

This feature allows customers to sort and filter the tables and folders in their Lakehouse by several methods, including alphabetically and by created date.

Lakehouse data security

Estimated release timeline: Q4 2024

Release Type: Public preview

You'll be able to apply file, folder, and table (or object-level) security in the lakehouse. You can also control who can access data in the lakehouse and the level of permissions they have. For example, you can grant read permissions on files, folders, and tables. Once permissions are applied, they're automatically synchronized across all engines, which means that permissions are consistent across Spark, SQL, Power BI, and external engines.

Public monitoring APIs

Estimated release timeline: Q4 2024

Release Type: General availability

The public monitoring APIs will allow you to programmatically retrieve the status of Spark jobs, job summaries, and the corresponding driver and executor logs.

Shipped feature(s)

Schema support and workspace in namespace in Lakehouse

Shipped (Q3 2024)

Release Type: Public preview

This feature allows you to organize tables using schemas and query data across workspaces.
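As a rough illustration of what "workspace in namespace" could look like in practice, the sketch below builds a fully qualified table name for a Spark SQL query. The four-part order (workspace, lakehouse, schema, table) is an assumption based on the feature name, not something this release plan spells out, and the helper functions and sample names are hypothetical.

```python
def qualified_table_name(workspace: str, lakehouse: str, schema: str, table: str) -> str:
    """Build a backtick-quoted, four-part table name.

    Assumption: tables are addressed as workspace.lakehouse.schema.table.
    """
    parts = [workspace, lakehouse, schema, table]
    for part in parts:
        # Reject empty parts and embedded backticks instead of trying to escape them.
        if not part or "`" in part:
            raise ValueError(f"invalid identifier part: {part!r}")
    return ".".join(f"`{part}`" for part in parts)


def select_all(workspace: str, lakehouse: str, schema: str, table: str, limit: int = 10) -> str:
    """Return a SELECT statement that reads from the qualified table."""
    name = qualified_table_name(workspace, lakehouse, schema, table)
    return f"SELECT * FROM {name} LIMIT {int(limit)}"


# Hypothetical workspace, lakehouse, schema, and table names.
query = select_all("sales_ws", "retail_lh", "marketing", "campaigns")
print(query)
```

In a notebook, a string like this would typically be passed to `spark.sql(...)`; here it's only printed so the sketch runs without a Spark session.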

Spark Connector for Fabric Data Warehouse

Shipped (Q2 2024)

Release Type: Public preview

Spark Connector for Fabric DW (Data Warehouse) enables a Spark developer or a data scientist to access and work on data from Fabric Data Warehouse with a simplified Spark API that needs just one line of code. It can query the data in parallel from Fabric Data Warehouse so that it scales with increasing data volume, and it honors the security model (OLS/RLS/CLS) defined at the data warehouse level when accessing a table or view. This first release supports reading data only; support for writing data back is coming soon.

Spark Native Execution Engine

Shipped (Q2 2024)

Release Type: Public preview

The native execution engine is a groundbreaking enhancement for Apache Spark job executions in Microsoft Fabric. This vectorized engine optimizes the performance and efficiency of your Spark queries by running them directly on your lakehouse infrastructure. The engine's seamless integration means it requires no code modifications and avoids vendor lock-in. It supports Apache Spark APIs, is compatible with Runtime 1.2 (Spark 3.4), and works with both Parquet and Delta formats. Regardless of your data's location within OneLake, or if you access data via shortcuts, the native execution engine maximizes efficiency and performance.

Microsoft Fabric API for GraphQL

Shipped (Q2 2024)

Release Type: Public preview

API for GraphQL will allow Fabric data engineers, data scientists, and data solution architects to expose and integrate Fabric data for more responsive, performant, and rich analytical applications, leveraging the power and flexibility of GraphQL.

Create and attach environments

Shipped (Q2 2024)

Release Type: General availability

To customize your Spark experiences at a more granular level, you can create and attach environments to your notebooks and Spark jobs. In an environment, you can install libraries, configure a new pool, set Spark properties, and upload scripts to a file system. This gives you more flexibility and control over your Spark workloads, without affecting the default settings of the workspace. As part of GA, we're making various improvements to environments including API support and CI/CD integration.

Job Queueing for Notebook Jobs

Shipped (Q2 2024)

Release Type: General availability

This feature allows scheduled Spark notebooks to be queued when Spark usage has reached the maximum number of jobs it can execute in parallel. Queued notebooks then execute once usage drops back below that maximum.
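The queueing behavior can be pictured with a small model. This is a minimal sketch, not Fabric's implementation: the class name, the first-in-first-out ordering, and the job IDs are all assumptions made for illustration.

```python
from collections import deque


class NotebookJobQueue:
    """Toy model: run jobs up to a parallel limit and queue the rest."""

    def __init__(self, max_parallel_jobs: int):
        if max_parallel_jobs < 1:
            raise ValueError("max_parallel_jobs must be at least 1")
        self.max_parallel_jobs = max_parallel_jobs
        self.running = set()
        self.waiting = deque()  # assumption: queued jobs start in FIFO order

    def submit(self, job_id: str) -> str:
        """Start the job if there is room, otherwise queue it."""
        if len(self.running) < self.max_parallel_jobs:
            self.running.add(job_id)
            return "running"
        self.waiting.append(job_id)
        return "queued"

    def finish(self, job_id: str):
        """Mark a job as done and start the next queued job, if any."""
        self.running.remove(job_id)
        if self.waiting and len(self.running) < self.max_parallel_jobs:
            next_job = self.waiting.popleft()
            self.running.add(next_job)
            return next_job
        return None


# Hypothetical limit of two parallel jobs and three scheduled notebooks.
queue = NotebookJobQueue(max_parallel_jobs=2)
states = [queue.submit(job) for job in ("nb-a", "nb-b", "nb-c")]
started = queue.finish("nb-a")
print(states, started)
```

The third notebook waits while two are running, and starts as soon as one of them finishes.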

Optimistic Job Admission for Fabric Spark

Shipped (Q2 2024)

Release Type: General availability

With Optimistic Job Admission, Fabric Spark only reserves the minimum number of cores that a job needs to start, based on the minimum number of nodes that the job can scale down to. This allows more jobs to be admitted if there are enough resources to meet the minimum requirements. If a job needs to scale up later, the scale-up request is approved or rejected based on the available cores in the capacity.
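The admission logic described above can be sketched as a small simulation. The class names, node sizes, and capacity figures are hypothetical, and the real scheduler likely weighs more factors than this model does.

```python
from dataclasses import dataclass


@dataclass
class SparkJob:
    name: str
    min_nodes: int       # smallest size the job can scale down to
    max_nodes: int       # largest size the job may scale up to
    cores_per_node: int

    @property
    def min_cores(self) -> int:
        return self.min_nodes * self.cores_per_node


class Capacity:
    """Toy capacity that admits jobs on their minimum core requirement."""

    def __init__(self, total_cores: int):
        self.total_cores = total_cores
        self.reserved = {}  # job name -> cores currently reserved

    @property
    def available_cores(self) -> int:
        return self.total_cores - sum(self.reserved.values())

    def admit(self, job: SparkJob) -> bool:
        """Reserve only the minimum cores; reject if even those don't fit."""
        if job.min_cores > self.available_cores:
            return False
        self.reserved[job.name] = job.min_cores
        return True

    def request_scale_up(self, job: SparkJob, extra_nodes: int) -> bool:
        """Approve a scale-up only if free cores cover it and the job's max is respected."""
        extra_cores = extra_nodes * job.cores_per_node
        current = self.reserved[job.name]
        max_cores = job.max_nodes * job.cores_per_node
        if current + extra_cores > max_cores or extra_cores > self.available_cores:
            return False
        self.reserved[job.name] = current + extra_cores
        return True


# Hypothetical 32-core capacity; each job scales between 1 and 3 nodes of 8 cores.
capacity = Capacity(total_cores=32)
jobs = [SparkJob(f"job-{s}", min_nodes=1, max_nodes=3, cores_per_node=8) for s in "abc"]
admitted = [capacity.admit(job) for job in jobs]
first_scale_up = capacity.request_scale_up(jobs[0], extra_nodes=1)
second_scale_up = capacity.request_scale_up(jobs[1], extra_nodes=1)
print(admitted, first_scale_up, second_scale_up)
```

In this example all three jobs are admitted because each reserves only 8 cores. Reserving the 24-core maximum up front would have let only one job start. The first scale-up request takes the last 8 free cores, so the second one is rejected.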

Spark autotune

Shipped (Q1 2024)

Release Type: Public preview

Autotune uses machine learning to automatically analyze previous runs of your Spark jobs and tune the configurations to optimize performance. It configures how your data is partitioned, joined, and read by Spark, which can significantly improve performance. We have seen customer jobs run 2x faster with this capability.
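To make "partitioned, joined, and read" more concrete, the sketch below lists the standard Spark SQL properties that govern those three behaviors. The mapping of autotune to these exact properties, and the `spark.ms.autotune.enabled` switch, are assumptions not stated in this release plan; check the current Fabric documentation before relying on them.

```python
# Assumed session-level switch for autotune (not confirmed by this release plan).
AUTOTUNE_ENABLED_KEY = "spark.ms.autotune.enabled"

# Standard Spark SQL properties that control partitioning, joins, and file reads.
TUNED_PROPERTIES = {
    "spark.sql.shuffle.partitions": "number of partitions used when shuffling data",
    "spark.sql.autoBroadcastJoinThreshold": "maximum table size that is broadcast in a join",
    "spark.sql.files.maxPartitionBytes": "maximum bytes packed into one partition when reading files",
}


def autotune_session_conf(enabled: bool = True) -> dict:
    """Return the Spark conf entry that would turn autotune on or off."""
    return {AUTOTUNE_ENABLED_KEY: "true" if enabled else "false"}


def describe_tuned_properties() -> list:
    """Return one readable line per property, sorted by property name."""
    return [f"{key}: {purpose}" for key, purpose in sorted(TUNED_PROPERTIES.items())]


conf = autotune_session_conf()
for line in describe_tuned_properties():
    print(line)
```

In a notebook session, each entry of a dictionary like `conf` would typically be applied with `spark.conf.set(key, value)`.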