What's new and planned for Synapse Data Engineering in Microsoft Fabric
Important
The release plans describe functionality that may or may not have been released yet. The delivery timelines and projected functionality may change or may not ship. Refer to Microsoft policy for more information.
Synapse Data Engineering empowers data engineers to transform their data at scale using Spark and build out their lakehouse architecture.
Lakehouse for all your organizational data: The lakehouse combines the best of the data lake and the data warehouse in a single experience. It enables users to ingest, prepare, and share organizational data in an open format in the lake. Later you can access it through multiple engines such as Spark, T-SQL, and Power BI. It provides various data integration options such as dataflows and pipelines, shortcuts to external data sources, and data product sharing capabilities.
Performant Spark engine & runtime: Synapse Data Engineering provides customers with an optimized Spark runtime with the latest versions of Spark, Delta, and Python. It uses Delta Lake as the common table format for all engines, enabling easy data sharing and reporting with no data movement. The runtime comes with Spark optimizations, enhancing your query performance without any configurations. It also offers starter pools and high-concurrency mode to speed up and reuse your Spark sessions, saving you time and cost.
Spark Admin & configurations: Workspace admins with appropriate permissions can create and configure custom pools to optimize the performance and cost of their Spark workloads. Creators can configure environments to install libraries, select the runtime version, and set Spark properties for their notebooks and Spark jobs.
Developer Experience: Developers can use notebooks, Spark jobs, or their preferred IDE to author and execute Spark code in Fabric. They can natively access the lakehouse data, collaborate with others, install libraries, track history, do in-line monitoring, and get recommendations from the Spark advisor. They can also use Data Wrangler to easily prepare data with a low-code UI.
Platform Integration: All Synapse Data Engineering items, including notebooks, Spark jobs, environments, and lakehouses, are integrated deeply into the Fabric platform (enterprise information management capabilities, lineage, sensitivity labels, and endorsements).
Investment areas
Feature | Estimated release timeline |
---|---|
High concurrency in pipelines | Q3 2024 |
User Data Functions in Fabric | Q3 2024 |
VSCode Core Extension for Fabric | Q3 2024 |
VSCode Satellite Extension for User Data Functions in Fabric | Q3 2024 |
VS Code for the Web - debugging support | Q3 2024 |
Ability to sort and filter tables and folders in Lakehouse | Q3 2024 |
Lakehouse data security | Q4 2024 |
Public monitoring APIs | Q4 2024 |
Schema support and workspace in namespace in Lakehouse | Shipped (Q3 2024) |
Spark Connector for Fabric Data Warehouse | Shipped (Q2 2024) |
Spark Native Execution Engine | Shipped (Q2 2024) |
Microsoft Fabric API for GraphQL | Shipped (Q2 2024) |
Create and attach environments | Shipped (Q2 2024) |
Job Queueing for Notebook Jobs | Shipped (Q2 2024) |
Optimistic Job Admission for Fabric Spark | Shipped (Q2 2024) |
Spark autotune | Shipped (Q1 2024) |
High concurrency in pipelines
Estimated release timeline: Q3 2024
Release Type: General availability
In addition to high concurrency in notebooks, we'll also enable high concurrency in pipelines. This capability will allow you to run multiple notebooks in a pipeline with a single session.
User Data Functions in Fabric
Estimated release timeline: Q3 2024
Release Type: Public preview
User Data Functions will provide a powerful mechanism for implementing and reusing custom, specialized business logic in Fabric data science and data engineering workflows, increasing efficiency and flexibility.
VSCode Core Extension for Fabric
Estimated release timeline: Q3 2024
Release Type: Public preview
The Core VSCode Extension for Fabric will provide common developer support for Fabric services.
VSCode Satellite Extension for User Data Functions in Fabric
Estimated release timeline: Q3 2024
Release Type: Public preview
The VSCode Satellite Extension for User Data Functions will provide developer support (editing, building, debugging, publishing) for User Data Functions in Fabric.
VS Code for the Web - debugging support
Estimated release timeline: Q3 2024
Release Type: Public preview
Visual Studio Code for the Web is currently supported in preview for authoring and execution scenarios. We're adding the ability to debug notebook code using this extension.
Ability to sort and filter tables and folders in Lakehouse
Estimated release timeline: Q3 2024
Release Type: General availability
This feature allows customers to sort and filter their tables and folders in the Lakehouse in several ways, including alphabetically and by created date.
Lakehouse data security
Estimated release timeline: Q4 2024
Release Type: Public preview
You'll be able to apply file, folder, and table (or object-level) security in the lakehouse. You can also control who can access data in the lakehouse and the level of permissions they have. For example, you can grant read permissions on files, folders, and tables. Once permissions are applied, they're automatically synchronized across all engines, which means that permissions are consistent across Spark, SQL, Power BI, and external engines.
Public monitoring APIs
Estimated release timeline: Q4 2024
Release Type: General availability
The public monitoring APIs will allow you to programmatically retrieve the status of Spark jobs, job summaries, and the corresponding driver and executor logs.
Shipped feature(s)
Schema support and workspace in namespace in Lakehouse
Shipped (Q3 2024)
Release Type: Public preview
This feature lets you organize tables using schemas and query data across workspaces.
Spark Connector for Fabric Data Warehouse
Shipped (Q2 2024)
Release Type: Public preview
Spark Connector for Fabric DW (Data Warehouse) lets a Spark developer or data scientist access and work on data from Fabric Data Warehouse through a simplified Spark API that needs just one line of code. It can query data from Fabric Data Warehouse in parallel, so it scales with increasing data volume, and it honors the security model (OLS/RLS/CLS) defined at the data warehouse level when accessing a table or view. This first release supports reading data only; support for writing data back is coming soon.
Spark Native Execution Engine
Shipped (Q2 2024)
Release Type: Public preview
The native execution engine is a groundbreaking enhancement for Apache Spark job executions in Microsoft Fabric. This vectorized engine optimizes the performance and efficiency of your Spark queries by running them directly on your lakehouse infrastructure. The engine's seamless integration means it requires no code modifications and avoids vendor lock-in. It supports Apache Spark APIs, is compatible with Runtime 1.2 (Spark 3.4), and works with both Parquet and Delta formats. Regardless of your data's location within OneLake, or if you access data via shortcuts, the native execution engine maximizes efficiency and performance.
Microsoft Fabric API for GraphQL
Shipped (Q2 2024)
Release Type: Public preview
API for GraphQL will allow Fabric data engineers, data scientists, and data solution architects to effortlessly expose and integrate Fabric data for more responsive, performant, and rich analytical applications, leveraging the power and flexibility of GraphQL.
Create and attach environments
Shipped (Q2 2024)
Release Type: General availability
To customize your Spark experiences at a more granular level, you can create and attach environments to your notebooks and Spark jobs. In an environment, you can install libraries, configure a new pool, set Spark properties, and upload scripts to a file system. This gives you more flexibility and control over your Spark workloads, without affecting the default settings of the workspace. As part of GA, we're making various improvements to environments including API support and CI/CD integration.
Job Queueing for Notebook Jobs
Shipped (Q2 2024)
Release Type: General availability
This feature queues scheduled Spark notebooks when Spark usage has reached the maximum number of jobs it can execute in parallel, and then runs them once usage drops back below that maximum.
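The queueing behavior described above can be illustrated with a small toy model. This is only a sketch, not how Fabric implements it: the class name `NotebookJobQueue`, the `max_parallel` parameter, and the first-in, first-out ordering are all assumptions made for illustration.

```python
from collections import deque


class NotebookJobQueue:
    """Toy model: run up to max_parallel jobs, queue the rest (FIFO is assumed)."""

    def __init__(self, max_parallel):
        self.max_parallel = max_parallel
        self.running = set()
        self.waiting = deque()

    def submit(self, job):
        """Start the job if a slot is free; otherwise place it in the queue."""
        if len(self.running) < self.max_parallel:
            self.running.add(job)
            return "running"
        self.waiting.append(job)
        return "queued"

    def finish(self, job):
        """Mark a job as done and start the oldest queued job, if there is one."""
        self.running.remove(job)
        if self.waiting:
            next_job = self.waiting.popleft()
            self.running.add(next_job)
            return next_job
        return None
```

In this model, a third notebook submitted against `max_parallel=2` waits in the queue and starts as soon as one of the two running notebooks finishes.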
Optimistic Job Admission for Fabric Spark
Shipped (Q2 2024)
Release Type: General availability
With Optimistic Job Admission, Fabric Spark only reserves the minimum number of cores that a job needs to start, based on the minimum number of nodes that the job can scale down to. This allows more jobs to be admitted if there are enough resources to meet the minimum requirements. If a job needs to scale up later, the scale-up request is approved or rejected based on the available cores in the capacity.
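The admission rule above can be sketched as a small simulation. It is a simplified model under stated assumptions, not the Fabric scheduler: the `Capacity` class, its method names, and the fixed `cores_per_node` argument are hypothetical and exist only to show the arithmetic.

```python
class Capacity:
    """Toy model of optimistic admission against a fixed pool of cores."""

    def __init__(self, total_cores):
        self.total_cores = total_cores
        self.reserved = {}  # job name -> cores currently held by that job

    @property
    def available(self):
        """Cores not reserved by any admitted job."""
        return self.total_cores - sum(self.reserved.values())

    def admit(self, job, min_nodes, cores_per_node):
        """Reserve only the cores needed for the job's minimum node count."""
        needed = min_nodes * cores_per_node
        if needed > self.available:
            return False
        self.reserved[job] = needed
        return True

    def scale_up(self, job, extra_nodes, cores_per_node):
        """Approve a scale-up only if the extra cores are free right now."""
        needed = extra_nodes * cores_per_node
        if needed > self.available:
            return False
        self.reserved[job] += needed
        return True
```

For example, on a 32-core capacity with 8-core nodes, two jobs that each need at least one node are both admitted and leave 16 cores free. If the first job then scales up by two nodes, a later scale-up request from the second job is rejected because no cores remain.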
Spark autotune
Shipped (Q1 2024)
Release Type: Public preview
Autotune uses machine learning to automatically analyze previous runs of your Spark jobs and tune the configurations to optimize performance. It configures how your data is partitioned, joined, and read by Spark, which can significantly improve performance. We have seen customer jobs run 2x faster with this capability.
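For context, here is a sketch of what setting such configurations by hand might look like. The text above doesn't name the properties autotune adjusts, so the three Apache Spark settings below (shuffle partitions, broadcast join threshold, and maximum partition bytes for file reads) are an assumption chosen to match "partitioned, joined, and read". The values are Apache Spark's documented defaults, and the `apply_conf` helper is hypothetical; it relies only on the standard `spark.conf.set` call.

```python
# Apache Spark defaults for three settings that govern partitioning, joins, and reads.
# Assumption: these are the kind of knobs a tuner adjusts; the article does not name them.
SPARK_DEFAULTS = {
    "spark.sql.shuffle.partitions": "200",
    "spark.sql.autoBroadcastJoinThreshold": "10485760",  # 10 MB
    "spark.sql.files.maxPartitionBytes": "134217728",    # 128 MB
}


def apply_conf(spark, overrides):
    """Merge overrides into the defaults and set each value on the session."""
    merged = {**SPARK_DEFAULTS, **overrides}
    for key, value in merged.items():
        spark.conf.set(key, value)
    return merged
```

In a notebook, `apply_conf(spark, {"spark.sql.shuffle.partitions": "64"})` would lower the shuffle partition count for the current session while leaving the other two settings at their defaults.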