Direct Lake overview

Direct Lake is a storage mode option for tables in a Power BI semantic model that's stored in a Microsoft Fabric workspace. It's optimized for large volumes of data that can be quickly loaded into memory from Delta tables, which store their data in Parquet files in OneLake—the single store for all analytics data. Once loaded into memory, the semantic model enables high performance queries. Direct Lake eliminates the slow and costly need to import data into the model.

You can use Direct Lake storage mode to connect to the tables or views of a single Fabric lakehouse or Fabric warehouse. Both of these Fabric items and Direct Lake semantic models require a Fabric capacity license.

Diagram shows a Direct Lake semantic model and how it connects to Delta tables in OneLake as described in the previous paragraphs.

In some ways, a Direct Lake semantic model is similar to an Import semantic model. That's because model data is loaded into memory by the VertiPaq engine for fast query performance (except in the case of DirectQuery fallback, which is explained later in this article).

However, a Direct Lake semantic model differs from an Import semantic model in an important way: a refresh operation is conceptually different. For a Direct Lake semantic model, a refresh involves a framing operation (described later in this article), which can take a few seconds to complete. It's a low-cost operation where the semantic model analyzes the metadata of the latest version of the Delta tables and is updated to reference the latest files in OneLake. In contrast, for an Import semantic model, a refresh produces a copy of the data, which can take considerable time and consume significant data source and capacity resources (memory and CPU).

Note

Incremental refresh for an Import semantic model can help to reduce refresh time and use of capacity resources.

When should you use Direct Lake storage mode?

The primary use case for Direct Lake storage mode is IT-driven analytics projects that use lake-centric architectures. In this scenario, you have, or expect to accumulate, large volumes of data in OneLake. The fast loading of that data into memory, frequent and fast refresh operations, efficient use of capacity resources, and fast query performance are all important for this use case.

Note

Import and DirectQuery semantic models are still relevant in Fabric, and they're the right choice of semantic model for some scenarios. For example, Import storage mode often works well for a self-service analyst who needs the freedom and agility to act quickly, and without dependency on IT to add new data elements.

Also, OneLake integration automatically writes data for tables in Import storage mode to Delta tables in OneLake without involving any migration effort. By using this option, you can realize many of the benefits of Fabric that are made available to Import semantic model users, such as integration with lakehouses through shortcuts, SQL queries, notebooks, and more. We recommend that you consider this option as a quick way to reap the benefits of Fabric without necessarily or immediately re-designing your existing data warehouse and/or analytics system.

Direct Lake storage mode is also suitable for minimizing data latency to quickly make data available to business users. If your Delta tables are modified intermittently (and assuming you've already done data preparation in the data lake), you can depend on automatic updates to reframe in response to those modifications. In this case, queries sent to the semantic model will return the latest data. This capability works well in partnership with the automatic page refresh feature of Power BI reports.

Keep in mind that Direct Lake depends on data preparation being done in the data lake. Data preparation can be done by using various tools, such as Spark jobs for Fabric lakehouses, T-SQL DML statements for Fabric warehouses, dataflows, pipelines, and others. This approach helps ensure data preparation logic is performed as low as possible in the architecture to maximize reusability. However, the semantic model author might not be able to modify the source item. For example, a self-service analyst might not have write permissions on a lakehouse that is managed by IT. In that case, Import storage mode might be a better choice, because it supports data preparation by using Power Query, which is defined as part of the semantic model.

Be sure to factor in your current Fabric capacity license and the Fabric capacity guardrails when you consider Direct Lake storage mode. Also, factor in the considerations and limitations, which are described later in this article.

Tip

We recommend that you produce a prototype—or proof of concept (POC)—to determine whether a Direct Lake semantic model is the right solution, and to mitigate risk.

How Direct Lake works

Typically, queries sent to a Direct Lake semantic model are handled from an in-memory cache of the columns sourced from Delta tables. The underlying storage for a Delta table is one or more Parquet files in OneLake. Parquet files organize data by columns rather than rows. Semantic models load entire columns from Delta tables into memory as they're required by queries.

A Direct Lake semantic model might also use DirectQuery fallback, which involves seamlessly switching to DirectQuery mode. DirectQuery fallback retrieves data directly from the SQL analytics endpoint of the lakehouse or the warehouse. For example, fallback might occur when a Delta table contains more rows of data than supported by your Fabric capacity (described later in this article). In this case, a DirectQuery operation sends a query to the SQL analytics endpoint. Fallback operations might result in slower query performance.

The following diagram shows how Direct Lake works by using the scenario of a user who opens a Power BI report.

Diagram shows how Direct Lake semantic models work. Concepts shown in the image are described in the following table.

The diagram depicts the following user actions, processes, and features.

Item Description
Item 1. OneLake is a data lake that stores analytics data in Parquet format. This file format is optimized for storing data for Direct Lake semantic models.
Item 2. A Fabric lakehouse or Fabric warehouse exists in a workspace that's on Fabric capacity. The lakehouse has a SQL analytics endpoint, which provides a SQL-based experience for querying. Tables (or views) provide a means to query the Delta tables in OneLake by using Transact-SQL (T-SQL).
Item 3. A Direct Lake semantic model exists in a Fabric workspace. It connects to tables or views in either the lakehouse or warehouse.
Item 4. A user opens a Power BI report.
Item 5. The Power BI report sends Data Analysis Expressions (DAX) queries to the Direct Lake semantic model.
Item 6. When possible (and necessary), the semantic model loads columns into memory directly from the Parquet files stored in OneLake. Queries achieve in-memory performance, which is very fast.
Item 7. The semantic model returns query results.
Item 8. The Power BI report renders the visuals.
Item 9. In certain circumstances, such as when the semantic model exceeds the guardrails of the capacity, semantic model queries automatically fall back to DirectQuery mode. In this mode, queries are sent to the SQL analytics endpoint of the lakehouse or warehouse.
Item 10. DirectQuery queries sent to the SQL analytics endpoint in turn query the Delta tables in OneLake. For this reason, query performance might be slower than in-memory queries.

The following sections describe Direct Lake concepts and features, including column loading, framing, automatic updates, and DirectQuery fallback.

Column loading (transcoding)

Direct Lake semantic models only load data from OneLake as and when columns are queried for the first time. The process of loading data on-demand from OneLake is known as transcoding.

When the semantic model receives a DAX (or Multidimensional Expressions—MDX) query, it first determines what columns are needed to produce a query result. Columns needed include any columns that are directly used by the query, and also columns required by relationships and measures. Typically, the number of columns needed to produce a query result is much smaller than the number of columns defined in the semantic model.

Once it's understood which columns are needed, the semantic model determines which columns are already in memory. If any columns needed for the query aren't in memory, the semantic model loads all data for those columns from OneLake. Loading column data is typically a very fast operation. However, the load time can depend on factors such as the cardinality of data stored in the columns.

Columns loaded into memory are then resident in memory. Future queries that involve only resident columns don't need to load any more columns into memory.

A column remains resident until there's reason for it to be removed (evicted) from memory. Reasons that columns might get removed include:

  • The model or table has been refreshed (see Framing in the next section).
  • No query has used the column for some time.
  • Other memory management reasons, including memory pressure in the capacity due to other, concurrent operations.

Your choice of Fabric SKU determines the maximum available memory for each Direct Lake semantic model on the capacity. For more information about resource guardrails and maximum memory limits, see Fabric capacity guardrails and limitations later in this article.
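The column loading behavior described in this section can be illustrated with a small Python sketch. It's a toy model only, not the VertiPaq engine: the `ColumnCache` class, the `max_resident` limit (a stand-in for a memory limit), and the least-recently-used eviction rule are all assumptions made for illustration. The sketch also evicts every resident column on reframing, whereas the article says only that columns might be evicted.

```python
from collections import OrderedDict


class ColumnCache:
    """Toy model of on-demand column loading (hypothetical, for illustration)."""

    def __init__(self, source, max_resident):
        self.source = source              # column name -> values, stands in for OneLake
        self.max_resident = max_resident  # assumed limit on resident columns
        self.resident = OrderedDict()     # resident columns, least recently used first
        self.loads = 0                    # number of column loads so far

    def query(self, needed_columns):
        """Return data for the needed columns, loading any that aren't resident."""
        result = {}
        for name in needed_columns:
            if name not in self.resident:
                # The whole column is loaded, not just the rows the query touches.
                self.resident[name] = list(self.source[name])
                self.loads += 1
            self.resident.move_to_end(name)  # mark as most recently used
            result[name] = self.resident[name]
        # Evict the least recently used columns once the assumed limit is exceeded.
        while len(self.resident) > self.max_resident:
            self.resident.popitem(last=False)
        return result

    def reframe(self):
        """Simplification: a refresh evicts every resident column."""
        self.resident.clear()


cache = ColumnCache(
    {"Date": [1, 2], "Amount": [10.0, 20.0], "Region": ["N", "S"]},
    max_resident=2,
)
cache.query(["Date", "Amount"])  # loads two columns
cache.query(["Amount"])          # no load: the column is already resident
cache.query(["Region"])          # loads Region and evicts Date
```

After these three queries, three column loads have taken place and only `Amount` and `Region` remain resident.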

Framing

Framing provides model owners with point-in-time control over what data is loaded into the semantic model. Framing is a Direct Lake operation that's triggered by a refresh of a semantic model, and in most cases takes only a few seconds to complete. That's because it's a low-cost operation where the semantic model analyzes the metadata of the latest version of the Delta Lake tables and is updated to reference the latest Parquet files in OneLake.

When framing occurs, resident columns might be evicted from memory and the point in time of the refresh becomes the new baseline for all future transcoding events. From this point, Direct Lake queries return data based on the state of the Delta tables at the time of the most recent framing operation. That state isn't necessarily the latest state of the Delta tables.

The following diagram shows how Direct Lake framing operations work.

Diagram shows how Direct Lake framing operations work.

The diagram depicts the following processes and features.

Item Description
Item 1. A semantic model exists in a Fabric workspace.
Item 2. Framing operations take place periodically, and they set the baseline for all future transcoding events. Framing operations can happen automatically, manually, on schedule, or programmatically.
Item 3. OneLake stores metadata and Parquet files, which are represented as Delta tables.
Item 4. The last framing operation includes Parquet files related to the Delta tables, and specifically the Parquet files that were added before the last framing operation.
Item 5. A later framing operation includes Parquet files added after the last framing operation.
Item 6. Resident columns in the Direct Lake semantic model might be evicted from memory, and the point in time of the refresh becomes the new baseline for all future transcoding events.
Item 7. Subsequent data modifications, represented by new Parquet files, aren't visible until the next framing operation occurs.

It's not always desirable to have data representing the latest state of any Delta table when a transcoding operation takes place. Consider that framing can help you provide consistent query results in environments where data in Delta tables is transient. Data can be transient for several reasons, such as when long-running extract, transform, and load (ETL) processes occur.
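The point-in-time behavior of framing can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the `DeltaTable` and `FramedModel` classes are hypothetical, and the toy table is append-only, with each commit adding exactly one Parquet file. Real Delta tables can also remove files between versions.

```python
class DeltaTable:
    """Toy append-only Delta table: each commit adds one Parquet file."""

    def __init__(self):
        self.log = []  # log[i] is the file added by table version i + 1

    def commit(self, parquet_file):
        self.log.append(parquet_file)
        return len(self.log)  # the new table version

    def files_at(self, version):
        return self.log[:version]


class FramedModel:
    """Toy semantic model that only sees files as of its last framing."""

    def __init__(self, table):
        self.table = table
        self.framed_version = 0

    def frame(self):
        # Metadata-only step: record the latest version, copy no data.
        self.framed_version = len(self.table.log)

    def visible_files(self):
        return self.table.files_at(self.framed_version)


table = DeltaTable()
table.commit("part-0001.parquet")
model = FramedModel(table)
model.frame()

table.commit("part-0002.parquet")  # written after the framing operation
before = model.visible_files()     # the new file isn't visible yet

model.frame()                      # the next framing operation picks it up
after = model.visible_files()
```

In this sketch, `before` contains only the first file, while `after` contains both. That mirrors how queries keep returning consistent results while an ETL process is still writing.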

Refresh for a Direct Lake semantic model can be done manually, automatically, or programmatically. For more information, see Refresh Direct Lake semantic models.
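As one possible illustration of a programmatic refresh, the following Python sketch builds (but doesn't send) a request to the Power BI REST API operation that refreshes a semantic model in a workspace. The article doesn't name a specific API, so treat the endpoint and the `notifyOption` body as assumptions to verify against the refresh article referenced above. The `build_refresh_request` helper and the placeholder IDs and token are hypothetical.

```python
import json
import urllib.request

API_ROOT = "https://api.powerbi.com/v1.0/myorg"


def build_refresh_request(workspace_id, model_id, access_token):
    """Build a POST request that asks the service to refresh a semantic model.

    The request is only constructed here. Sending it requires a valid
    Microsoft Entra access token with permission to refresh the model.
    """
    url = f"{API_ROOT}/groups/{workspace_id}/datasets/{model_id}/refreshes"
    body = json.dumps({"notifyOption": "NoNotification"}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder values for illustration only.
request = build_refresh_request("my-workspace-id", "my-model-id", "my-token")
```

To send the request, pass it to `urllib.request.urlopen`. For a Direct Lake semantic model, the resulting refresh is the framing operation described earlier rather than a data copy.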

For more information about Delta table versioning and framing, see Understand storage for Direct Lake semantic models.

Automatic updates

There's a semantic model-level setting to automatically update Direct Lake tables. It's enabled by default. It ensures that data changes in OneLake are automatically reflected in the Direct Lake semantic model. You should disable automatic updates when you want to control data changes by framing, which was explained in the previous section. For more information, see Manage Direct Lake semantic models.

Tip

You can set up automatic page refresh in your Power BI reports. It's a feature that automatically refreshes a specific report page, provided that the report connects to a Direct Lake semantic model (or other types of semantic model).

DirectQuery fallback

A query sent to a Direct Lake semantic model can fall back to DirectQuery mode. In this case, it retrieves data directly from the SQL analytics endpoint of the lakehouse or warehouse. Such queries always return the latest data because they're not constrained to the point in time of the last framing operation.

A query always falls back when the semantic model queries a view in the SQL analytics endpoint, or a table in the SQL analytics endpoint that enforces row-level security (RLS).

Also, a query might fall back when the semantic model exceeds the guardrails of the capacity.

Important

If possible, you should always design your solution—or size your capacity—to avoid DirectQuery fallback. That's because it might result in slower query performance.

You can control fallback for a Direct Lake semantic model by setting its DirectLakeBehavior property. For more information, see Set the Direct Lake behavior property.
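The fallback rules in this section can be summarized as a small decision function. The following Python sketch is an illustration only. It assumes the property takes the values `Automatic`, `DirectLakeOnly`, and `DirectQueryOnly`, and that a query fails when fallback is needed but disabled; confirm both against the article on the Direct Lake behavior property. The `choose_query_mode` function is hypothetical, and it treats an exceeded guardrail as always causing fallback, whereas this article says a query might fall back in that case.

```python
from enum import Enum


class DirectLakeBehavior(Enum):
    # Assumed property values; verify against the product documentation.
    AUTOMATIC = "Automatic"
    DIRECT_LAKE_ONLY = "DirectLakeOnly"
    DIRECT_QUERY_ONLY = "DirectQueryOnly"


def choose_query_mode(behavior, uses_view, uses_sql_rls, exceeds_guardrails):
    """Return "DirectLake" or "DirectQuery" for a query (simplified model)."""
    if behavior is DirectLakeBehavior.DIRECT_QUERY_ONLY:
        return "DirectQuery"
    # Views and RLS-enforced tables in the SQL analytics endpoint always
    # need fallback; exceeded guardrails are simplified to "always" here.
    needs_fallback = uses_view or uses_sql_rls or exceeds_guardrails
    if not needs_fallback:
        return "DirectLake"
    if behavior is DirectLakeBehavior.DIRECT_LAKE_ONLY:
        raise RuntimeError("Query needs DirectQuery fallback, but fallback is disabled")
    return "DirectQuery"


in_memory = choose_query_mode(DirectLakeBehavior.AUTOMATIC, False, False, False)
fallback = choose_query_mode(DirectLakeBehavior.AUTOMATIC, True, False, False)
```

Raising an error when fallback is disabled can be useful during development, because it surfaces fallback conditions that would otherwise show up only as slower queries.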

Fabric capacity guardrails and limitations

Direct Lake semantic models require a Fabric capacity license. Also, there are capacity guardrails and limitations that apply to your Fabric capacity subscription (SKU), as presented in the following table.

Important

The first column in the following table also includes Power BI Premium capacity subscriptions (P SKUs). Be aware that Microsoft is consolidating purchase options and retiring the Power BI Premium per capacity SKUs. New and existing customers should consider purchasing Fabric capacity subscriptions (F SKUs) instead.

For more information, see Important update coming to Power BI Premium licensing and Power BI Premium.

| Fabric SKU | Parquet files per table | Row groups per table | Rows per table (millions) | Max model size on disk/OneLake (GB) | Max memory (GB) ¹ |
|---|---|---|---|---|---|
| F2 | 1,000 | 1,000 | 300 | 10 | 3 |
| F4 | 1,000 | 1,000 | 300 | 10 | 3 |
| F8 | 1,000 | 1,000 | 300 | 10 | 3 |
| F16 | 1,000 | 1,000 | 300 | 20 | 5 |
| F32 | 1,000 | 1,000 | 300 | 40 | 10 |
| F64/FT1/P1 | 5,000 | 5,000 | 1,500 | Unlimited | 25 |
| F128/P2 | 5,000 | 5,000 | 3,000 | Unlimited | 50 |
| F256/P3 | 5,000 | 5,000 | 6,000 | Unlimited | 100 |
| F512/P4 | 10,000 | 10,000 | 12,000 | Unlimited | 200 |
| F1024/P5 | 10,000 | 10,000 | 24,000 | Unlimited | 400 |
| F2048 | 10,000 | 10,000 | 24,000 | Unlimited | 400 |

1 For Direct Lake semantic models, Max memory is the upper limit on how much data can be paged into memory. It isn't a guardrail, because exceeding it doesn't result in a fallback to DirectQuery mode. However, exceeding it can affect performance if the amount of data is large enough to cause excessive paging of model data in and out of memory from OneLake.

If the Max model size on disk/OneLake is exceeded, all queries to the semantic model fall back to DirectQuery mode. All other guardrails presented in the table are evaluated per query. It's therefore important that you optimize your Delta tables and Direct Lake semantic model to avoid having to unnecessarily scale up to a higher Fabric SKU (resulting in increased cost).

Additionally, Capacity unit and Max memory per query limits apply to Direct Lake semantic models. For more information, see Capacities and SKUs.
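To check a planned table against these guardrails, you could encode the table values and compare them with your own measurements. The following Python sketch does that for three of the SKUs listed above. The `Guardrails` class and the `exceeded_guardrails` function are hypothetical names, the sample measurements are made up for illustration, and `float("inf")` stands in for Unlimited. Max memory is left out because it isn't a guardrail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Guardrails:
    parquet_files_per_table: int
    row_groups_per_table: int
    rows_per_table_millions: int
    max_model_size_gb: float  # float("inf") stands in for "Unlimited"


# A subset of the guardrail table in this article.
GUARDRAILS = {
    "F2": Guardrails(1_000, 1_000, 300, 10),
    "F64": Guardrails(5_000, 5_000, 1_500, float("inf")),
    "F512": Guardrails(10_000, 10_000, 12_000, float("inf")),
}


def exceeded_guardrails(sku, parquet_files, row_groups, rows_millions, model_size_gb):
    """Return the names of the guardrails that the given measurements exceed."""
    limits = GUARDRAILS[sku]
    checks = [
        ("Parquet files per table", parquet_files, limits.parquet_files_per_table),
        ("Row groups per table", row_groups, limits.row_groups_per_table),
        ("Rows per table (millions)", rows_millions, limits.rows_per_table_millions),
        ("Max model size on disk/OneLake (GB)", model_size_gb, limits.max_model_size_gb),
    ]
    return [name for name, actual, limit in checks if actual > limit]


# Made-up measurements: a table with 450 million rows in a 4-GB model.
on_f2 = exceeded_guardrails("F2", 500, 500, 450, 4)
on_f64 = exceeded_guardrails("F64", 500, 500, 450, 4)
```

Here the table exceeds the rows-per-table guardrail on F2 but fits within every guardrail on F64. Compacting Delta tables to reduce file and row group counts is another way to stay within limits without scaling up.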

Considerations and limitations

Direct Lake semantic models present some considerations and limitations.

Note

The capabilities and features of Direct Lake semantic models are evolving. Be sure to check back periodically to review the latest list of considerations and limitations.

  • When a Direct Lake semantic model table connects to a table in the SQL analytics endpoint that enforces row-level security (RLS), queries that involve that model table will always fall back to DirectQuery mode. Query performance might be slower.
  • When a Direct Lake semantic model table connects to a view in the SQL analytics endpoint, queries that involve that model table will always fall back to DirectQuery mode. Query performance might be slower.
  • Composite modeling isn't supported. That means Direct Lake semantic model tables can't be mixed with tables in other storage modes, such as Import, DirectQuery, or Dual (except for special cases, including calculation groups, what-if parameters, and field parameters).
  • Calculated columns and calculated tables that reference columns or tables in Direct Lake storage mode aren't supported. Calculation groups, what-if parameters, and field parameters, which implicitly create calculated tables, and calculated tables that do not reference Direct Lake columns or tables are supported.
  • Direct Lake storage mode tables don't support complex Delta table column types. Binary and GUID semantic types are also unsupported. You must convert these data types into strings or other supported data types.
  • Table relationships require the data types of related columns to match.
  • One-side columns of relationships must contain unique values. Queries will fail if duplicate values are detected in a one-side column.
  • Auto date/time intelligence in Power BI Desktop isn't supported. Marking your own date table as a date table is supported.
  • The length of string column values is limited to 32,764 Unicode characters.
  • The floating point value NaN (not a number) isn't supported.
  • Embedding scenarios that rely on the "For your customer" usage scenario aren't supported.
  • Publish to web from Power BI is only supported when using a fixed identity for the Direct Lake semantic model.
  • In the web modeling experience, validation is limited for Direct Lake semantic models. User selections are assumed to be correct, and no queries are issued to validate cardinality or cross filter selections for relationships, or for the selected date column in a marked date table.
  • In the Fabric portal, the Direct Lake tab in the refresh history lists only Direct Lake-related refresh failures. Successful refresh (framing) operations aren't listed.
  • Your Fabric SKU determines the maximum available memory per Direct Lake semantic model for the capacity. When the limit is exceeded, queries to the semantic model might be slower due to excessive paging in and out of the model data.
  • Creating a Direct Lake semantic model in a workspace that's in a different region from the data source workspace isn't supported. For example, if the lakehouse is in West Central US, then you can only create semantic models from this lakehouse in the same region. A workaround is to create a lakehouse in the other region's workspace and create shortcuts to the tables before creating the semantic model. To find what region you're in, see find your Fabric home region.
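Several of the data-related limitations above can be checked before a Delta table is used by a semantic model. The following Python sketch scans rows for duplicate values in a one-side relationship column, NaN floating point values, and overlong strings. It's a minimal illustration: the `find_data_issues` function and the sample column names are hypothetical, and it assumes that Python's `len` (which counts code points) is a close enough proxy for the documented limit of 32,764 Unicode characters.

```python
import math

MAX_STRING_LENGTH = 32_764  # string length limit stated in this article


def find_data_issues(rows, key_column):
    """Return descriptions of values that Direct Lake limitations rule out."""
    issues = []
    seen_keys = set()
    for index, row in enumerate(rows):
        # One-side relationship columns must contain unique values.
        key = row[key_column]
        if key in seen_keys:
            issues.append(f"row {index}: duplicate value {key!r} in {key_column}")
        seen_keys.add(key)
        for column, value in row.items():
            if isinstance(value, float) and math.isnan(value):
                issues.append(f"row {index}: NaN in {column}")
            elif isinstance(value, str) and len(value) > MAX_STRING_LENGTH:
                issues.append(f"row {index}: string too long in {column}")
    return issues


rows = [
    {"ProductKey": 1, "Name": "Widget", "Weight": 1.5},
    {"ProductKey": 1, "Name": "x" * 32_765, "Weight": float("nan")},
]
issues = find_data_issues(rows, "ProductKey")
```

For the sample rows, the second row is flagged three times: once for the duplicate key, once for the overlong string, and once for the NaN value. In practice, a check like this would more likely run as a Spark job or T-SQL query against the full table.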

Comparison to other storage modes

The following table compares Direct Lake storage mode to Import and DirectQuery storage modes.

| Capability | Direct Lake | Import | DirectQuery |
|---|---|---|---|
| Licensing | Fabric capacity subscription (SKUs) only | Any Fabric or Power BI license (including Microsoft Fabric Free licenses) | Any Fabric or Power BI license (including Microsoft Fabric Free licenses) |
| Data source | Only lakehouse or warehouse tables (or views) | Any connector | Any connector that supports DirectQuery mode |
| Connect to SQL analytics endpoint views | Yes – but will automatically fall back to DirectQuery mode | Yes | Yes |
| Composite models | No ¹ | Yes – can combine with DirectQuery or Dual storage mode tables | Yes – can combine with Import or Dual storage mode tables |
| Single sign-on (SSO) | Yes | Not applicable | Yes |
| Calculated tables | No – except calculation groups, what-if parameters, and field parameters, which implicitly create calculated tables | Yes | No – calculated tables use Import storage mode even when they refer to other tables in DirectQuery mode |
| Calculated columns | No | Yes | Yes |
| Hybrid tables | No | Yes | Yes |
| Model table partitions | No – however partitioning can be done at the Delta table level | Yes – either automatically created by incremental refresh, or manually created by using the XMLA endpoint | No |
| User-defined aggregations | No | Yes | Yes |
| SQL analytics endpoint object-level security or column-level security | Yes – but queries will fall back to DirectQuery mode and might produce errors when permission is denied | Yes – but must duplicate permissions with semantic model object-level security | Yes – but queries might produce errors when permission is denied |
| SQL analytics endpoint row-level security (RLS) | Yes – but queries will fall back to DirectQuery mode | Yes – but must duplicate permissions with semantic model RLS | Yes |
| Semantic model row-level security (RLS) | Yes – but it's strongly recommended to use a fixed identity cloud connection | Yes | Yes |
| Semantic model object-level security (OLS) | Yes | Yes | Yes |
| Large data volumes without refresh requirement | Yes | Less suited – a larger capacity size might be required for querying and refreshing | Yes |
| Reduce data latency | Yes – when automatic updates is enabled, or programmatic reframing; however, data preparation must be done upstream first | No | Yes |

1 You can't combine Direct Lake storage mode tables with DirectQuery or Dual storage mode tables in the same semantic model. However, you can use Power BI Desktop to create a composite model on a Direct Lake semantic model and then extend it with new tables (by using Import, DirectQuery, or Dual storage mode) or calculations. For more information, see Build a composite model on a semantic model.