Large datasets in Power BI Premium
Power BI datasets can store data in a highly compressed in-memory cache for optimized query performance, enabling fast user interactivity. With Premium capacities, large datasets beyond the default limit can be enabled with the Large dataset storage format setting. When enabled, dataset size is limited by the Premium capacity size or the maximum size set by the administrator.
Large datasets can be enabled for all Premium P SKUs, Embedded A SKUs, and Premium Per User (PPU). In terms of data model size, the large dataset limit in Premium is comparable to Azure Analysis Services.
While required for datasets to grow beyond 10 GB, enabling the Large dataset storage format setting has other benefits. If you're planning to use XMLA endpoint-based tools for dataset write operations, be sure to enable the setting, even for datasets that you wouldn't necessarily characterize as a large dataset. When enabled, the large dataset storage format can improve XMLA write operations performance.
Large datasets in the service don't affect the Power BI Desktop model upload size, which is still limited to 10 GB. Instead, datasets can grow beyond that limit in the service on refresh.
Large datasets in Power BI Premium are not available in the Power BI service for U.S. Government DoD customers. For more information about which features are available, and which are not, see Power BI feature availability for U.S. Government customers.
Enable large datasets
The following steps describe how to enable large datasets for a new model published to the service. For existing datasets, only step 3 is necessary.
1. Create a model in Power BI Desktop. If your dataset will become larger and progressively consume more memory, be sure to configure Incremental refresh.
2. Publish the model as a dataset to the service.
3. In the service > dataset > Settings, expand Large dataset storage format, set the slider to On, and then select Apply.
4. Invoke a refresh to load historical data based on the incremental refresh policy. The first refresh could take a while to load the history. Subsequent refreshes should be faster, depending on your incremental refresh policy.
Set default storage format
In supported regions, all new datasets created in a workspace assigned to a Premium capacity can have the large dataset storage format enabled by default. If the region doesn't support large datasets, the large dataset storage format option described below is disabled. You can see which regions are supported in the region availability section.
1. In the workspace, select Settings > Premium.
2. In Default storage format, select Large dataset storage format, and then select Save.
Enable with PowerShell
You can also enable large dataset storage format by using PowerShell. You must have capacity admin and workspace admin privileges to run the PowerShell cmdlets.
Find the dataset ID (GUID). On the Datasets tab for the workspace, under the dataset settings, you can see the ID in the URL.
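Alternatively, after installing the MicrosoftPowerBIMgmt module (next step), you can list the datasets in a workspace to find the ID. The following is a minimal sketch; <Workspace Name> is a placeholder for your workspace's name.

# Sign in, then list the IDs and names of the datasets in the target workspace
Login-PowerBIServiceAccount
$workspace = Get-PowerBIWorkspace -Name "<Workspace Name>"
Get-PowerBIDataset -WorkspaceId $workspace.Id | Format-Table Id, Name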
From a PowerShell admin prompt, install the MicrosoftPowerBIMgmt module.
Install-Module -Name MicrosoftPowerBIMgmt
Run the following cmdlets to sign in and check the dataset storage mode.
Login-PowerBIServiceAccount

(Get-PowerBIDataset -Scope Organization -Id <Dataset ID> -Include actualStorage).ActualStorage
The response should be the following. The storage mode is ABF (Analysis Services backup file), which is the default.
Id            StorageMode
--            -----------
<Dataset ID>  Abf
Run the following cmdlets to set the storage mode. It can take a few seconds to convert to Premium Files.
Set-PowerBIDataset -Id <Dataset ID> -TargetStorageMode PremiumFiles

(Get-PowerBIDataset -Scope Organization -Id <Dataset ID> -Include actualStorage).ActualStorage
The response should be the following. The storage mode is now set to Premium Files.
Id            StorageMode
--            -----------
<Dataset ID>  PremiumFiles
You can check the status of dataset conversions to and from Premium Files by using the Get-PowerBIWorkspaceMigrationStatus cmdlet.
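Putting the cmdlets together, the following is a minimal sketch that converts a dataset to Premium Files only if it isn't already converted; <Dataset ID> is a placeholder for the GUID found earlier.

# Sign in and convert the dataset to the large dataset storage format if needed
Login-PowerBIServiceAccount
$datasetId = "<Dataset ID>"
$storage = (Get-PowerBIDataset -Scope Organization -Id $datasetId -Include actualStorage).ActualStorage
if ($storage.StorageMode -ne "PremiumFiles") {
    Set-PowerBIDataset -Id $datasetId -TargetStorageMode PremiumFiles
}
# Confirm that the storage mode is now PremiumFiles
(Get-PowerBIDataset -Scope Organization -Id $datasetId -Include actualStorage).ActualStorage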
Dataset eviction
Dataset eviction is a Premium feature that allows the sum of dataset sizes to be significantly greater than the memory available for the purchased SKU size of the capacity. A single dataset is still constrained to the memory limits of the SKU. Power BI uses dynamic memory management to evict inactive datasets from memory, so that it can load other datasets to address user queries.
If you have to wait for an evicted dataset to be reloaded, you might experience a noticeable delay.
On-demand load
On-demand load is enabled by default for large datasets, and can significantly improve the load time of evicted datasets. With on-demand load, you get the following benefits during subsequent queries and refreshes:
Relevant data pages are loaded on-demand (paged in to memory).
Evicted datasets are quickly made available for queries.
On-demand loading surfaces additional Dynamic Management View (DMV) information that can be used to identify usage patterns and understand the state of your models. For example, you can check the Temperature and Last Accessed statistics for each column in the dataset, by running the following DMV query from SQL Server Management Studio (SSMS):
SELECT * FROM SYSTEMRESTRICTSCHEMA ($System.DISCOVER_STORAGE_TABLE_COLUMN_SEGMENTS, [DATABASE_NAME] = '<Dataset Name>')
Checking dataset size
After loading historical data, you can use SSMS through the XMLA endpoint to check the estimated dataset size in the model properties window.
You can also check the dataset size by running the following DMV queries from SSMS. Sum the DICTIONARY_SIZE and USED_SIZE columns from the output to see the dataset size in bytes.
SELECT * FROM SYSTEMRESTRICTSCHEMA ($System.DISCOVER_STORAGE_TABLE_COLUMNS, [DATABASE_NAME] = '<Dataset Name>') //Sum DICTIONARY_SIZE (bytes)

SELECT * FROM SYSTEMRESTRICTSCHEMA ($System.DISCOVER_STORAGE_TABLE_COLUMN_SEGMENTS, [DATABASE_NAME] = '<Dataset Name>') //Sum USED_SIZE (bytes)
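Because the DMV query syntax doesn't support aggregation, the sums are computed client side. If you'd rather script the calculation than add up the columns in SSMS, the following is a minimal sketch that uses Invoke-ASCmd from the SqlServer PowerShell module against the workspace's XMLA endpoint; the workspace and dataset names are placeholders, and the same approach works for the column temperature DMV shown earlier.

# Requires the SqlServer module: Install-Module -Name SqlServer
$server  = "powerbi://api.powerbi.com/v1.0/myorg/<Workspace Name>"  # XMLA endpoint
$dataset = "<Dataset Name>"

# Run both DMVs; Invoke-ASCmd returns the results as an XML rowset
[xml]$columns  = Invoke-ASCmd -Server $server -Database $dataset -Query "SELECT * FROM SYSTEMRESTRICTSCHEMA (`$System.DISCOVER_STORAGE_TABLE_COLUMNS, [DATABASE_NAME] = '$dataset')"
[xml]$segments = Invoke-ASCmd -Server $server -Database $dataset -Query "SELECT * FROM SYSTEMRESTRICTSCHEMA (`$System.DISCOVER_STORAGE_TABLE_COLUMN_SEGMENTS, [DATABASE_NAME] = '$dataset')"

# Sum DICTIONARY_SIZE and USED_SIZE (both reported in bytes)
$dictionaryBytes = ($columns.return.root.row  | Measure-Object -Property DICTIONARY_SIZE -Sum).Sum
$usedBytes       = ($segments.return.root.row | Measure-Object -Property USED_SIZE -Sum).Sum
"Estimated dataset size: {0:N0} bytes" -f ($dictionaryBytes + $usedBytes)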
Default segment size
For datasets using the large dataset storage format, Power BI automatically sets the default segment size to 8 million rows to strike a good balance between memory requirements and query performance for large tables. This is the same segment size as in Azure Analysis Services. Keeping the segment sizes aligned helps ensure comparable performance characteristics when migrating a large data model from Azure Analysis Services to Power BI.
Considerations and limitations
Keep in mind the following restrictions when using large datasets:
Download to Power BI Desktop: If a dataset is stored on Premium Files, downloading as a .pbix file will fail.
Supported regions: Large datasets are available in Azure regions that support Azure Premium Files Storage. Review the table in region availability to see a list of all the supported regions.
Setting maximum dataset size: Maximum dataset size can be set by administrators. For more information, see Max Memory in Datasets.
Refreshing large datasets: Datasets that are close to half the size of the capacity (for example, a 12-GB dataset on a 25-GB capacity) can exceed the available memory during refreshes. Using the enhanced refresh REST API or the XMLA endpoint, you can perform fine-grained data refreshes, so that the memory needed by the refresh can be minimized to fit within your capacity's size (see the sketch after this list).
Push datasets: Push datasets don't support the large dataset storage format.
Pro isn't supported: Large datasets aren't supported in Pro workspaces. If a workspace is migrated from Premium to Pro, any datasets with the large dataset storage format setting enabled will fail to load.
You can't use REST APIs to change the settings of a workspace to allow new datasets to use the large dataset storage format by default.
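As an illustration of the fine-grained refresh mentioned above, the following is a minimal sketch that refreshes a single partition of a single table through the enhanced refresh REST API by using Invoke-PowerBIRestMethod; the workspace and dataset IDs are placeholders, and the table and partition names are hypothetical.

# Refresh only one partition of one table to keep peak refresh memory low
Login-PowerBIServiceAccount
$body = @'
{
  "type": "Full",
  "commitMode": "transactional",
  "objects": [
    { "table": "Sales", "partition": "Sales-2024" }
  ]
}
'@
Invoke-PowerBIRestMethod -Method Post -Body $body `
  -Url "groups/<Workspace ID>/datasets/<Dataset ID>/refreshes"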
Region availability
Large datasets in Power BI are only available in Azure regions that support Azure Premium Files Storage.
The following table lists the regions where large datasets in Power BI are available. Regions not in the table aren't supported for large models.
Once a large dataset is created in a workspace, it must stay in that region. You cannot reassign a workspace with a large dataset to a Premium capacity in another region.
| Azure region | Azure region abbreviation |
| --- | --- |
| East US 2 | eastus2 |
| Germany West Central | germanywestcentral |
| North Central US | northcentralus |
| South Africa North | southafricanorth |
| South Africa West | southafricawest |
| South Central US | southcentralus |
| West US 2 | westus2 |