Events
Mar 31, 11 PM - Apr 2, 11 PM
The biggest SQL, Fabric and Power BI learning event. March 31 – April 2. Use code FABINSIDER to save $400.
Register todayThis browser is no longer supported.
Upgrade to Microsoft Edge to take advantage of the latest features, security updates, and technical support.
Applies to:
SQL Server
Azure SQL Database
Azure SQL Managed Instance
Azure Synapse Analytics
SQL analytics endpoint in Microsoft Fabric
Warehouse in Microsoft Fabric
Creates query optimization statistics on one or more columns of a table, an indexed view, or an external table. For most queries, the query optimizer already generates the necessary statistics for a high-quality query plan; in a few cases, you need to create extra statistics with CREATE STATISTICS
or modify the query design to improve query performance.
To learn more, see Statistics.
Note
For more information on statistics in Microsoft Fabric, see Statistics in Fabric data warehousing.
Transact-SQL syntax conventions
Syntax for SQL Server, Azure SQL Database, and Azure SQL Managed Instance.
-- Create statistics on an external table
CREATE STATISTICS statistics_name
ON { table_or_indexed_view_name } ( column [ , ...n ] )
[ WITH FULLSCAN ] ;
-- Create statistics on a regular table or indexed view
CREATE STATISTICS statistics_name
ON { table_or_indexed_view_name } ( column [ , ...n ] )
[ WHERE <filter_predicate> ]
[ WITH
[ FULLSCAN
[ [ , ] PERSIST_SAMPLE_PERCENT = { ON | OFF } ]
| SAMPLE number { PERCENT | ROWS }
[ [ , ] PERSIST_SAMPLE_PERCENT = { ON | OFF } ]
| <update_stats_stream_option> [ , ...n ]
[ [ , ] NORECOMPUTE ]
[ [ , ] INCREMENTAL = { ON | OFF } ]
[ [ , ] MAXDOP = max_degree_of_parallelism ]
[ [ , ] AUTO_DROP = { ON | OFF } ]
]
];
<filter_predicate> ::=
<conjunct> [ AND <conjunct> ]
<conjunct> ::=
<disjunct> | <comparison>
<disjunct> ::=
column_name IN (constant , ...)
<comparison> ::=
column_name <comparison_op> constant
<comparison_op> ::=
IS | IS NOT | = | <> | != | > | >= | !> | < | <= | !<
<update_stats_stream_option> ::=
[ STATS_STREAM = stats_stream ]
[ ROWCOUNT = numeric_constant ]
[ PAGECOUNT = numeric_constant ]
Syntax for Azure Synapse Analytics and Analytics Platform System (PDW).
CREATE STATISTICS statistics_name
ON { database_name.schema_name.table_name | schema_name.table_name | table_name }
( column_name [ , ...n ] )
[ WHERE <filter_predicate> ]
[ WITH {
FULLSCAN
| SAMPLE number PERCENT
}
]
[ ; ]
<filter_predicate> ::=
<conjunct> [ AND <conjunct> ]
<conjunct> ::=
<disjunct> | <comparison>
<disjunct> ::=
column_name IN (constant , ...)
<comparison> ::=
column_name <comparison_op> constant
<comparison_op> ::=
IS | IS NOT | = | <> | != | > | >= | !> | < | <= | !<
Syntax for Microsoft Fabric.
CREATE STATISTICS statistics_name
ON { database_name.schema_name.table_name | schema_name.table_name | table_name }
( column_name )
[ WITH {
FULLSCAN
| SAMPLE number PERCENT
}
]
[ ; ]
The name of the statistics to create.
The name of the table, indexed view, or external table on which to create the statistics. To create statistics on another database, specify a qualified table name.
One or more columns to be included in the statistics. The columns should be in priority order from left to right. Only the first column is used for creating the histogram. All columns are used for cross-column correlation statistics called densities.
You can specify any column that can be specified as an index key column with the following exceptions:
xml, full-text, and FILESTREAM columns can't be specified.
Computed columns can be specified only if the ARITHABORT
and QUOTED_IDENTIFIER
database settings are ON
.
CLR user-defined type columns can be specified if the type supports binary ordering. Computed columns defined as method invocations of a user-defined type column can be specified if the methods are marked deterministic.
Specifies an expression for selecting a subset of rows to include when creating the statistics object. Statistics that are created with a filter predicate are called filtered statistics. The filter predicate uses simple comparison logic and can't reference a computed column, a UDT column, a spatial data type column, or a hierarchyID data type column. Comparisons using NULL
literals aren't allowed with the comparison operators. Use the IS NULL
and IS NOT NULL
operators instead.
Here are some examples of filter predicates for the Production.BillOfMaterials
table:
WHERE StartDate > '20000101' AND EndDate <= '20000630'
WHERE ComponentID IN (533, 324, 753)
WHERE StartDate IN ('20000404', '20000905') AND EndDate IS NOT NULL
For more information about filter predicates, see Create filtered indexes.
Applies to: SQL Server 2016 (13.x) SP 1 CU 4, SQL Server 2017 (14.x) CU 1, and later versions
Compute statistics by scanning all rows. FULLSCAN
and SAMPLE 100 PERCENT
have the same results. FULLSCAN
can't be used with the SAMPLE
option.
When omitted, SQL Server uses sampling to create the statistics, and determines the sample size that is required to create a high quality query plan.
In Warehouse in Microsoft Fabric, only single-column FULLSCAN
and single-column SAMPLE
-based statistics are supported. When no option is included, SAMPLE
statistics are created.
Specifies the approximate percentage, or number of rows, in the table or indexed view for the query optimizer to use when it creates statistics. For PERCENT
, number can be from 0 through 100 and for ROWS
, number can be from 0 to the total number of rows. The actual percentage or number of rows the query optimizer samples might not match the percentage or number specified. For example, the query optimizer scans all rows on a data page.
SAMPLE
is useful for special cases in which the query plan, based on default sampling, isn't optimal. In most situations, it's not necessary to specify SAMPLE
because the query optimizer already uses sampling and determines the statistically significant sample size by default, as required to create high-quality query plans.
SAMPLE
can't be used with the FULLSCAN option. When SAMPLE
or FULLSCAN
aren't specified, the query optimizer uses sampled data and computes the sample size by default.
We recommend against specifying 0 PERCENT
or 0 ROWS
. When 0 PERCENT
or 0 ROWS
is specified, the statistics object is created, but doesn't contain statistics data.
In Warehouse in Microsoft Fabric, only single-column FULLSCAN
and single-column SAMPLE
-based statistics are supported. When no option is included, FULLSCAN
statistics are created.
When ON
, the statistics retain the creation sampling percentage for subsequent updates that don't explicitly specify a sampling percentage. When OFF
, statistics sampling percentage gets reset to default sampling in subsequent updates that don't explicitly specify a sampling percentage. The default is OFF
.
Note
If the table is truncated, all statistics built on the truncated HoBT will revert to using the default sampling percentage.
Identified for informational purposes only. Not supported. Future compatibility is not guaranteed.
Disable the automatic statistics update option, AUTO_STATISTICS_UPDATE
, for statistics_name. If this option is specified, the query optimizer will complete any in-progress statistics updates for statistics_name and disable future updates.
To re-enable statistics updates, remove the statistics with DROP STATISTICS and then run CREATE STATISTICS
without the NORECOMPUTE
option.
Warning
If you disable automatic updating of statistics, it might prevent the Query Optimizer from picking optimal execution plans for queries that involve the table. You should use this option sparingly, and only by a qualified database administrator.
For more information about the AUTO_STATISTICS_UPDATE
option, see ALTER DATABASE SET options. For more information about disabling and re-enabling statistics updates, see Statistics.
Applies to: SQL Server 2014 (12.x) and later versions
When ON
, the statistics created are per partition statistics. When OFF
, stats are combined for all partitions. The default is OFF
.
If per partition statistics aren't supported, an error is generated. Incremental stats aren't supported for following statistics types:
Applies to: SQL Server 2016 (13.x) SP 2, SQL Server 2017 (14.x) CU 3, and later versions
Overrides the max degree of parallelism configuration option during the statistic operation. For more information, see Configure the max degree of parallelism (server configuration option). Use MAXDOP
to limit the number of processors used in a parallel plan execution. The maximum is 64 processors.
max_degree_of_parallelism can be:
1
: Suppresses parallel plan generation.>1
: Restricts the maximum number of processors used in a parallel index operation to the specified number.0
(default): Uses the actual number of processors or fewer based on the current system workload.Identified for informational purposes only. Not supported. Future compatibility is not guaranteed.
Applies to: SQL Server 2022 (16.x) and later versions, and Azure SQL Database, Azure SQL Managed Instance
Before SQL Server 2022 (16.x), if statistics are manually created by a user or third party tool on a user database, those statistics objects can block or interfere with schema changes the customer might desire.
Starting with SQL Server 2022 (16.x), the AUTO_DROP
option is enabled by default on all new and migrated databases. The AUTO_DROP
property allows the creation of statistics objects in a mode such that a subsequent schema change is not blocked by the statistic object, but instead the statistics are dropped as necessary. In this way, manually created statistics with AUTO_DROP
enabled behave like autocreated statistics.
Note
Trying to set or unset the Auto_Drop property on auto-created statistics might raise errors. Auto-created statistics always uses auto drop. Some backups, when restored, might have this property set incorrectly until the next time the statistics object is updated (manually or automatically). However, auto-created statistics always behave like auto drop statistics. When restoring a database to SQL Server 2022 (16.x) from a previous version, it's recommended to execute sp_updatestats
on the database, setting the proper metadata for the statistics AUTO_DROP
feature.
For more information, see AUTO_DROP option.
Requires one of these permissions:
ALTER TABLE
SQL Server can use tempdb
to sort the sampled rows before building statistics.
When creating external table statistics, SQL Server imports the external table into a temporary SQL Server table, and then creates the statistics. For samples statistics, only the sampled rows are imported. If you have a large external table, it's faster to use the default sampling instead of the full scan option.
When the external table is using DELIMITEDTEXT
, CSV
, PARQUET
, or DELTA
as data types, external tables only support statistics for one column per CREATE STATISTICS
command.
Filtered statistics can improve query performance for queries that select from well-defined subsets of data. Filtered statistics use a filter predicate in the WHERE clause to select the subset of data that is included in the statistics.
For more information about when to use CREATE STATISTICS
, see Statistics.
The sys.sql_expression_dependencies catalog view tracks each column in the filtered statistics predicate as a referencing dependency. Consider the operations that you perform on table columns, before creating filtered statistics. You can't drop, rename, or alter the definition of a table column that is defined in a filtered statistics predicate.
MAXDOP
option isn't compatible with STATS_STREAM
, ROWCOUNT
, and PAGECOUNT
options.MAXDOP
option is limited by the Resource Governor workload group MAX_DOP
setting, if used.CREATE
and DROP STATISTICS
on external tables aren't supported in Azure SQL Database.The code samples in this article use the AdventureWorks2022
or AdventureWorksDW2022
sample database, which you can download from the Microsoft SQL Server Samples and Community Projects home page.
The following example creates the ContactMail1
statistics, using a random sample of 5 percent of the BusinessEntityID
and EmailPromotion
columns of the Person
table of the AdventureWorks2022 database.
CREATE STATISTICS ContactMail1
ON Person.Person (BusinessEntityID, EmailPromotion)
WITH SAMPLE 5 PERCENT;
The following example creates the NamePurchase
statistics for all rows in the BusinessEntityID
and EmailPromotion
columns of the Person
table and disables automatic recomputing of statistics.
CREATE STATISTICS NamePurchase
ON AdventureWorks2022.Person.Person (BusinessEntityID, EmailPromotion)
WITH FULLSCAN, NORECOMPUTE;
The following example creates the filtered statistics ContactPromotion1
. The Database Engine samples 50 percent of the data and then selects the rows with EmailPromotion
equal to 2.
CREATE STATISTICS ContactPromotion1
ON Person.Person (BusinessEntityID, LastName, EmailPromotion)
WHERE EmailPromotion = 2
WITH SAMPLE 50 PERCENT;
GO
The only decision you need to make when you create statistics on an external table, besides providing the list of columns, is whether to create the statistics by sampling the rows or by scanning all of the rows. CREATE
and DROP STATISTICS
on external tables aren't supported in Azure SQL Database.
Since SQL Server imports data from the external table into a temporary table to create statistics, the full scan option takes much longer. For a large table, the default sampling method is usually sufficient.
--Create statistics on an external table and use default sampling.
CREATE STATISTICS CustomerStats1 ON DimCustomer (CustomerKey, EmailAddress);
--Create statistics on an external table and scan all the rows
CREATE STATISTICS CustomerStats1 ON DimCustomer (CustomerKey, EmailAddress) WITH FULLSCAN;
The following example creates the NamePurchase
statistics for all rows in the BusinessEntityID
and EmailPromotion
columns of the Person
table and sets a 100 percent sampling percentage for all subsequent updates that don't explicitly specify a sampling percentage.
CREATE STATISTICS NamePurchase
ON AdventureWorks2022.Person.Person (BusinessEntityID, EmailPromotion)
WITH FULLSCAN, PERSIST_SAMPLE_PERCENT = ON;
The following example creates the CustomerStats1
statistics, based on the CustomerKey
and EmailAddress
columns of the DimCustomer
table. The statistics are created based on a statistically significant sampling of the rows in the Customer
table.
CREATE STATISTICS CustomerStats1 ON DimCustomer (CustomerKey, EmailAddress);
The following example creates the CustomerStatsFullScan
statistics, based on scanning all of the rows in the DimCustomer
table.
CREATE STATISTICS CustomerStatsFullScan
ON DimCustomer (CustomerKey, EmailAddress) WITH FULLSCAN;
The following example creates the CustomerStatsSampleScan
statistics, based on scanning 50 percent of the rows in the DimCustomer
table.
CREATE STATISTICS CustomerStatsSampleScan
ON DimCustomer (CustomerKey, EmailAddress) WITH SAMPLE 50 PERCENT;
To use auto drop statistics, just add the following to the "WITH" clause of statistics create or update.
CREATE STATISTICS CustomerStats1 ON DimCustomer (CustomerKey, EmailAddress) WITH AUTO_DROP = ON
To evaluate the auto drop setting on existing statistics, use the auto_drop
column in sys.stats:
SELECT object_id, [name], auto_drop
FROM sys.stats;
Events
Mar 31, 11 PM - Apr 2, 11 PM
The biggest SQL, Fabric and Power BI learning event. March 31 – April 2. Use code FABINSIDER to save $400.
Register today