Dela via


CREATE STATISTICS (Transact-SQL)

Creates query optimization statistics, including filtered statistics, on one or more columns of a table or indexed view. For most queries, the query optimizer already generates the necessary statistics for a high-quality query plan; in a few cases, you need to create additional statistics with CREATE STATISTICS or modify the query design to improve query performance.

Filtered statistics can improve query performance for queries that select from well-defined subsets of data. Filtered statistics use a filter predicate in the WHERE clause to select the subset of data that is included in the statistics.

For more information about statistics, including when to use CREATE STATISTICS, see Using Statistics to Improve Query Performance.

Topic link iconTransact-SQL Syntax Conventions

Syntax

CREATE STATISTICS statistics_name 
ON { table_or_indexed_view_name } (column [ ,...n ] ) 
    [ WHERE <filter_predicate> ]
    [ WITH 
        [ [ FULLSCAN 
          | SAMPLE number { PERCENT | ROWS } 
          | STATS_STREAM = stats_stream ] [ , ] ] 
        [ NORECOMPUTE ] 
    ] ;

<filter_predicate> ::= 
    <conjunct> [AND <conjunct>]

<conjunct> ::=
    <disjunct> | <comparison>

<disjunct> ::=
        column_name IN (constant ,…)

<comparison> ::=
        column_name <comparison_op> constant

<comparison_op> ::=
    IS | IS NOT | = | <> | != | > | >= | !> | < | <= | !<

Arguments

  • statistics_name
    Is the name of the statistics to create.

  • table_or_indexed_view_name
    Is the name of the table or indexed view to create the statistics on. Statistics can be created on tables or indexed views in another database by specifying a qualified table name.

  • column [ ,…n]
    Specifies the key column or list of key columns to create the statistics on. You can specify any column that can be specified as an index key column with the following exceptions:

    • Xml, full-text, and FILESTREAM columns cannot be specified.

    • Computed columns can be specified only if the ARITHABORT and QUOTED_IDENTIFIER database settings are ON.

    • CLR user-defined type columns can be specified if the type supports binary ordering. Computed columns defined as method invocations of a user-defined type column can be specified if the methods are marked deterministic. For more information about creating CLR user-defined type columns, see Working with CLR User-defined Types.

    The maximum allowable size of the combined column values is 900 bytes.

  • WHERE <filter_predicate>
    Specifies an expression for selecting a subset of rows to include when creating the statistics object. Statistics that are created with a filter predicate are called filtered statistics. The filter predicate uses simple comparison logic and cannot reference a computed column, a UDT column, a spatial data type column, or a hierarchyID data type column. Comparisons using NULL literals are not allowed with the comparison operators. Use the IS NULL and IS NOT NULL operators instead.

    Here are some examples of filter predicates for the Production.BillOfMaterials table:

    WHERE StartDate > '20000101' AND EndDate <= '20000630'

    WHERE ComponentID IN (533, 324, 753)

    WHERE StartDate IN ('20000404', '20000905') AND EndDate IS NOT NULL

    For more information about filter predicates, see Filtered Index Design Guidelines.

  • FULLSCAN
    Compute statistics by scanning all rows in the table or indexed view. FULLSCAN and SAMPLE 100 PERCENT have the same results. FULLSCAN cannot be used with the SAMPLE option.

  • SAMPLE number { PERCENT | ROWS }
    Specifies the approximate percentage or number of rows in the table or indexed view for the query optimizer to use when it creates statistics. For PERCENT, number can be from 0 through 100 and for ROWS, number can be from 0 to the total number of rows. The actual percentage or number of rows the query optimizer samples might not match the percentage or number specified. For example, the query optimizer scans all rows on a data page.

    SAMPLE is useful for special cases in which the query plan, based on default sampling, is not optimal. In most situations, it is not necessary to specify SAMPLE because the query optimizer already uses sampling and determines the statistically significant sample size by default, as required to create high-quality query plans.

    SAMPLE cannot be used with the FULLSCAN option. When neither SAMPLE nor FULLSCAN is specified, the query optimizer uses sampled data and computes the sample size by default.

    We recommend against specifying 0 PERCENT or 0 ROWS. When 0 PERCENT or ROWS is specified, the statistics object is created but does not contain statistics data.

  • NORECOMPUTE
    Disable the automatic statistics update option, AUTO_STATISTICS_UPDATE, for statistics_name. If this option is specified, the query optimizer will complete any in-progress statistics updates for statistics_name and disable future updates.

    To re-enable statistics updates, remove the statistics with DROP STATISTICS and then run CREATE STATISTICS without the NORECOMPUTE option.

    Warning

    Using this option can produce suboptimal query plans. We recommend using this option sparingly, and then only by a qualified system administrator.

    For more information about the AUTO_STATISTICS_UPDATE option, see ALTER DATABASE SET Options (Transact-SQL). For more information about disabling and re-enabling statistics updates, see Using Statistics to Improve Query Performance.

  • STATS_STREAM **=**stats_stream
    Identified for informational purposes only. Not supported. Future compatibility is not guaranteed.

Remarks

The following table gives maximum numbers for categories relating to statistics.

Category for statistics

Maximum

Columns per statistics object

32

Statistics created on indexes

1 per index and 1000 per table

Statistics created on columns

30,000 per table

When to Use CREATE STATISTICS

For more information about when to use CREATE STATISTICS, see Using Statistics to Improve Query Performance.

Referencing Dependencies for Filtered Statistics

The sys.sql_expression_dependencies catalog view tracks each column in the filtered statistics predicate as a referencing dependency. Consider the operations that you perform on table columns before creating filtered statistics because you cannot drop, rename, or alter the definition of a table column that is defined in a filtered statistics predicate.

Permissions

To create statistics, the user must be the table or indexed view owner, or the user must be a member of one of the following roles: sysadmin fixed server role, db_owner fixed database role, or the db_ddladmin fixed database role.

Examples

A. Using CREATE STATISTICS with SAMPLE number PERCENT

The following example creates the ContactMail1 statistics, using a random sample of 5 percent of the ContactID and EmailAddress columns of the Contact table of the AdventureWorks database.

USE AdventureWorks;
GO
CREATE STATISTICS ContactMail1
    ON Person.Contact (ContactID, EmailAddress)
    WITH SAMPLE 5 PERCENT;

B. Using CREATE STATISTICS with FULLSCAN and NORECOMPUTE

The following example creates the ContactMail2 statistics for all rows in the ContactID and EmailAddress columns of the Contact table and disables automatic recomputing of statistics.

CREATE STATISTICS NamePurchase
    ON AdventureWorks.Person.Contact (ContactID, EmailAddress)
    WITH FULLSCAN, NORECOMPUTE;

C. Using CREATE STATISTICS to create filtered statistics

The following example creates the filtered statistics ContactPromotion1. The Database Engine samples 50 percent of the data and then selects the rows with EmailPromotion equal to 2.

USE AdventureWorks;
GO
IF EXISTS (SELECT name FROM sys.stats
    WHERE name = N'ContactPromotion1'
    AND object_id = OBJECT_ID(N'Person.Contact'))
DROP STATISTICS Person.Contact.ContactPromotion1
GO
CREATE STATISTICS ContactPromotion1
    ON Person.Contact (ContactID, EmailAddress, EmailPromotion)
WHERE EmailPromotion = 2
WITH SAMPLE 50 PERCENT;
GO

Change History

Updated content

Revisions throughout the document to improve correctness.

Topic refers to new statistics content in the topic Using Statistics to Improve Query Performance.