Читати англійською Редагувати

Поділитися через


Scalar UDF inlining

Applies to: SQL Server 2019 (15.x) Azure SQL Database Azure SQL Managed Instance

This article introduces scalar UDF inlining, a feature under the Intelligent query processing in SQL databases suite of features. This feature improves the performance of queries that invoke scalar UDFs in SQL Server 2019 (15.x) and later versions.

T-SQL scalar user-defined functions

User-Defined Functions (UDFs) that are implemented in Transact-SQL and return a single data value are referred to as T-SQL Scalar User-Defined Functions. T-SQL UDFs are an elegant way to achieve code reuse and modularity across Transact-SQL queries. Some computations (such as complex business rules) are easier to express in imperative UDF form. UDFs help in building up complex logic without requiring expertise in writing complex SQL queries. For more information about UDFs, see Create user-defined functions (Database Engine).

Performance of scalar UDFs

Scalar UDFs typically end up performing poorly due to the following reasons:

  • Iterative invocation. UDFs are invoked in an iterative manner, once per qualifying tuple. This incurs an extra cost of repeated context switching due to function invocation. Especially, UDFs that execute Transact-SQL queries in their definition are severely affected.

  • Lack of costing. During optimization, only relational operators are costed, while scalar operators aren't. Before the introduction of scalar UDFs, other scalar operators were generally cheap and didn't require costing. A small CPU cost added for a scalar operation was enough. There are scenarios where the actual cost is significant, and yet still remains underrepresented.

  • Interpreted execution. UDFs are evaluated as a batch of statements, executed statement-by-statement. Each statement itself is compiled, and the compiled plan is cached. Although this caching strategy saves some time as it avoids recompilations, each statement executes in isolation. No cross-statement optimizations are carried out.

  • Serial execution. SQL Server doesn't allow intra-query parallelism in queries that invoke UDFs.

Automatic inlining of scalar UDFs

The goal of the scalar UDF inlining feature is to improve performance of queries that invoke T-SQL scalar UDFs, where UDF execution is the main bottleneck.

With this new feature, scalar UDFs are automatically transformed into scalar expressions or scalar subqueries that are substituted in the calling query in place of the UDF operator. These expressions and subqueries are then optimized. As a result, the query plan no longer has a user-defined function operator, but its effects are observed in the plan, like views or inline table-valued functions (TVFs).

Examples

The examples in this section use the TPC-H benchmark database. For more information, see the TPC-H Homepage.

A. Single statement scalar UDF

Consider the following query.

SELECT L_SHIPDATE,
       O_SHIPPRIORITY,
       SUM(L_EXTENDEDPRICE * (1 - L_DISCOUNT))
FROM LINEITEM
     INNER JOIN ORDERS
         ON O_ORDERKEY = L_ORDERKEY
GROUP BY L_SHIPDATE, O_SHIPPRIORITY
ORDER BY L_SHIPDATE;

This query computes the sum of discounted prices for line items and presents the results grouped by the shipping date and shipping priority. The expression L_EXTENDEDPRICE *(1 - L_DISCOUNT) is the formula for the discounted price for a given line item. Such formulas can be extracted into functions for the benefit of modularity and reuse.

CREATE FUNCTION dbo.discount_price
(
    @price DECIMAL (12, 2),
    @discount DECIMAL (12, 2)
)
RETURNS DECIMAL (12, 2)
AS
BEGIN
    RETURN @price * (1 - @discount);
END

Now the query can be modified to invoke this UDF.

SELECT L_SHIPDATE,
       O_SHIPPRIORITY,
       SUM(dbo.discount_price(L_EXTENDEDPRICE, L_DISCOUNT))
FROM LINEITEM
     INNER JOIN ORDERS
         ON O_ORDERKEY = L_ORDERKEY
GROUP BY L_SHIPDATE, O_SHIPPRIORITY
ORDER BY L_SHIPDATE;

The query with the UDF performs poorly, due to the reasons outlined previously. With scalar UDF inlining, the scalar expression in the body of the UDF is substituted directly in the query. The results of running this query are shown in the following table:

Query: Query without UDF Query with UDF (without inlining) Query with scalar UDF inlining
Execution time: 1.6 seconds 29 minutes 11 seconds 1.6 seconds

These numbers are based on a 10-GB CCI database (using the TPC-H schema), running on a machine with dual processor (12 core), 96-GB RAM, backed by SSD. The numbers include compilation and execution time with a cold procedure cache and buffer pool. The default configuration was used, and no other indexes were created.

B. Multi-statement scalar UDF

Scalar UDFs that are implemented using multiple T-SQL statements such as variable assignments and conditional branching can also be inlined. Consider the following scalar UDF that, given a customer key, determines the service category for that customer. It arrives at the category by first computing the total price of all orders placed by the customer using a SQL query. Then, it uses an IF (...) ELSE logic to decide the category based on the total price.

CREATE OR ALTER FUNCTION dbo.customer_category (@ckey INT)
RETURNS CHAR (10)
AS
BEGIN
    DECLARE @total_price AS DECIMAL (18, 2);
    DECLARE @category AS CHAR (10);
    SELECT @total_price = SUM(O_TOTALPRICE)
    FROM ORDERS
    WHERE O_CUSTKEY = @ckey;
    IF @total_price < 500000
        SET @category = 'REGULAR';
    ELSE
        IF @total_price < 1000000
            SET @category = 'GOLD';
        ELSE
            SET @category = 'PLATINUM';
    RETURN @category;
END

Now, consider a query that invokes this UDF.

SELECT C_NAME,
       dbo.customer_category(C_CUSTKEY)
FROM CUSTOMER;

The execution plan for this query in SQL Server 2017 (14.x) (compatibility level 140 and earlier) is as follows:

Screenshot of Query Plan without inlining.

As the plan shows, SQL Server adopts a simple strategy here: for every tuple in the CUSTOMER table, invoke the UDF and output the results. This strategy is naive and inefficient. With inlining, such UDFs are transformed into equivalent scalar subqueries, which are substituted in the calling query in place of the UDF.

For the same query, the plan with the UDF inlined looks as follows.

Screenshot of Query Plan with inlining.

As mentioned earlier, the query plan no longer has a user-defined function operator, but its effects are now observable in the plan, like views or inline TVFs. Here are some key observations from the previous plan:

  • SQL Server infers the implicit join between CUSTOMER and ORDERS and makes it explicit via a join operator.

  • SQL Server also infers the implicit GROUP BY O_CUSTKEY on ORDERS and uses the IndexSpool + StreamAggregate to implement it.

  • SQL Server is now using parallelism across all operators.

Depending upon the complexity of the logic in the UDF, the resulting query plan might also get bigger and more complex. As we can see, the operations inside the UDF are now no longer opaque, and so the query optimizer is able to cost and optimize those operations. Also, since the UDF is no longer in the plan, iterative UDF invocation is replaced by a plan that completely avoids function call overhead.

Inlineable scalar UDF requirements

A scalar T-SQL UDF can be inlined if the function definition uses allowed constructs, and the function is used in a context that enables inlining:

All of the following conditions of the UDF definition must be true:

  • The UDF is written using the following constructs:
    • DECLARE, SET: Variable declaration and assignments.
    • SELECT: SQL query with single/multiple variable assignments 1.
    • IF/ELSE: Branching with arbitrary levels of nesting.
    • RETURN: Single or multiple return statements. Starting with SQL Server 2019 (15.x) CU5, the UDF can only contain a single RETURN statement to be considered for inlining 6.
    • UDF: Nested/recursive function calls 2.
    • Others: Relational operations such as EXISTS, IS NULL.
  • The UDF doesn't invoke any intrinsic function that is either time-dependent (such as GETDATE()) or has side effects 3 (such as NEWSEQUENTIALID()).
  • The UDF uses the EXECUTE AS CALLER clause (default behavior if the EXECUTE AS clause isn't specified).
  • The UDF doesn't reference table variables or table-valued parameters.
  • The UDF isn't natively compiled (interop is supported).
  • The UDF doesn't reference user-defined types.
  • There are no signatures added to the UDF 9.
  • The UDF isn't a partition function.
  • The UDF doesn't contain references to Common Table Expressions (CTEs).
  • The UDF doesn't contain references to intrinsic functions that might alter the results when inlined (such as @@ROWCOUNT) 4.
  • The UDF doesn't contain aggregate functions being passed as parameters to a scalar UDF 4.
  • The UDF doesn't reference built-in views (such as OBJECT_ID) 4.
  • The UDF doesn't reference XML methods 5.
  • The UDF doesn't contain a SELECT with ORDER BY without a TOP 1 clause 5.
  • The UDF doesn't contain a SELECT query that performs an assignment with the ORDER BY clause (such as SELECT @x = @x + 1 FROM table1 ORDER BY col1) 5.
  • The UDF doesn't contain multiple RETURN statements 6.
  • The UDF doesn't reference the STRING_AGG function 6.
  • The UDF doesn't reference remote tables 7.
  • The UDF doesn't reference encrypted columns 8.
  • The UDF doesn't contain references to WITH XMLNAMESPACES 8.
  • If the UDF definition runs into thousands of lines of code, SQL Server might choose not to inline it.

1 SELECT with variable accumulation/aggregation isn't supported for inlining (such as SELECT @val += col1 FROM table1).

2 Recursive UDFs are inlined to a certain depth only.

3 Intrinsic functions whose results depend upon the current system time are time-dependent. An intrinsic function that might update some internal global state is an example of a function with side effects. Such functions return different results each time they're called, based on the internal state.

4 Restriction added in SQL Server 2019 (15.x) CU 2

5 Restriction added in SQL Server 2019 (15.x) CU 4

6 Restriction added in SQL Server 2019 (15.x) CU 5

7 Restriction added in SQL Server 2019 (15.x) CU 6

8 Restriction added in SQL Server 2019 (15.x) CU 11

9 Because signatures could be added and dropped after a UDF is created, the decision whether to inline is done when the query referencing a scalar UDF is compiled. For example, system functions are typically signed with a certificate. You can use sys.crypt_properties to find which objects are signed.

All of the following requirement of the execution context must be true:

  • The UDF isn't used in ORDER BY clause.
  • The query invoking a scalar UDF doesn't reference a scalar UDF call in its GROUP BY clause.
  • The query invoking a scalar UDF in its select list with DISTINCT clause doesn't have an ORDER BY clause.
  • The UDF isn't called from a RETURN statement 1.
  • The query invoking the UDF doesn't have common table expressions (CTEs) 3.
  • The UDF-calling query doesn't use GROUPING SETS, CUBE, or ROLLUP 2.
  • The UDF-calling query doesn't contain a variable that is used as a UDF parameter for assignment (for example, SELECT @y = 2, @x = UDF(@y)) 2.
  • The UDF isn't used in a computed column or a check constraint definition.

1 Restriction added in SQL Server 2019 (15.x) CU 5

2 Restriction added in SQL Server 2019 (15.x) CU 6

3 Restriction added in SQL Server 2019 (15.x) CU 11

For information on the latest T-SQL scalar UDF inlining fixes and changes to inlining eligibility scenarios, see the Knowledge Base article: FIX: scalar UDF inlining issues in SQL Server 2019.

Check whether a UDF can be inlined

For every T-SQL scalar UDF, the sys.sql_modules catalog view includes a property called is_inlineable, which indicates whether a UDF is inlineable.

The is_inlineable property is derived from the constructs found inside the UDF definition. It doesn't check whether the UDF is in fact inlineable at compile time. For more information, see the conditions for inlining.

A value of 1 indicates that the UDF is inlineable, and 0 indicates otherwise. This property has a value of 1 for all inline TVFs as well. For all other modules, the value is 0.

If a scalar UDF is inlineable, it doesn't imply that it's always inlined. SQL Server decides (on a per-query, per-UDF basis) whether to inline a UDF. Refer to the lists of requirements earlier in this article.

SELECT *
FROM sys.crypt_properties AS cp
     INNER JOIN sys.objects AS o
         ON cp.major_id = o.object_id;

Check whether inlining has happened

If all the preconditions are satisfied and SQL Server decides to perform inlining, it transforms the UDF into a relational expression. From the query plan, you can figure out whether inlining occurred:

  • The plan XML doesn't have a <UserDefinedFunction> XML node for a UDF that is inlined successfully.
  • Certain Extended Events are emitted.

Enable scalar UDF inlining

You can make workloads automatically eligible for scalar UDF inlining by enabling compatibility level 150 for the database. You can set this using Transact-SQL. For example:

ALTER DATABASE [WideWorldImportersDW]
    SET COMPATIBILITY_LEVEL = 150;

Apart from this step, there are no other changes required to be made to UDFs or queries to take advantage of this feature.

Disable scalar UDF inlining without changing the compatibility level

Scalar UDF inlining can be disabled at the database, statement, or UDF scope while still maintaining database compatibility level 150 and higher. To disable scalar UDF inlining at the database scope, execute the following statement within the context of the applicable database:

ALTER DATABASE SCOPED CONFIGURATION SET TSQL_SCALAR_UDF_INLINING = OFF;

To re-enable scalar UDF inlining for the database, execute the following statement within the context of the applicable database:

ALTER DATABASE SCOPED CONFIGURATION SET TSQL_SCALAR_UDF_INLINING = ON;

When ON, this setting appears as enabled in sys.database_scoped_configurations.

You can also disable scalar UDF inlining for a specific query by designating DISABLE_TSQL_SCALAR_UDF_INLINING as a USE HINT query hint.

A USE HINT query hint takes precedence over the database scoped configuration or compatibility level setting.

For example:

SELECT L_SHIPDATE,
       O_SHIPPRIORITY,
       SUM(dbo.discount_price(L_EXTENDEDPRICE, L_DISCOUNT))
FROM LINEITEM
     INNER JOIN ORDERS
         ON O_ORDERKEY = L_ORDERKEY
GROUP BY L_SHIPDATE, O_SHIPPRIORITY
ORDER BY L_SHIPDATE
OPTION (USE HINT('DISABLE_TSQL_SCALAR_UDF_INLINING'));

Scalar UDF inlining can also be disabled for a specific UDF using the INLINE clause in the CREATE FUNCTION or ALTER FUNCTION statement. For example:

CREATE OR ALTER FUNCTION dbo.discount_price
(
    @price DECIMAL (12, 2),
    @discount DECIMAL (12, 2)
)
RETURNS DECIMAL (12, 2)
WITH INLINE = OFF
AS
BEGIN
    RETURN @price * (1 - @discount);
END

Once the previous statement is executed, this UDF is never inlined into any query that invokes it. To re-enable inlining for this UDF, execute the following statement:

CREATE OR ALTER FUNCTION dbo.discount_price
(
    @price DECIMAL (12, 2),
    @discount DECIMAL (12, 2)
)
RETURNS DECIMAL (12, 2)
WITH INLINE = ON
AS
BEGIN
    RETURN @price * (1 - @discount);
END

The INLINE clause isn't mandatory. If the INLINE clause isn't specified, it's automatically set to ON/OFF based on whether the UDF can be inlined. If INLINE = ON is specified but the UDF is found ineligible for inlining, an error is thrown.

Remarks

As described in this article, scalar UDF inlining transforms a query with scalar UDFs into a query with an equivalent scalar subquery. Because of this transformation, you might notice some differences in behavior in the following scenarios:

  • Inlining results in a different query hash for the same query text.

  • Certain warnings in statements inside the UDF (such as divide by zero, etc.) that might be previously hidden, can show up due to inlining.

  • Query level join hints might not be valid anymore, as inlining can introduce new joins. Local join hints must be used instead.

  • Views that reference inline scalar UDFs can't be indexed. If you need to create an index on such views, disable inlining for the referenced UDFs.

  • There might be some differences in the behavior of Dynamic data masking with UDF inlining.

    In certain situations (depending upon the logic in the UDF), inlining might be more conservative with respect to masking output columns. In scenarios where the columns referenced in a UDF aren't output columns, they aren't masked.

  • If a UDF references built-in functions such as SCOPE_IDENTITY(), @@ROWCOUNT, or @@ERROR, the value returned by the built-in function changes with inlining. This change in behavior is because inlining changes the scope of statements inside the UDF. Starting with SQL Server 2019 (15.x) CU2, inlining is blocked if the UDF references certain intrinsic functions (for example @@ROWCOUNT).

  • If a variable is assigned with the result of an inlined UDF and it also used as index_column_name in FORCESEEK Query hints, it results in error 8622, indicating that the query processor couldn't produce a query plan because of the hints defined in the query.