Delete Query Performance on Azure SQL Database

Swapnil Kulkarni 1 Reputation point
2022-04-27T03:00:55.85+00:00

Hi All,

I have Azure SQL instance with Standard S2: 50 DTUs plan.

I have a table with 90K Rows, no Primary Key, No Foreign Key. Index on Date column.

In Data Factory I have created a Pipeline which will delete data (approx. 30K) from this table. Unfortunately the DELETE query takes almost 10-13 min for the same.

I tried different workarounds

1 Delete condition is on Data Column, so created Index on the same and started Rebuilding Indexes post Data Insertion.

2 Thought Condition on Date column might causing issue so Added Delete_Flag Column. In ADF Pipeline added step to update this Flag column and in next Step delete data based on Flag column value.

3 Create a view of records which needs to be Deleted and delete records from same view.

Any recommendations , inputs for this scenario?

Azure SQL Database
Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
9,535 questions
{count} votes

1 answer

Sort by: Most helpful
  1. Ronen Ariely 15,096 Reputation points
    2022-04-27T04:38:21.74+00:00

    Good day @Swapnil Kulkarni and welcome to the QnA forum

    I have a table with 90K Rows, no Primary Key, No Foreign Key. Index on Date column.

    Please provide information to reproduce the scenario instead of stories! This is not a school but technical forum to help (focus on you) to solve an issue. This means that you should do your part and let us focus on the issue

    Please provide:

    1) Queries to CREATE your table(s) including indexes - must for any question regarding performance
    2) Queries to INSERT sample data - must.
    3) The exact query which is executed (use profiler and try to execute the query directly)
    4) The Execution Plan (XML full execution plan and not screenshot) (use profiler and try to execute the query directly)
    5) Which version of SQL Server you are using (this will help to fit the query to your version).

    In Data Factory I have created a Pipeline which will delete data (approx. 30K) from this table. Unfortunately the DELETE query takes almost 10-13 min for the same.

    Delete rows is not a simple action and there are multiple tasks which the server do behind the scenes including (but not only)

    (1) writing to the transaction log all the information about the deleted rows

    You can examine the transaction log using the undocumented comment "fn_dblog" during the procedure of deleting the data. If you do then you can see how the information is written there. You can find that SQL Server log in the transaction log action(s) "LOP_DELETE_ROWS"

    Note! This is the main reason in most cases why DELETE is slow action

    Note! If you want to delete all the rows from a table then you should use TRUNCATE instead of DELETE. This will prevent this issue.

    (2) You might have locks and waits as a result of other queries

    Delete condition is on Data Column, so created Index on the same and started Rebuilding Indexes post Data Insertion.

    Indexes only relevant if you use filter. For example if you delete only part of the rows which fit specific condition.

    In this case the indexes can help the server find the rows which it need to delete. This is basically the same impact for SELECT, which mean that in many cases the best option to choose what index to build for actions like DELETE/UPDATE is by checking the impact on SELECT.

    With that said, creating the index + delete the data will probably take more time then delete the data without index (for most cases). CREATE nonclustered INDEX means that you write data to a new place and sort the data in order to build the index. CREATE clustered INDEX means that you change the table itself and this can have a huge impact since in most cases this means duplicate all the table to a new place with sorting.

    Moreover, the index impact DML action negatively since each time to change the data in the table then the server need to update the index as well.

    Therefore, if you created the index temporarily only for the one-time action then this probably not a good idea. In this case you should also remember to remove the index after the action if it is not needed anymore and in any case it will probably do the opusite for the DELETE unless it dramatically help the server in finding the rows for the deletion

    Thought Condition on Date column might causing issue so Added Delete_Flag Column.

    Adding another column can help in filtering in some cases but at the same time means that you write more data including when you DELETE the row. In order to understand if this can help or no, we need to understand what you are actually doing! For this you need to provide the information I mentioned above so we will be able to reproduce the issue.

    3 Create a view of records which needs to be Deleted and delete records from same view.

    Again, we cannot know if this is relevant, helpful or make it worse without having the information about what you do (NOT stories but information as asked above) unless we will be abel to read minds :-)

    What Next?

    (1) Please provide the missing information as I asked above

    (2) Forget about ADF fir first step and check what happen when you DELETE the rows in the same conditions.

    (2.1) If the issue is with the DELETE action then you should improve performance using the information I wrote above - this is not related to ADF

    (2.2) If the DELETE using ADF is much slower than executing the same task directly then you should use SQL Server Extended Events to monitor what queries exactly were executed by the ADF - and act accordingly or provide the information here for more insights

    (3) Check locks and waits while doing the action. The following document can help you in this task using "sys.dm_tran_locks":

    https://learn.microsoft.com/sql/relational-databases/extended-events/determine-which-queries-are-holding-locks?view=sql-server-ver15&WT.mc_id=DP-MVP-5001699

    You can also use the stored procedure "sp_lock" and "sp_who". A better option for most cases is to use the undocumented "sp_who2" stored procedure in order to examine the behavior of your running queries.

    0 comments No comments