SQL Server - Optimizing selects of large amounts of data

PT 0 Reputation points
2023-04-19T17:41:55.81+00:00

I have a general question about retrieving large amounts of data. We are importing a file that has to insert a very large number of new child records. The parent records already exist and are unique to each child record; they are found through multiple joins based on identifiers that are in the file. Both the parent table and the child table have millions of records.

I first designed the system to retrieve the parent records in bulk using IN operators. At the time, the number of items in the IN list was no more than 80, and the total number of records found could be in the hundreds. This gave us a 10x performance boost on average. The main drawback seems to be that the query plan can't be reused if the number of items in the IN list changes, even though the rest of the query is the same. As the system has grown, this may be more of an issue.

I now need to import files that could easily contain tens of thousands of new records, and I think my current optimization is going to be problematic. Going back to retrieving each record individually seems worse. I could break the query up into multiple statements while still using the IN operator, instead of having one big query. Is there another option I am overlooking? I thought about using a stored procedure to build a temp table, but I don't know how I would pass such a large number of identifiers for it to pull the data. That would at least cut down on the network traffic of executing thousands of individual queries.
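For illustration, here is a rough sketch of the bulk IN pattern described above; the table, column, and parameter names are hypothetical stand-ins, since the real query joins several tables:

    -- Hypothetical example of the current approach. The number of parameters in
    -- each IN list varies per import batch, so the statement text differs from
    -- batch to batch and SQL Server caches a separate plan for each variant.
    SELECT p.ParentId, p.ExternalSku, c.ExternalCustomer
    FROM dbo.Parent   AS p
    JOIN dbo.Customer AS c ON c.CustomerId = p.CustomerId
    WHERE p.ExternalSku      IN (@sku0, @sku1, @sku2 /* ... up to ~80 values ... */)
      AND c.ExternalCustomer IN (@cust0, @cust1 /* ... */);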

SQL Server Other

2 answers

  1. Bjoern Peters 8,921 Reputation points
    2023-04-19T21:08:23.9333333+00:00

    Hi PT, welcome to the Q&A forum; this is a great place to get support, answers, and tips. Thank you for posting your question; I'll be more than glad to help you out.

    To assist you properly with your issue, it would be very helpful if we could reproduce your scenario.

    Please provide us with at least the following things:

    1. DDL and sample data population, i.e., CREATE TABLE statement(s) plus INSERT T-SQL statements.
    2. What you need to do, i.e., the logic, and your attempted implementation of it in T-SQL.
    3. Desired output based on the sample data in #1 above.
    4. Your SQL Server version (SELECT @@version;)

    You should provide us with at least those two relevant tables and 1-3 demo parents with related demo children (or actual data if there is no sensitive data in it). I hope my answer is helpful to you.
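    For example, a small repro skeleton like the one below (with your real table definitions and a handful of demo rows substituted for the placeholders) would be enough:

        -- Placeholder repro script: swap in the real parent/child schema and demo data.
        CREATE TABLE dbo.Parent
        (
            ParentId    int IDENTITY(1, 1) PRIMARY KEY,
            ExternalSku varchar(50) NOT NULL
        );

        CREATE TABLE dbo.Child
        (
            ChildId  int IDENTITY(1, 1) PRIMARY KEY,
            ParentId int NOT NULL REFERENCES dbo.Parent (ParentId),
            Payload  varchar(100) NULL
        );

        INSERT INTO dbo.Parent (ExternalSku) VALUES ('SKU-001'), ('SKU-002');
        INSERT INTO dbo.Child (ParentId, Payload) VALUES (1, 'demo child row');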

    Your Bjoern Peters

    If the reply was helpful, please upvote and/or accept it as an answer, as this helps others in the community with similar questions. Thanks!


  2. PT 0 Reputation points
    2023-04-21T18:23:00.0833333+00:00

    I was able to throw something together that should illustrate my question. In the query, the number of items in the two IN statements is highly variable, which I think causes an issue with the plan cache. For my new project, the SKU equivalent will grow to up to 113 items and the customer equivalent to up to 80. That's up to 9,040 individual selects if I do it one at a time. I looked into passing a complex set of data to a stored procedure, and what I saw was either 1) using a huge string and parsing it, 2) using a table, or 3) using JSON/XML. I could use a stored procedure to populate a temp table one record at a time and then pass back the entire record set. Based on the comments I saw, none of these options seems to scale well, and I don't know whether they would be worse than what I am doing now.

    Attachments: create_bulk_example_db.sql.txt, bulk_example_table_data.sql.txt, bulk_example_query.sql.txt
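    For reference, here is a minimal sketch of what option 2 ("using a table") might look like as a table-valued parameter; all object, type, and column names below are hypothetical placeholders rather than the schema from the attached scripts:

        -- Hypothetical sketch of passing the identifier set as a table-valued parameter.
        CREATE TYPE dbo.IdentifierList AS TABLE
        (
            ExternalSku      varchar(50) NOT NULL,
            ExternalCustomer varchar(50) NOT NULL,
            PRIMARY KEY (ExternalSku, ExternalCustomer)
        );
        GO

        CREATE PROCEDURE dbo.GetParentRecords
            @Identifiers dbo.IdentifierList READONLY
        AS
        BEGIN
            SET NOCOUNT ON;

            -- One set-based lookup replaces thousands of single-row selects; the plan
            -- is compiled against the table type rather than a variable-length IN list,
            -- so it can be reused regardless of how many identifier rows are passed in.
            SELECT p.ParentId, i.ExternalSku, i.ExternalCustomer
            FROM @Identifiers AS i
            JOIN dbo.Parent   AS p ON p.ExternalSku = i.ExternalSku
            JOIN dbo.Customer AS c ON c.CustomerId = p.CustomerId
                                  AND c.ExternalCustomer = i.ExternalCustomer;
        END;
        GO

    The parameter can be filled and sent in a single round trip from the client (for example, from ADO.NET as a Structured parameter), which would also address the network traffic concern.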

