Entity Framework / Slow performance eager loading large table

Brandon Poindexter 71 Reputation points
2021-09-16T19:31:02.607+00:00

Some background. I have an app that contacts a vendor's API and pulls in data, in this case logs of telephone activity. When querying the vendor's API, I receive a list of phone call logs that my app stores in a database.

I have code to avoid duplicate entries in my database tables. Initially how this was built, is that when I received a phone call record to process from the vendor API, I would use an EF query using my database context to see if the record already existed. If the record existed, I'd skip importing and move to the next record. If the record did exist, I'd add it to a list of records to be written to the database. I write records in large batches, since it proved to be far more efficient than writing them one at a time (obviously).

The portion where I query the database through the context for every record I receive from the vendor API, one at a time, was very inefficient. I optimized this by eager loading my entire table of phone call records, then querying the data locally in memory. Querying records loaded into memory is far faster than throwing a query at the DB for each record I get from the vendor's API.

Obviously, eager loading an entire table with a rather large number of records is not going to be the fastest query, but overall it's not necessary for it to be particularly fast. This app runs once in a 24 hour period outside normal business hours. Typically, even with the table being the size it is, my app, running locally on the same server as the database, can load the entire table in about 2 seconds. This is acceptable because that load only needs to happen once per day when my app runs.

However, I am having issues in my testing environment. When my app runs this same query locally on my development workstation, and contacts the database server over a network connection, the time it takes for the query to load my phone records table goes from 2 seconds to 5 minutes.

I have verified the exact same behavior occurring in SSMS by doing a simple select * for the table in question. The select * query, running in SSMS locally on the database server, is 2 to 3 seconds to load the entire table. The same query is around 5 minutes when run in SSMS on my workstation.

Clearly there is a problem here and while it's not a problem that impedes production, it is impeding my testing environment and I'd like to deal with it. Before getting further into this, understand that going to lazy loading or querying for each record I need individually is not on the table. That is already verified as being far worse performance than what I already have, regardless of what the "best practice" is for using EF database connections.

My thought is that the database server is, for whatever reason, feeding SSMS and my app running on my dev workstation, the results one record at a time, and network overhead (not bandwidth) is creating delays. I want to work around this. What I'd like to do is find out if there is a way to get the database server to load the entire table locally on the server first, then deliver that result in a single operation to my app running on my workstation. Is there a way to cause the server to behave in this way, or am I missing something else as far as performance tuning?

Entity Framework Core
Entity Framework Core
A lightweight, extensible, open-source, and cross-platform version of the Entity Framework data access technology.
726 questions
SQL Server
SQL Server
A family of Microsoft relational database management and analysis systems for e-commerce, line-of-business, and data warehousing solutions.
13,361 questions
0 comments No comments
{count} votes

2 answers

Sort by: Most helpful
  1. Erland Sommarskog 107.1K Reputation points
    2021-09-16T22:01:45.923+00:00

    Do you have "Multiple Active Results Sets=true" in the connection string? That would explain why things are slow. But it would also be that your network connection is slow...

    Anyway, a better approach would be to run one single query that takes the entire list of phone numbers checks it for duplicates. This means that you will need to figure out how to send that list to the database in a single query.

    The preferred method is to use a table-valued parameter, but I don't think Entity Framework supports this. But maybe EF Core? Didn't I hear something about that? (I'm an SQL Server guy, and I'm not working with ORMs.)

    But you can always send a comma-separated list or an XML document with the forums and shred that in the database.

    0 comments No comments

  2. YufeiShao-msft 7,091 Reputation points
    2021-09-17T07:49:25.447+00:00

    Hi @Brandon Poindexter ,

    If you are using loading of entities, then EF requires Multiple Active Result Sets to be enabled via the connection string, you can test this is to turn off Lazy Loading and either remove MultipleActiveResultSets=True; (the default is "false") or at least change it to be MultipleActiveResultSets=False;.

    One possible explanation of this behavior is Parameter sniffing, EF generates a query that has few parameters. The first time you run this query the server creates an execution plan for this query using values of parameters that were in effect in the first run, but, later on you run the same EF query using other values for parameters. The server keeps using the previous plan, so The query becomes slow.
    The best plan generated by parameter sniffing will change depending on the parameter values passed. When you run a query in SSMS, if it fails to reuse the cache plan being used by the application, you will get a new plan that is appropriate for the parameter values in the test.

    0 comments No comments