NathanWilkinson-3815 avatar image
0 Votes"
NathanWilkinson-3815 asked AlbertoMorillo edited

Slow Database Performance/Slow Scaling

We have an Azure SQL database that is currently experiencing performance issues. This issue started sometime between 7:00 am CST Saturday (10/23) morning and 2 am CST Sunday (10/24) morning.

Our Normal Activity/Workload
Our system does 3 sets of "heavy" activities overnight. Each set maxes the database for a few minutes and then drops way off. During normal US business hours (7:00 am -7:00 pm CST M-F) there is usually modest activity (~40% max, avg 20%). Very little activity other times.

On Saturday morning, our heavy overnight activities worked as expected.

The database is about 15 GB in size. It is currently running as an S4 (200 DTU) pricing tier.

Since Sunday Morning
On Sunday morning at 2:00 am, the heavy activities ended up having timeout errors trying to bulk load data into the database. The queries that failed typically take 2-5 seconds to execute. On Sunday morning (and again Monday morning) those queries timed out after 30 seconds.

Our normal user traffic on Monday has also been experiencing timeout problems. The queries seem to be randomly timing out. Sometimes a query will work and then it will timeout. So far, I have not been able to determine a pattern to which queries or when they will or won't work. This is happening with a variety of queries across the system. While the individual query behavior seems random, the overall system is having timeouts consistently.

During this entire time, we rarely go above 15% DTU usage on the database.

Things Tried
We restarted all services that connect to the database. This was an attempt to make sure that if there was a rogue process, it was killed. This did not help.

The query performance items in Azure just tell us what we already know, some queries are timing out.

We also attempted to scale up the database. We mostly did this because we wanted to "restart" the database and there is not a direct way to do it in Azure. Unfortunately, we are trying to scale from an S4 to an S6 but it is taking much longer than usual. Currently, we are almost 4 hours into the operation and we are only at 32% complete. When we've done this in the past, it normally takes less than 10 minutes.

What I'd like to know/help with
1) Why is the scaling of the database taking so long?
2) Why suddenly are we having issues with these queries?

5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

1 Answer

AlbertoMorillo avatar image
0 Votes"
AlbertoMorillo answered AlbertoMorillo edited

After the "heavy" activity update statistics and defragment indexes.

Make sure there are not missing indexes, even if you have even set "Automatic tuning" on the Azure Portal. Try the query shared on this article.

Your index strategy has to be good and the query need to be well tuned with Azure SQL Database or you will see high DTU usage, throttling and timeouts occurring.

When databases are busy scaling up/down may take long time, Queries are canceled when a scale up/down is initiated. Transactional queries, which modify your data or the structure of the data, may not be able to stop quickly. Transactional queries, by definition, must either complete in their entirety or rollback their changes. Rolling back the work completed by a transactional query can take as long, or even longer, than the original change the query was applying. For example, if you cancel a query which was deleting rows and has already been running for an hour, it could take the system an hour to insert back the rows which were deleted. If you run scaling while transactions are in flight, your scaling may seem to take a long time because scaling has to wait for the rollback to complete before it can proceed.

5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.