Serverless query went from taking 19 minutes to 6+ hours

John Pepper 1 Reputation point
2022-07-05T07:57:40.98+00:00

I have a weird issue where a simple query that used to take 19 minutes now takes over 6 hours to execute, so long in fact that it never successfully completes. It was running successfully until about a week ago. There were no changes to the underlying data and the query is identical. I also haven't made any changes to settings in the workspace or on the blob storage.

The underlying data is 21 TB of parquet files stored in blob storage in the same region. Each file is no larger than 500 MB.

Query structure is similar to below.

SELECT
    *
FROM
    OPENROWSET(
        BULK 'mypath/',
        FORMAT = 'PARQUET'
    ) AS [result]
WHERE id IN
(
    SELECT
        DISTINCT id
    FROM
        OPENROWSET(
            BULK 'mypath/',
            FORMAT = 'PARQUET'
        ) AS [result]
    WHERE number BETWEEN 38 AND 40 AND number2 BETWEEN -105 AND -104
)

I have tried running the subquery separately (it never completes) and switching the WHERE condition to a text-based column, which also never completes. I'm pulling my hair out at this point because it had been running successfully for several weeks. Even a single folder, which is only 700 GB, no longer completes.


1 answer

  1. PRADEEPCHEEKATLA 90,646 Reputation points Moderator
    2022-07-06T05:44:30.467+00:00

    Hello @John Pepper ,

    Thanks for the question and for using the MS Q&A platform.

    If a query runs for longer than 30 minutes, the usual cause is that results are being returned to the client slowly. Serverless SQL pool has a 30-minute limit for execution; any time beyond that is spent on result streaming. Try the following workarounds:

    • If you use Synapse Studio, try to reproduce the issues with some other application like SQL Server Management Studio or Azure Data Studio.
    • If your query is slow when executed by using SQL Server Management Studio, Azure Data Studio, Power BI, or some other application, check networking issues and best practices.
    • Put the query in the CETAS command and measure the query duration. The CETAS command stores the results to Azure Data Lake Storage and doesn't depend on the client connection. If the CETAS command finishes faster than the original query, check the network bandwidth between the client and serverless SQL pool.
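
    As a rough illustration of the CETAS workaround above, the query from the question could be wrapped as shown below. This is only a sketch under assumptions: the names [filtered_result], [my_data_source] and [parquet_format], the output folder, and the storage URL placeholders are hypothetical and do not come from the original post. The 'mypath/' placeholder is kept exactly as in the question, and the second range is written with its lower bound first.

    ```sql
    -- Hypothetical one-time setup. Replace the placeholders with your own
    -- storage account and container; these names are assumptions.
    CREATE EXTERNAL DATA SOURCE [my_data_source]
    WITH (LOCATION = 'https://<storage-account>.dfs.core.windows.net/<container>');

    CREATE EXTERNAL FILE FORMAT [parquet_format]
    WITH (FORMAT_TYPE = PARQUET);

    -- CETAS writes the result set to storage instead of streaming it to the
    -- client, so its duration does not depend on the client connection.
    CREATE EXTERNAL TABLE [dbo].[filtered_result]
    WITH (
        LOCATION = 'cetas_output/filtered_result/',  -- hypothetical output folder
        DATA_SOURCE = [my_data_source],
        FILE_FORMAT = [parquet_format]
    )
    AS
    SELECT
        *
    FROM
        OPENROWSET(
            BULK 'mypath/',
            FORMAT = 'PARQUET'
        ) AS [result]
    WHERE id IN
    (
        SELECT DISTINCT id
        FROM
            OPENROWSET(
                BULK 'mypath/',
                FORMAT = 'PARQUET'
            ) AS [inner_result]
        WHERE number BETWEEN 38 AND 40
          AND number2 BETWEEN -105 AND -104  -- BETWEEN expects the lower bound first
    );
    ```

    If this statement completes in roughly the original 19 minutes while the plain SELECT does not, that would point toward result streaming or the client connection rather than the scan itself. If it is just as slow, the cause is more likely on the storage or query side.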

    For more details, refer to Azure Synapse Serverless SQL Pool - Query duration is very long.

    Hope this helps. Please let us know if you have any further queries.

