Welcome to Microsoft Q&A platform and thanks for posting your question here.
According to the Azure documentation, querying Delta Lake format in serverless Synapse SQL pool is currently in public preview. This preview version is provided without a service level agreement, and it's not recommended for production workloads. Certain features might not be supported or might have constrained capabilities. Therefore, it is possible to encounter significant scanning overhead and multiple entries in SQL requests when using the delta format for queries within a serverless SQL pool.
To reduce the data scan, you can follow the best practices for serverless SQL pool provided by Azure. Here are some possible solutions:
- The data types you use in your query affect performance and concurrency. Use the smallest data size that can accommodate the largest possible value. If possible, use varchar and char instead of nvarchar and nchar.
- Use PARSER_VERSION 2.0 to query Delta Lake files: You can use a performance-optimized parser when you query Delta Lake files.
- Creating statistics for columns used in queries can improve query performance in Azure Synapse Analytics. The serverless SQL pool uses statistics to generate optimal query execution plans. While statistics are automatically created for some file types, they are not automatically created for Delta Lake files when using external tables. It's important to manually create statistics for Delta Lake files, especially for columns used in DISTINCT, JOIN, WHERE, ORDER BY, and GROUP BY clauses.
- Optimizing the partition strategy in your data lake can improve query performance in Azure Synapse Analytics. To do this, you need to identify the file types in the data lake and the query performance SLA. You should also consider whether your use case demands predictable performance and cost, and whether you have unplanned or bursty SQL analytical workloads. It's important to identify the data consuming pattern and platforms. By doing so, you can ensure that you have a good partition strategy that meets your specific needs and improves query performance.
- Using Spark pool assessment can help improve query performance in Azure Synapse Analytics. Spark pools enable data engineering and data preparation, and Apache Spark includes many optimization techniques that can improve query performance. By using Spark pool assessment, you can identify performance bottlenecks and optimize your Spark jobs to improve query performance.
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/develop-openrowset#parser_version
I hope this information helps you. Let me know if you have any further questions or concerns.