Querying Delta Lake from Synapse SQL Serverless Pool

Byers, Luke 1 Reputation point
2021-06-03T13:02:11.737+00:00

In this documentation, https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/query-delta-lake-format#quickstart-example, it is mentioned one may query an existing Delta Lake from a SQL Serverless Pool by executing a query along the lines of:

select top 10 *  
from openrowset(  
    bulk 'https://sqlondemandstorage.blob.core.windows.net/delta-lake/covid/',  
    format = 'delta') as rows  

This query would work fine, but what if we wanted to hide away the bulk definition location behind a View, so something along the lines of:

CREATE VIEW [dbo].[covid]  
as select *  
 from OPENROWSET(  
     bulk 'https://sqlondemandstorage.blob.core.windows.net/delta-lake/covid/',  
    format = 'delta'  
) as rows  
GO  

And then query that view:

select top 10 * from dbo.covid  

The problem I'm seeing is when executing the query against the delta location directly, everything works fine and executes, in my case, in <= 1 second, but querying the View can take anywhere from 10 seconds up to around a minute. Is there a reason for such a difference in performance when simply hiding away the bulk location behind a View definition?

Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
5,004 questions
{count} votes

1 answer

Sort by: Most helpful
  1. Byers, Luke 1 Reputation point
    2021-06-04T17:24:21.237+00:00

    Interestingly enough, I re-created my delta table from my Spark pool and re-created my SQL Serverless View and the performance is more in-line with the non-view way. I'm not sure if there was something wrong with the data I uploaded the first time, but after starting from scratch it seems to be much more consistent. I'll keep my eye out for the random poor performance to see if it creeps in again.


Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.