Querying Delta Lake from Synapse SQL Serverless Pool

Question

Byers, Luke

The documentation at https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/query-delta-lake-format#quickstart-example says you can query an existing Delta Lake table from a serverless SQL pool with a query like this:

select top 10 *
from openrowset(
    bulk 'https://sqlondemandstorage.blob.core.windows.net/delta-lake/covid/',
    format = 'delta') as rows

This query works, but I want to hide the BULK location behind a view, along these lines:

CREATE VIEW [dbo].[covid]
AS SELECT *
FROM OPENROWSET(
    BULK 'https://sqlondemandstorage.blob.core.windows.net/delta-lake/covid/',
    FORMAT = 'delta'
) AS rows
GO

And then query that view:

select top 10 * from dbo.covid

The problem: when I query the Delta location directly, the query completes in about 1 second or less, but querying the view takes anywhere from 10 seconds to about a minute. Why would simply wrapping the BULK location in a view cause such a difference in performance?
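A related pattern, sketched here from the quickstart page linked above and not something tested in this thread: the storage root can live in an external data source, so OPENROWSET (and any view built on it) only carries a relative path. The names `MyDeltaLakeDB`, `DeltaLakeStorage` and `covid_ds` are illustrative placeholders. Views cannot be created in `master`, so both this view and the one above belong in a user database. This does not by itself explain the timing difference; it only shows another way to keep the location out of the view body.

```sql
-- Placeholder names: MyDeltaLakeDB, DeltaLakeStorage, covid_ds.
-- Views cannot be created in master, so work in a user database.
-- A UTF-8 collation avoids conversion issues with UTF-8 strings in Delta/Parquet files.
create database MyDeltaLakeDB
    collate Latin1_General_100_BIN2_UTF8;
go

use MyDeltaLakeDB;
go

-- The storage root lives in the data source; queries pass only a relative path.
create external data source DeltaLakeStorage
with ( location = 'https://sqlondemandstorage.blob.core.windows.net/delta-lake/' );
go

-- View over the 'covid' Delta folder under the data source root.
create view dbo.covid_ds
as
select *
from openrowset(
        bulk 'covid',
        data_source = 'DeltaLakeStorage',
        format = 'delta'
    ) as rows;
go

select top 10 * from dbo.covid_ds;
```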

Samara Soucy - MSFT

2021-06-04T17:17:53.43+00:00

Byers,

I'm working on replicating your issue, and I'm going to reach out to the product team to see if they can provide any insight. I'll update you with what I find or if they need any additional info.

Answer 1

Byers, Luke

Interestingly, after I re-created the Delta table from my Spark pool and re-created the serverless SQL view, performance is now in line with querying the location directly. I'm not sure whether something was wrong with the data I uploaded the first time, but after starting from scratch the timings are much more consistent. I'll keep an eye out in case the intermittent slowdowns return.

Samara Soucy - MSFT

2021-06-04T18:53:14.247+00:00

I'm glad it's working now. If the problem returns, feel free to @ me or send an email to my attention at AzCommunity@microsoft.com; just include a link to this thread so I know where it is coming from. For intermittent issues like this, the query IDs are the most useful information the product team can have when investigating what happened.
