How to handle dynamic structure of files getting read in azure synapse serverless

Question

How to handle dynamic structure of files getting read in azure synapse serverless

Priya Jha 896

Hi All,

I have a requirement wherein we are using azure synapse serverless tables to read appending csv files present in azure data lake.

The schema of the files is not constant and columns can be added or deleted from the newly created files.

How can we handle this scenario wherein we would be able to read old as well new files present in data lake in azure synapse serverless tables?

AnnuKumari-MSFT 34,556 Reputation points Microsoft Employee Moderator

2023-05-17T17:12:47.1566667+00:00

Hi Priya Jha ,

Just following up to see if the below answer helped. Please do consider clicking Accept Answer as accepted answers help community as well. Also, please click on Yes for the survey 'Was the answer helpful'

Accepted answer

1 additional answer

Your answer

AnnuKumari-MSFT 34,556 Reputation points Microsoft Employee Moderator

2023-05-17T17:12:47.1566667+00:00

Hi Priya Jha ,

Just following up to see if the below answer helped. Please do consider clicking Accept Answer as accepted answers help community as well. Also, please click on Yes for the survey 'Was the answer helpful'

Answer 1

Hey Priya!
To handle the dynamic structure of files in Azure Synapse serverless, you can follow these steps:

External Table: Create an external table in Azure Synapse serverless SQL pool that references the CSV files stored in Azure Data Lake. This external table acts as a logical representation of the data in the files without physically importing them into the SQL pool.
Schema Drift: Since the schema of the files can change with new columns being added or existing columns being removed, you need to handle the schema drift in your external table. By default, the external table in Azure Synapse assumes a fixed schema. However, you can use the OPENROWSET function with the WITH (FORMAT='CSV') option to handle schema drift. This allows the external table to read files with varying column structures.
Column Projection: While querying the external table, you can use column projection to filter out columns that may not exist in all the files. This helps to handle scenarios where certain columns are present in some files but not in others.
Flexible Querying: As the schema can change, it's essential to have flexible querying in place. You can use dynamic SQL or dynamic scripting techniques to generate and execute queries based on the available columns in each file. This allows you to dynamically adapt to the varying schema of the files.

ETL Processes: If you require a more structured approach and need to transform the data before loading it into Azure Synapse, you can implement ETL (Extract, Transform, Load) processes using tools like Azure Data Factory or Azure Databricks. These tools provide more flexibility to handle schema drift and perform transformations on the data before loading it into the serverless SQL pool.

By following these steps, you can handle the dynamic structure of files in Azure Synapse serverless and read both old and new files present in Azure Data Lake. Remember to plan for schema drift, use flexible querying techniques, and consider implementing ETL processes if needed.

AnnuKumari-MSFT 34,556 Reputation points Microsoft Employee Moderator

2023-05-16T13:59:07.28+00:00

Hi Priya Jha ,

Just checking in to see if the above answer helped. Please do consider clicking Accept Answer as accepted answers help community as well. Also, please click on Yes for the survey 'Was the answer helpful'

Answer 2

Nandan Hegde 36,146 MVP Volunteer Moderator

Hey
in addition to what Vedant has described, below blog explains it in detail w.r.t all scenarios :

https://www.serverlesssql.com/schema-versioning-in-serverless-sql-pools/

Share via

How to handle dynamic structure of files getting read in azure synapse serverless

1 additional answer

Your answer