Create an external table over partitioned Parquet files in Data Lake Gen2 and get the partition as a column (dedicated SQL pool)

Ian Wright
2021-11-30T14:49:00.947+00:00

I have a data lake where most of the tables are partitioned, e.g.:

/tables/dimension1/column_key=5000/*.parquet
/tables/dimension1/column_key=5001/*.parquet
/tables/dimension1/column_key=5003/*.parquet
/tables/dimension1/column_key=5004/*.parquet

and

/tables/facttable1/yeardate_key=202101/*.parquet
/tables/facttable1/yeardate_key=202102/*.parquet
/tables/facttable1/yeardate_key=202103/*.parquet
/tables/facttable1/yeardate_key=202104/*.parquet

In my dedicated SQL pool I want to create external tables over this data, for example:

CREATE EXTERNAL TABLE [mySchemaName].[Dimension1]
(
[Dimension_skey] bigint
, [Dimension_text] varchar(8000)
)
WITH
(
DATA_SOURCE = [myDataSource]
, LOCATION = N'/tables/dimension1/'
, FILE_FORMAT = [myParquetFormat]
)
GO

How do I get the lake partition value into this table as a column named "column_key"?


  1. PRADEEPCHEEKATLA
    2021-12-02T08:28:03.087+00:00

    Hello @Ian Wright,

    Thanks for the question and for using the MS Q&A platform.

    Azure Synapse dedicated SQL pool does not support partitioned external tables or OPENROWSET.

    Are there any alternative solutions?

    Unfortunately, in dedicated SQL pool there is no way to expose the partition column and take advantage of partition elimination. Only serverless SQL pool supports querying partitioned folders with partition elimination, through OPENROWSET.

    For Azure Synapse serverless SQL pool:

    If you have a set of files partitioned in a hierarchical folder structure, you can describe the partition pattern using wildcards in the file path. Use the FILEPATH function to expose parts of the folder path as partitioning columns.
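    As a sketch of how the wildcard numbering maps onto the `facttable1` layout from the question: the first `*` in the BULK path stands for the `yeardate_key` folder value, so `filepath(1)` returns text such as `202101`, and filtering on it in the WHERE clause should let serverless SQL pool read only the matching folders. This assumes a hypothetical external data source named `myServerlessDataSource`, created in a serverless SQL pool user database and pointing at the container that holds `/tables`; objects defined in the dedicated pool, such as `[myDataSource]`, are not available there.

    ```sql
    -- Serverless SQL pool only. 'myServerlessDataSource' is a hypothetical data source;
    -- the BULK path is assumed to be relative to its LOCATION.
    SELECT
        r.filepath()  AS [source_file],   -- full path of the file each row came from
        r.filepath(1) AS [yeardate_key],  -- text matched by the 1st wildcard (the folder value)
        COUNT_BIG(*)  AS [row_count]
    FROM OPENROWSET(
            BULK 'tables/facttable1/yeardate_key=*/*.parquet',
            DATA_SOURCE = 'myServerlessDataSource',
            FORMAT = 'PARQUET'
        ) AS r
    WHERE r.filepath(1) IN ('202101', '202102')  -- only these two folders need to be read
    GROUP BY r.filepath(), r.filepath(1);
    ```

    Because `filepath()` returns a string, the filter compares against string literals such as `'202101'` rather than integers.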

    A partitioned view performs folder partition elimination when you query it with filters on the partitioning columns, which can improve query performance.

    CREATE VIEW TaxiView  
    AS SELECT *, nyc.filepath(1) AS [year], nyc.filepath(2) AS [month]  
    FROM  
        OPENROWSET(  
            BULK 'parquet/taxi/year=*/month=*/*.parquet',  
            DATA_SOURCE = 'sqlondemanddemo',  
            FORMAT='PARQUET'  
        ) AS nyc  
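
    Applied to the question's `dimension1` layout, a partitioned view can play the role of the external table the asker wanted, with `column_key` taken from the folder name and the question's column list kept through a WITH clause. This is a minimal sketch under assumptions: it runs in a serverless SQL pool user database, `myServerlessDataSource` is a hypothetical data source pointing at the container that holds `/tables`, the `[mySchemaName]` schema already exists there, and the Parquet files contain columns named `Dimension_skey` and `Dimension_text`.

    ```sql
    -- Serverless SQL pool only; 'myServerlessDataSource' and [mySchemaName] are assumed names.
    CREATE VIEW [mySchemaName].[Dimension1]
    AS
    SELECT
        r.[Dimension_skey],
        r.[Dimension_text],
        r.filepath(1) AS [column_key]  -- folder value as text, e.g. N'5000'
    FROM OPENROWSET(
            BULK 'tables/dimension1/column_key=*/*.parquet',
            DATA_SOURCE = 'myServerlessDataSource',
            FORMAT = 'PARQUET'
        )
        WITH (
            [Dimension_skey] bigint,
            [Dimension_text] varchar(8000)
        ) AS r;
    GO

    -- Filtering on the partitioning column should let the view skip the other folders.
    SELECT [Dimension_skey], [Dimension_text], [column_key]
    FROM [mySchemaName].[Dimension1]
    WHERE [column_key] = N'5001';
    ```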
    

    For more details, refer to the articles "Create and use views using serverless SQL pool in Azure Synapse Analytics" and "Use file metadata in serverless SQL pool queries".

    Hope this helps. Please let us know if you have any further questions.

