Synapse Pyspark: Load Parquet files created by CREATE EXTERNAL TABLE CETAS command

Pierre Gourven 41 Reputation points
2022-10-14T17:51:20.613+00:00

Hi everybody,

One of my colleagues created an external table on the serverless SQL pool using a CETAS statement.
The Parquet files are located in subfolders created through the LOCATION option.

CREATE EXTERNAL TABLE dbo.TEST
WITH (
LOCATION = 'Test/2020/01/01',
DATA_SOURCE = SOURCE,
FILE_FORMAT = PARQUET
)
Other files are then created in different date folders (YYYY/MM/DD).

I would like to read the Parquet file for a specific date using PySpark, with something like:
parDF = spark.read.parquet("/file_path/YYYY/MM/DD/156224SSQKDQHKDH.parquet")

The files are located on an ADLS Gen2 account.

Is there a way to achieve this?

Thanks for your help

Pete

Azure Synapse Analytics

Accepted answer
  1. HimanshuSinha-msft 19,381 Reputation points Microsoft Employee
    2022-10-17T22:24:35.99+00:00

    Hello anonymous user,
    Thanks for the question and using MS Q&A platform.

    As we understand it, you are trying to read a Parquet file from PySpark. Please let us know if that is not accurate.

    You can do that by adding the storage account as a linked service in Synapse Studio.


    Once that is done, navigate to the Parquet file and select "Load to DataFrame".


    This will create a script like

    %%pyspark
    df = spark.read.load('abfss://himanshu@Piepel.dfs.core.windows.net/NYCTaxi/PassengerCountStats.parquet/part-00000-21161a2b-1c65-4a76-9999-0b2403785f46-c000.snappy.parquet', format='parquet')
    display(df.limit(10))
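Since the question is about a date folder rather than one known file, a variation on the generated script may help. The sketch below builds the `abfss://` URI of a `YYYY/MM/DD` folder and points `spark.read.parquet` at the folder, so the generated file name does not need to be known in advance. The helper names `cetas_day_folder` and `read_day`, and the container, account and base folder values, are placeholders for illustration and do not come from the thread.

```python
from datetime import date


def cetas_day_folder(container: str, account: str, base_folder: str, day: date) -> str:
    """Build the abfss:// URI of the YYYY/MM/DD folder for one day."""
    root = f"abfss://{container}@{account}.dfs.core.windows.net"
    return f"{root}/{base_folder.strip('/')}/{day.strftime('%Y/%m/%d')}/"


def read_day(spark, folder_uri: str):
    """Read the Parquet files in the given folder into one DataFrame."""
    return spark.read.parquet(folder_uri)


# Placeholder names: replace with your own container, account and base folder.
folder = cetas_day_folder("mycontainer", "mystorageaccount", "Test", date(2020, 1, 1))
print(folder)

# `spark` is predefined in a Synapse notebook; the read is skipped elsewhere.
if "spark" in globals():
    df = read_day(spark, folder)
    df.show(10)
```

This assumes the Spark pool's identity (or the linked service) has read access to the storage account; otherwise the read fails with an authorization error regardless of the path.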


    Please let me know if you have any questions.
    Thanks
    Himanshu

