Azure Databricks - How to create external partitioned table using parquet format from blob storage

ICHAN 0 Reputation points
2023-02-14T21:13:27.9566667+00:00

I need to copy some partitioned tables from an on-prem Hive DB. I have copied the underlying Parquet files to Azure Blob Storage. This is the folder structure:

e.g.

in blob storage:

TABLE_1/PART=1/*.parq ==> contains multiple parq files

TABLE_1/PART=2/*.parq

TABLE_1/PART=3/*.parq

I was able to create the external table on this location with this syntax:

CREATE TABLE IF NOT EXISTS table_1
USING PARQUET
OPTIONS (path '/somelocation/TABLE_1/')

but the table is not returning any results.

Is there a way to specify the partitions when creating the table?

Azure Databricks

1 answer

  1. PRADEEPCHEEKATLA 90,641 Reputation points Moderator
    2023-02-15T10:51:02.9133333+00:00

    Hello @ICHAN

    Thanks for the question and for using the MS Q&A platform.

    You can create an external partitioned table using the parquet format from blob storage in Azure Databricks by using the following steps:

    1. Mount the blob storage container to the Databricks File System (DBFS).
    2. Create a table in Databricks using the parquet format and pointing to the mounted blob storage container.

    For the repro, I used the simple method: access Azure Blob Storage using the DataFrame API, then create a table using the parquet format.
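
    Before creating the table, it can help to confirm that the files are readable from the cluster at all. A minimal sketch in Spark SQL, assuming the placeholder path from the question is reachable as written (the real mount point or storage URI will differ):

    ```sql
    -- Query the Parquet files directly by path, without creating a table.
    -- The PART=... folder names should show up as a partition column.
    SELECT * FROM parquet.`/somelocation/TABLE_1/` LIMIT 10;
    ```

    If this returns rows, the storage access and file format are fine, and any empty result from the table is more likely a table definition or partition registration issue.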

    Here is an example of how to create an external partitioned table using the parquet format from blob storage in Databricks:

    CREATE TABLE IF NOT EXISTS table_1 (
    <column_1_name> <column_1_data_type>,
    <column_2_name> <column_2_data_type>,
     ...
    )
    USING PARQUET
    PARTITIONED BY (part string)
    OPTIONS (path '/somelocation/TABLE_1/')
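
    One further step may be needed. When a partitioned table is created over existing files, Spark does not register the partition folders in the metastore automatically, so queries can return no rows until the partitions are recovered. A minimal sketch, assuming the table name and the part partition column from the example above:

    ```sql
    -- Scan the table location and register the PART=... folders as partitions
    MSCK REPAIR TABLE table_1;

    -- Equivalent alternative:
    -- ALTER TABLE table_1 RECOVER PARTITIONS;

    -- Check which partitions were picked up
    SHOW PARTITIONS table_1;

    -- Query a single partition (part is declared as a string above)
    SELECT * FROM table_1 WHERE part = '1' LIMIT 10;
    ```

    If new PART=... folders are copied to the location later, the repair command would need to be run again so that the new partitions become visible.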
    

    Hope this helps. Do let us know if you have any further queries.


    Please don’t forget to Accept Answer wherever the information provided helps you, this can be beneficial to other community members.

