Gen1 to Gen2 ADLS migration Delta Table Partition Files

Md Shahid Akhter 21 Reputation points
2023-11-21T12:22:56.76+00:00

Hello,

I am working on a migration project and am facing an issue while migrating Delta tables from Azure ADLS Gen1 to Gen2.

So, as per the Microsoft migration pre-requisites:

File or directory names with only spaces or tabs, ending with a ., containing a :, or with multiple consecutive forward slashes (//) aren't compatible with Gen2. You need to rename these files or directories before you migrate.
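The naming rules quoted above can be checked mechanically before a migration. The following is a minimal sketch that only encodes the four conditions listed in that sentence; the function name `gen2_name_problems` and the sample paths are hypothetical, and the real migration tooling may apply further checks that are not covered here.

```python
import re


def gen2_name_problems(path: str) -> list[str]:
    """Return the reasons why `path` breaks one of the four quoted rules."""
    problems = []
    # Rule: multiple consecutive forward slashes anywhere in the path.
    if re.search(r"/{2,}", path):
        problems.append("contains consecutive forward slashes")
    for segment in path.split("/"):
        if segment == "":
            # Leading/trailing slash, or a '//' that was reported above.
            continue
        # Rule: a name made up only of spaces or tabs.
        if segment.strip(" \t") == "":
            problems.append(f"segment {segment!r} is only spaces or tabs")
        # Rule: a name ending with a dot.
        if segment.endswith("."):
            problems.append(f"segment {segment!r} ends with a dot")
        # Rule: a name containing a colon.
        if ":" in segment:
            problems.append(f"segment {segment!r} contains a colon")
    return problems


if __name__ == "__main__":
    # Hypothetical paths, used only to exercise the checker.
    for sample in ["/data/STORE=S.K./part-0.parquet", "/data/TIME=12:30/part-0.parquet"]:
        print(sample, gen2_name_problems(sample))
```

Running a check like this over a full directory listing gives a list of the folders that need renaming before the migration starts.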

The Delta tables are partitioned on fields whose values contain a dot (.), so the partition folder names contain a dot as well, and I am unable to migrate them to ADLS Gen2.

Example:

/Fruits/Mango/Alfonso/DATE=20200101/LOCATION=UK/STORE=S.K DALTON/part-32434-df7cge3e-4201-4c47-83f1-ef034c34543b.c000.snappy.parquet

I need help finding a workaround so that I can migrate to ADLS Gen2 without losing data. I also need the table history to be maintained when doing a version check.

Thanks

Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.

1 answer

  1. PRADEEPCHEEKATLA-MSFT 84,936 Reputation points Microsoft Employee
    2023-11-22T08:50:05.4733333+00:00

    @Md Shahid Akhter - Thanks for the question and for using the Microsoft Q&A platform.

    It seems that you are facing an issue while migrating delta tables from Azure ADLS Gen1 to Gen2. As per the Microsoft migration pre-requisites, file or directory names with only spaces or tabs, ending with a ., containing a :, or with multiple consecutive forward slashes (//) aren't compatible with Gen2. You need to rename these files or directories before you migrate.

    Regarding your specific issue with partition folder path names containing dots (.), you can try the following workaround:

    1. Create a new partition column in your Delta table whose values do not contain dots (.).
    2. Use the new partition column to create a new partition folder structure in ADLS Gen2.
    3. Copy the data from the old partition folder structure to the new partition folder structure using a tool like AzCopy or Azure Data Factory.
    4. Once the data is copied, update the delta table to point to the new partition folder structure.
    5. Verify that the data is accessible and the history is maintained by doing a version check.
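
Step 1 of the list above needs a rule for turning each existing partition value into one without problem characters. The sketch below is one possible rule, not a prescribed method: replacing dots and colons with an underscore is an assumption, and both function names are hypothetical. It also refuses to continue when two different original values would end up in the same partition, since that would silently merge their data. The same rule would then have to be applied to the partition column in Spark, which is not shown here.

```python
from collections import defaultdict


def sanitize_partition_value(value: str, replacement: str = "_") -> str:
    """Replace dots and colons (assumed rule) so the folder name is safe."""
    return value.replace(".", replacement).replace(":", replacement)


def build_value_mapping(values):
    """Map each original value to its sanitized form, failing on collisions."""
    originals_by_clean = defaultdict(set)
    for value in values:
        originals_by_clean[sanitize_partition_value(value)].add(value)

    # Two distinct originals with the same sanitized form would be merged
    # into a single partition, so treat that as an error.
    collisions = {
        clean: sorted(originals)
        for clean, originals in originals_by_clean.items()
        if len(originals) > 1
    }
    if collisions:
        raise ValueError(f"distinct values would share a partition: {collisions}")

    return {next(iter(originals)): clean for clean, originals in originals_by_clean.items()}


if __name__ == "__main__":
    # "S.K DALTON" is the store value from the example path in the question.
    print(build_value_mapping(["S.K DALTON", "LONDON"]))
```

Keeping the resulting mapping alongside the migrated table makes it possible to translate the new partition values back to the original ones later.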

    I hope this helps. Let me know if you have any further questions.
