Copy data from an Oracle database to Azure Data Lake Storage Gen2 as Parquet files

Anonymous
2023-06-28T12:05:07.26+00:00

Hello,

I am trying to copy data from an Oracle database into Parquet files in Azure Data Lake Storage Gen2. Because a random name is assigned to the Parquet files on each copy, I have not been able to replicate an overwrite behavior.

To overwrite the existing files, I am currently using a Delete activity to remove the existing Parquet files before running the copy (sketched below). Is there a way to do this in one step, without having to delete the existing files myself?
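
For reference, this is roughly what the current two-step approach looks like as pipeline activity JSON; the activity and dataset names (DeleteExistingParquet, CopyOracleToParquet, OracleSourceDataset, ParquetSinkDataset) are hypothetical placeholders:

```json
[
    {
        "name": "DeleteExistingParquet",
        "type": "Delete",
        "typeProperties": {
            "dataset": {
                "referenceName": "ParquetSinkDataset",
                "type": "DatasetReference"
            },
            "enableLogging": false,
            "storeSettings": {
                "type": "AzureBlobFSReadSettings",
                "recursive": true,
                "wildcardFileName": "*.parquet"
            }
        }
    },
    {
        "name": "CopyOracleToParquet",
        "type": "Copy",
        "dependsOn": [
            { "activity": "DeleteExistingParquet", "dependencyConditions": [ "Succeeded" ] }
        ],
        "inputs": [ { "referenceName": "OracleSourceDataset", "type": "DatasetReference" } ],
        "outputs": [ { "referenceName": "ParquetSinkDataset", "type": "DatasetReference" } ],
        "typeProperties": {
            "source": { "type": "OracleSource" },
            "sink": {
                "type": "ParquetSink",
                "storeSettings": { "type": "AzureBlobFSWriteSettings" },
                "formatSettings": { "type": "ParquetWriteSettings" }
            }
        }
    }
]
```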

Thanks in advance for your help & support.

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

1 answer

  1. AnnuKumari-MSFT 34,566 Reputation points Microsoft Employee Moderator
    2023-06-29T08:18:51.71+00:00

    Hi Moein Torabi,

    Thank you for using the Microsoft Q&A platform and for posting your question here.

    As I understand your query, you are trying to copy data from an Oracle database to Parquet files in ADLS Gen2, and you want to overwrite the file on each run. Please let me know if that is not the ask.

    Could you please confirm whether you are partitioning the data into multiple Parquet files or writing to a single target file?

    If you are storing the Oracle table data in a single target file, explicitly provide an output file name, say outputfile.parquet, in the sink dataset. Otherwise, a random file name will be auto-generated. Once you assign a file name, the file will be overwritten every time the pipeline runs; see the sketch below.
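
    For illustration, here is a minimal sketch of what such a Parquet sink dataset could look like with an explicit file name; the linked service, file system, and folder path used here (AdlsGen2LinkedService, datalake, oracle/output) are hypothetical placeholders:

    ```json
    {
        "name": "ParquetSinkDataset",
        "properties": {
            "type": "Parquet",
            "linkedServiceName": {
                "referenceName": "AdlsGen2LinkedService",
                "type": "LinkedServiceReference"
            },
            "typeProperties": {
                "location": {
                    "type": "AzureBlobFSLocation",
                    "fileSystem": "datalake",
                    "folderPath": "oracle/output",
                    "fileName": "outputfile.parquet"
                },
                "compressionCodec": "snappy"
            },
            "schema": []
        }
    }
    ```

    With the file name fixed in the dataset, every run writes to the same path and the Copy activity replaces the previous file, so a separate Delete activity is no longer needed for the single-file case.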


    Hope it helps. Kindly accept the answer if it's helpful. Thank you.

