Data ingestion using Copy data activity (from Oracle to ADLS Gen2)

Anandhakumar Cholendran 45 Reputation points
2023-07-28T07:47:47.67+00:00

Hello,

I am using a query to ingest data from Oracle into ADLS Gen2 (as Parquet files) with the Copy activity. The query returns about 5.2 billion rows. When I run the Copy activity, the query keeps executing in Oracle for more than 10 hours, and no data is ingested into the sink (ADLS Gen2). Can you please suggest what approach should be followed to handle data at this scale?

Thanks in advance :)

Azure Data Lake Storage
Azure Data Factory

Accepted answer
  1. KranthiPakala-MSFT 46,642 Reputation points Microsoft Employee Moderator
    2023-07-28T22:19:31.08+00:00

    @Anandhakumar Cholendran Welcome to Microsoft Q&A forum and thanks for reaching out here.

    Seems like you are trying to copy a very large amount of data, which is why it is taking so long. I'm not sure how your Oracle source connector is configured, but the ADF Oracle connector provides built-in data partitioning to copy data from Oracle in parallel. This is the recommended way to copy large amounts of data from Oracle efficiently, as it runs parallel queries against your Oracle source and loads the data partition by partition. You can find these data partitioning options on the Source tab of the Copy activity, as shown below.

    Screenshot of partition options
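
    For instance, if your Oracle table already has physical partitions, the source query can reference the connector's built-in partition-name parameter so that each parallel copy reads one partition. Below is a minimal sketch; `?AdfTabularSourceTablePartitionName` is the parameter documented for the Oracle connector's "Physical partitions of table" option, while the table name and filter are hypothetical placeholders:

    ```sql
    -- ADF substitutes each physical partition name into this query and runs
    -- the resulting per-partition queries in parallel.
    SELECT *
    FROM   MY_SOURCE_TABLE PARTITION("?AdfTabularSourceTablePartitionName")
    WHERE  LOAD_DATE >= DATE '2023-01-01'  -- any additional filter you already apply
    ```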

    The following are suggested configurations for different scenarios. When copying data into a file-based data store, it is recommended to write to a folder as multiple files (specify only a folder name, not a file name), since this performs better than writing to a single file.

    Screenshot of suggested partition configurations for different scenarios

    When copying data from a non-partitioned table, you can use the "Dynamic range" partition option to partition on an integer column. If your source data doesn't have such a column, you can use the ORA_HASH function in the source query to generate one and use it as the partition column, as sketched below.
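
    A minimal sketch of that ORA_HASH approach follows; the `?AdfRangePartition...` parameters are the ones documented for the Oracle connector's dynamic range option, and the table name, column alias, and bucket count are hypothetical placeholders you would adapt:

    ```sql
    -- ORA_HASH maps every row to an integer bucket 0..49. Configure the copy
    -- activity's dynamic range partition column as ADF_PART_KEY with lower
    -- bound 0 and upper bound 49, so each parallel copy reads a bucket range.
    SELECT t.*
    FROM (
        SELECT s.*,
               ORA_HASH(s.ROWID, 49) AS ADF_PART_KEY
        FROM   MY_SOURCE_TABLE s
    ) t
    WHERE ?AdfRangePartitionColumnName >= ?AdfRangePartitionLowboundValue
      AND ?AdfRangePartitionColumnName <= ?AdfRangePartitionUpboundValue
    ```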

    In addition, I suggest going through the copy activity performance optimization guide to improve the performance of your Copy activity: Copy activity performance optimization features


    If you find that Copy activity performance is still slow, I encourage you to explore Azure Data Factory mapping data flows, which are executed as activities within Azure Data Factory pipelines on scaled-out Apache Spark clusters. Because they run on Spark clusters, they can be very performant compared to regular pipeline activities.

    Using a mapping data flow, you can configure data partitioning both when reading the data and when writing it to your sink store.

    Screenshot of mapping data flow partition settings

    Hope this info helps. Do let me know if you have any questions.


    Please don’t forget to "Accept Answer" and mark "Yes" for "was this answer helpful" wherever the information provided helps you; this can be beneficial to other community members.


0 additional answers
