Copy LOB based data from DB2 to Azure Data Lake Gen2 storage using ADF

Debajit Mitra 0 Reputation points
2024-04-21T03:40:26.3+00:00

Hi team,

I was trying to work on a scenario to copy data from a table in a DB2 database that has a column of LOB (CLOB) datatype to an Azure Data Lake Gen2 storage container using Azure Data Factory. However, it takes a considerable amount of time (about a day) to copy the data whenever this LOB column is included. Whenever we exclude that column in the copy activity's query option, the data gets copied within a few minutes.

Could you please suggest a best-practice approach to copy the table data, including the LOB column, from DB2 to Azure storage using a Data Factory pipeline's copy activity so that throughput improves?


2 answers

  1. Amira Bedhiafi 15,676 Reputation points
    2024-04-21T18:58:24.9533333+00:00

    First, I invite you to troubleshoot your Copy activity to better understand the behaviour you are seeing.

    Then, here are my recommendations:

    • You can increase the number of Data Integration Units (DIUs) assigned to your copy activity in ADF. More DIUs can lead to faster data transfer, although costs increase accordingly.
    • If there is a logical way to partition the source data by date, ID range, or other criteria, enable parallel copying (see the sketch after this list).
    • If not all data is needed, filter out unnecessary rows at the source level using a query.
    • Convert the data into a columnar format like Parquet, which is not only storage-efficient but also optimized for analytical performance in data lakes.
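
    To make the parallel-copy idea concrete, here is a minimal sketch, assuming the source has a numeric key you can partition on — the table and column names (MYSCHEMA.MYTABLE, ID) are hypothetical, and the DIU and parallelism values are starting points to tune, not recommendations. It generates one Copy activity definition per partition (as a Python dict mirroring the JSON ADF expects), with a Parquet sink:

    ```python
    import json

    TABLE = "MYSCHEMA.MYTABLE"   # hypothetical source table
    KEY = "ID"                   # hypothetical numeric key to partition on
    PARTITIONS = 4               # one copy per partition, run concurrently

    def partition_query(i: int) -> str:
        """Source query for partition i: MOD on the key spreads rows evenly,
        so each copy moves roughly 1/PARTITIONS of the table, CLOB included."""
        return f"SELECT * FROM {TABLE} WHERE MOD({KEY}, {PARTITIONS}) = {i}"

    def copy_activity(i: int) -> dict:
        """One Copy activity definition (the JSON ADF expects) per partition."""
        return {
            "name": f"CopyDb2Partition{i}",
            "type": "Copy",
            "typeProperties": {
                "source": {"type": "Db2Source", "query": partition_query(i)},
                # Parquet sink: columnar and compressed, usually a better fit
                # for the lake than row-oriented CSV when LOBs are large.
                "sink": {"type": "ParquetSink"},
                "dataIntegrationUnits": 16,  # more DIUs, higher cost
                "parallelCopies": 4
            }
        }

    if __name__ == "__main__":
        for i in range(PARTITIONS):
            print(json.dumps(copy_activity(i), indent=2))
    ```

    If you run the partitioned copies concurrently (for example inside a ForEach with a batch count), overall throughput scales roughly with the number of partitions until the DB2 side becomes the bottleneck; just make sure each partition writes to its own folder or file name in the sink.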

    Try and tell us :D


  2. phemanth 5,840 Reputation points Microsoft Vendor
    2024-04-23T08:43:11.41+00:00

    @Debajit Mitra

    Thanks for using MS Q&A platform and posting your query.

    I understand your concern. The issue seems to be with the handling of the LOB (CLOB) column during the copy activity in Azure Data Factory (ADF). Here are some suggestions that might help:

    1. Explicit Data Type Mapping: In ADF's Copy Data activity, you can specify explicit schema and data type mapping. This lets you customize the column/field mapping from source to sink based on your needs. You might want to try explicitly mapping the CLOB column to a compatible data type that can handle the data without causing performance issues (see the sketch after this list).
    2. Type Conversion Settings: There are type conversion settings available in ADF's Copy activity. However, these settings are not supported for all copy methods. You might want to explore using a copy method that supports type conversion settings, such as bulk insert.
    3. Handling Special Characters: If the CLOB data contains special characters, they could cause issues during the copy process. You might want to check whether this is the case and consider encoding the data as UTF-8.
    4. Using the quoteAllText Setting: When writing to a CSV file using the DelimitedText format in ADF's Copy Data activity, how values are written depends on the quoteAllText setting. If this setting is enabled, all values, regardless of data type, are quoted in the CSV. You might want to experiment with this setting to see whether it affects the performance of the copy activity.
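
    To make suggestions 1, 2, and 4 concrete, here is a minimal sketch of the relevant pieces of a Copy activity definition, written as a Python dict that prints the JSON ADF expects — the column names (ID, DOC_CLOB) are hypothetical placeholders for your schema:

    ```python
    import json

    # Hypothetical column names: DOC_CLOB stands in for the DB2 CLOB column.
    copy_activity = {
        "name": "CopyDb2WithExplicitMapping",
        "type": "Copy",
        "typeProperties": {
            "source": {"type": "Db2Source"},
            "sink": {
                "type": "DelimitedTextSink",
                "formatSettings": {
                    "type": "DelimitedTextWriteSettings",
                    "quoteAllText": True,      # suggestion 4: quote every value
                    "fileExtension": ".csv"
                }
            },
            # Suggestion 1: explicit schema/type mapping, pinning the CLOB
            # column to a String sink type instead of the default mapping.
            "translator": {
                "type": "TabularTranslator",
                "mappings": [
                    {"source": {"name": "ID"},
                     "sink": {"name": "id"}},
                    {"source": {"name": "DOC_CLOB"},
                     "sink": {"name": "doc_text", "type": "String"}}
                ],
                # Suggestion 2: type conversion settings.
                "typeConversion": True,
                "typeConversionSettings": {"allowDataTruncation": False}
            }
        }
    }

    print(json.dumps(copy_activity, indent=2))
    ```

    Whether any of these settings helps throughput depends on where the time is actually going, so it is worth checking the copy activity's performance details (queue time, transfer time, and so on) before and after each change.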

    Hope this helps. Do let us know if you have any further queries.
