How to fix an "out of memory" exception while processing a pipeline of around 43 GB of data using copy activity?

Question

Lakshmi Moulya Nerella 0

I am processing around 43 GB of data in a pipeline using a copy activity, and I am getting the following error:

ErrorCode=SystemErrorOutOfMemory,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=A task failed with out of memory.,Source=,''Type=System.OutOfMemoryException,Message=Array dimensions exceeded supported range.,Source=System.Core,'

The failure type is "SystemError".

If there is any workaround for this, it would be helpful.

Lakshmi Moulya Nerella 0 Reputation points

2024-06-25T06:38:43.4+00:00

Hi @Bhargava-MSFT

I am not using SHIR. Is there any workaround using the auto-resolve integration runtime?

Bhargava-MSFT 31,261 Reputation points Microsoft Employee Moderator

2024-06-26T20:06:53.4933333+00:00

Can you please choose a bigger core count?

Bhargava-MSFT 31,261 Reputation points Microsoft Employee Moderator

2024-07-01T17:57:08.94+00:00

I am checking to see if you have any further questions here.

Bhargava-MSFT 31,261 Reputation points Microsoft Employee Moderator

2024-07-05T15:59:58.39+00:00

Hello Lakshmi Moulya Nerella,

Please let me know if you have any further questions here.

Answer 1

When processing large datasets like 43 GB using a copy activity, encountering an "out of memory" exception is a common issue. This typically occurs because the system is attempting to load too much data into memory simultaneously. Here are several strategies to mitigate this issue:

**Increase Resource Allocation:**

- Ensure that the system running the pipeline has sufficient memory and CPU resources. Sometimes, simply increasing the available resources can solve the problem.

**Use Parallelism:**

- Enable parallel copy in your activity settings. This splits the data transfer into multiple threads, which can help in managing memory usage better. Adjust the degree of parallelism to find an optimal setting that your system can handle.

**Use Staging:**

- If you're copying data between different services (e.g., from on-premises to cloud), consider using staging options such as Azure Blob Storage as an intermediate step. This reduces memory load by breaking the process into smaller, more manageable steps.

**Batch Processing:**

- Break down the data into smaller chunks or batches. Process each batch separately to avoid loading the entire dataset into memory at once.

**Data Compression:**

- If the data is not already compressed, consider compressing it before the transfer. This reduces the amount of data that needs to be handled at any given time.

**Optimize Source and Sink Configuration:**

- Ensure that the configurations for your source and sink are optimized for large data transfers. This includes setting appropriate timeouts, increasing buffer sizes, and using efficient data formats.

**Monitoring and Scaling:**

- Continuously monitor the memory usage during the pipeline execution. Based on the observations, scale your resources up or down as needed.

**Error Handling and Retries:**

- Implement robust error handling and retry logic. Sometimes transient errors can cause memory issues, and having a strategy to retry can help in completing the transfer.
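
As a rough illustration of where the parallelism, staging, and retry suggestions above would sit in a copy activity's JSON, here is a minimal Python sketch that builds such a fragment and prints it. `parallelCopies`, `dataIntegrationUnits`, `enableStaging`, `stagingSettings`, and the `policy` retry fields are copy activity properties. Everything else is an assumption for illustration: the activity name, the source and sink types, the linked service name, the staging path, and the numeric values. The fragment omits the dataset references a real activity needs, and staged copy is not available for every source and sink combination.

```python
import json


def build_copy_activity(name, parallel_copies, diu, staging_linked_service, staging_path):
    """Return a copy activity fragment as a plain dict (all names are hypothetical)."""
    return {
        "name": name,
        "type": "Copy",
        # Retry policy for transient failures; the values here are placeholders.
        "policy": {"retry": 2, "retryIntervalInSeconds": 60, "timeout": "0.12:00:00"},
        "typeProperties": {
            # Placeholder source/sink types; use the ones that match your datasets.
            "source": {"type": "DelimitedTextSource"},
            "sink": {"type": "ParquetSink"},
            # Degree of parallelism and compute size on the Azure integration runtime.
            "parallelCopies": parallel_copies,
            "dataIntegrationUnits": diu,
            # Route the copy through interim storage (staged copy).
            "enableStaging": True,
            "stagingSettings": {
                "linkedServiceName": {
                    "referenceName": staging_linked_service,
                    "type": "LinkedServiceReference",
                },
                "path": staging_path,
            },
        },
    }


if __name__ == "__main__":
    activity = build_copy_activity(
        "CopyLargeFile", 8, 32, "StagingBlobStorage", "staging-container/tmp"
    )
    print(json.dumps(activity, indent=2))
```

The printed JSON can be compared against the activity shown in the pipeline's code view. Starting with modest values for `parallelCopies` and `dataIntegrationUnits` and adjusting them based on the monitoring output is usually safer than setting both high at once.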

Lakshmi Moulya Nerella 0 Reputation points

2024-06-24T10:55:16.39+00:00

Could you please be a bit more clear regarding parallelism in the json format?

Answer 2

Bhargava-MSFT 31,261 Microsoft Employee Moderator

Hello Lakshmi Moulya Nerella,

Are you using SHIR here?

Here are a few suggestions:

1. Register and bring online a self-hosted IR on a powerful machine (high CPU/memory) to read data from the big file through the copy activity.
2. Use a memory-optimized, large cluster (for example, 48 cores...) to read data from the big file through a data flow activity.
3. Split the big file into smaller ones, then use a copy or data flow activity to read the folder.
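
For the file-splitting suggestion, a minimal Python sketch of one way to do it before the pipeline runs is shown below. It assumes the big file is line-delimited text (for example CSV) with one record per line and no line breaks inside quoted fields. The function name `split_file`, the part naming scheme, and the chunk size are hypothetical choices, not part of any Azure tooling.

```python
from pathlib import Path


def split_file(src, out_dir, lines_per_part, has_header=True):
    """Split a text file into numbered parts, streaming one line at a time."""
    src = Path(src)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    parts = []
    # newline="" keeps the original line endings unchanged.
    with src.open("r", encoding="utf-8", newline="") as reader:
        header = reader.readline() if has_header else ""
        writer = None
        count = 0
        for line in reader:
            # Start a new part when none is open or the current one is full.
            if writer is None or count >= lines_per_part:
                if writer is not None:
                    writer.close()
                part = out_dir / f"{src.stem}_part{len(parts):05d}{src.suffix}"
                parts.append(part)
                writer = part.open("w", encoding="utf-8", newline="")
                writer.write(header)  # repeat the header in every part
                count = 0
            writer.write(line)
            count += 1
        if writer is not None:
            writer.close()
    return parts
```

Because only one line is held in memory at a time, the script itself does not need memory proportional to the file size. The resulting folder can then be used as the source of the copy or data flow activity.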
