Zipping and unzipping files to work around the Batch resourceFiles limit in ADF

Sampat, Varun 231 Reputation points
2021-01-12T15:09:45.593+00:00

Hello,

I recently encountered the following error while running a Custom Activity on Azure Data Factory:

  • "Total size of resourceFiles cannot be more than 32768 characters"

For this error, I found many blog posts that state you must zip all the files, then unzip using the command for the Custom Activity. I came across the following posts:

  1. https://social.msdn.microsoft.com/Forums/en-US/0a191641-1e77-4eae-b33d-9a0a331628b5/advv2-custom-activity-with-many-dependencies-total-size-of-resourcefiles-cannot-be-more-than?forum=AzureDataFactory
  2. https://social.msdn.microsoft.com/Forums/en-US/ab57e810-94d7-4a48-a358-649f607c9717/azure-batch-resourcefiles-limitation?forum=azurebatch
  3. https://stackoverflow.com/questions/55995566/how-do-i-unzip-and-execute-a-batch-service-job-as-part-of-azure-data-factory

But I got a little confused and was hoping to get some clarification on the following questions:

  1. How do I run a Python script with command-line arguments (for example, my current command is "python main.py -o test123") and also unzip the files first?
  2. When I unzip files on the Batch node, do I have to delete them once the Custom Activity has finished running? How is that taken care of?
  3. Does any activity on ADF 'zip' files for you?

If there's existing documentation for this, please guide me toward it. If not, any help will be appreciated!

Thank you, in advance!

Azure Storage Accounts
Globally unique resources that provide access to data management services and serve as the parent namespace for the services.
Azure Batch
An Azure service that provides cloud-scale job scheduling and compute management.
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

Accepted answer
  1. MartinJaffer-MSFT 26,031 Reputation points
    2021-01-12T20:57:21.273+00:00

    Hello @Sampat, Varun and welcome to Microsoft Q&A. Thank you for your excellent question.

    1. According to the Stack Overflow post you referenced, the command would look something like: cmd /c "Unzip.exe myZippedStuff && python mainPythonScript argument1 argument2 argument3". With your current command, the last part would be python main.py -o test123.
    2. Whether the node persists or not is determined by how you have configured the pool. You can have the instance get deleted once the task completes.
    3. Yes. A Copy activity with two Binary datasets can compress or decompress files. To do this, set the compression type on the dataset for the zipped side; on the unzipped side, it should be None. When unzipping, leave the File part of the File path empty, otherwise not everything will be extracted. The contents will be written to a folder with the same name as the zipped file. Specifying a Directory will put that output in a sub-folder.
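
As an illustration of points 1 and 3, here is a minimal Python sketch of both halves of the workflow: building the zip before upload, and unzipping on the node before running the script. It swaps Python's standard zipfile module in for the Unzip.exe mentioned above, which assumes Python is already installed on the Batch node (it must be, if "python main.py" works today). The function names build_bundle and extract_and_run, and the archive name bundle.zip, are hypothetical and chosen only for this example.

```python
import subprocess
import sys
import zipfile
from pathlib import Path
from typing import Sequence


def build_bundle(source_dir: str, bundle_path: str) -> int:
    """Zip every file under source_dir into bundle_path and return the file count.

    Hypothetical helper: run this wherever you prepare the files for upload,
    so the Custom Activity references one archive instead of many files.
    """
    source = Path(source_dir)
    bundle = Path(bundle_path).resolve()
    count = 0
    with zipfile.ZipFile(bundle_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source.rglob("*")):
            # Skip directories, and skip the archive itself if it lives in source_dir.
            if path.is_file() and path.resolve() != bundle:
                archive.write(path, path.relative_to(source).as_posix())
                count += 1
    return count


def extract_and_run(bundle_path: str, work_dir: str, script: str, args: Sequence[str]) -> int:
    """Unzip bundle_path into work_dir, then run script there with args.

    Returns the exit code of the script, so the caller can pass it on.
    """
    with zipfile.ZipFile(bundle_path) as archive:
        archive.extractall(work_dir)
    completed = subprocess.run([sys.executable, script, *args], cwd=work_dir)
    return completed.returncode
```

For example, extract_and_run("bundle.zip", ".", "main.py", ["-o", "test123"]) would unzip into the current directory and then run the command from the question. If you would rather keep everything in the Custom Activity command line, the zipfile module also has a command-line mode, so on a Windows node something like cmd /c "python -m zipfile -e bundle.zip . && python main.py -o test123" should have the same effect; treat that exact command as an untested assumption for your pool.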
