Data Factory taking a suspiciously long time to run simple Python code

Samyak 41 Reputation points
2022-04-25T07:58:12.003+00:00

Hi,

I have been following this tutorial:
https://learn.microsoft.com/en-us/azure/batch/tutorial-run-python-batch-azure-data-factory

I was able to run this tutorial successfully earlier, but I recently added ~12 files (each at least 1 GB) to the input folder. Since then, the pipeline has not completed successfully, and it takes a very long time even to run.
I also received an error: "FileIOException: FileIOException('Failed to allocate 635101071 bytes for file D:\batch\tasks\workitems\adfv2-Analytics_pool\job-1\9915a460-5efe-445d-9684-8d72b57b83a3\wd\Highmark_p1.parquet.gzip, out of disk space')"
The pool shows "unusable" state.

PS: I have a "standard_d2s_v3" machine in my Batch account.

Does Data Factory load all the files in a blob storage folder even when the Python script targets a single file?
I ask because earlier, when I had only "iris.csv" and "main.py" in my input folder, the job succeeded in mere seconds, but now that I have added multiple large files, it takes very long. Even after I deleted the extra files and kept only those two, the job still runs seemingly forever.

What is the solution to this? How can I find out the limit on the size or number of files that can be loaded into storage while still running efficiently?

Azure Batch
An Azure service that provides cloud-scale job scheduling and compute management.
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

1 answer

  1. ShaikMaheer-MSFT 38,546 Reputation points Microsoft Employee Moderator
    2022-04-26T10:23:51.19+00:00

    Hi @Samyak ,

    Thank you for posting your query on the Microsoft Q&A platform.

    Does your Python code process one file at a time? You can add logic to your Python script that checks whether a file has finished loading to blob storage, and only then moves on to the next file.

    Meanwhile, please check here for possible reasons why a node is in the unusable state, and see if this helps.

    Please let us know how it goes. Thank you.
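
The "process one file at a time" idea from the answer above can be sketched roughly as follows. This is a minimal illustration, not the tutorial's code: the helper names (`free_bytes`, `fits_on_disk`, `process_one_at_a_time`), the `process` callback, and the 1 GiB reserve are all assumptions made for the example. The error in the question shows a file being written into the task's `wd` directory until the disk filled up, so the sketch checks free disk space before handling each file and skips any file that would not fit.

```python
import shutil


def free_bytes(path="."):
    """Return the free space, in bytes, on the disk that holds `path`."""
    return shutil.disk_usage(path).free


def fits_on_disk(required_bytes, available_bytes, reserve_bytes=1024**3):
    """True if `required_bytes` fits while leaving `reserve_bytes` unused."""
    return required_bytes + reserve_bytes <= available_bytes


def process_one_at_a_time(files, process, work_dir=".", reserve_bytes=1024**3):
    """Handle (name, size_in_bytes) pairs sequentially.

    `process` is a hypothetical callback that is expected to download,
    handle, and then delete one file, so that at most one large file
    sits on the node's disk at any time. Files that would not fit are
    skipped and their names are returned to the caller.
    """
    skipped = []
    for name, size in files:
        # Re-check free space on every iteration, since each processed
        # file may change how much room is left.
        if fits_on_disk(size, free_bytes(work_dir), reserve_bytes):
            process(name)
        else:
            skipped.append(name)
    return skipped
```

In a real script, `process` would wrap whatever download and transformation steps `main.py` performs for a single blob; the sizes could come from the blob listing rather than being hard-coded.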

