Hi,
I have been following this tutorial:
https://learn.microsoft.com/en-us/azure/batch/tutorial-run-python-batch-azure-data-factory
I was able to run this tutorial successfully earlier, but I recently added ~12 files (each at least 1 GB) to the input folder. Since then the pipeline has not completed successfully, and even starting a run takes very long.
I also received this error: "FileIOException: FileIOException('Failed to allocate 635101071 bytes for file D:\batch\tasks\workitems\adfv2-Analytics_pool\job-1\9915a460-5efe-445d-9684-8d72b57b83a3\wd\Highmark_p1.parquet.gzip, out of disk space')"
The pool is now in the "unusable" state.
PS: The pool in my Batch account uses a "standard_d2s_v3" VM.
Does Data Factory copy all files from the blob storage folder to the Batch node, even when the Python script only targets a single file?
I'm asking because earlier, when the input folder contained only "iris.csv" and "main.py", the job succeeded in seconds. After I added the large files it started taking very long, and even after I deleted them and kept only those two files again, the pipeline still seems to run indefinitely.
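For context, what I mean by "targeted for a single file" is roughly the pattern below. This is only a minimal sketch, not my actual script; the connection string, container name, and blob name are placeholders, and it assumes the azure-storage-blob v12 SDK. The idea is that the script downloads just the one blob it needs instead of relying on every file in the input folder being copied to the node's working directory:

```python
# Sketch only -- placeholder names, assuming the azure-storage-blob v12 SDK.
from io import BytesIO

import pandas as pd
from azure.storage.blob import BlobClient

CONN_STR = "<storage-account-connection-string>"  # placeholder
CONTAINER = "input"                               # placeholder container name
TARGET_BLOB = "iris.csv"                          # the only file the script needs

# Download just this one blob; the large .parquet.gzip files are never read.
blob = BlobClient.from_connection_string(
    conn_str=CONN_STR, container_name=CONTAINER, blob_name=TARGET_BLOB
)
df = pd.read_csv(BytesIO(blob.download_blob().readall()))
print(df.head())
```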
What is the solution to this? And how can I find out the limit on the size/number of files that can be kept in the storage folder for the pipeline to run efficiently?