Pipeline that used to run in seconds now runs indefinitely (using Data Factory)

Samyak 41 Reputation points
2022-04-25T08:04:53.127+00:00

Hi,

I have been following this tutorial:
https://learn.microsoft.com/en-us/azure/batch/tutorial-run-python-batch-azure-data-factory

I was able to run this tutorial successfully earlier, but I recently added about 12 files (each at least 1 GB). Since then, I have not been able to run the pipeline successfully, and it takes a very long time to run at all.

I then deleted all the newly added files, keeping only iris.csv and main.py, and ran the pipeline again. The code that used to finish in seconds now runs indefinitely.
The node is also shown in the "unusable" state, with the message: "The VM disk is full. Delete jobs, tasks, or files on the node to free up space and then reboot the node."

PS: I have a "standard_d2s_v3" machine in my Batch account.

Does a job run cache all the files in the blob container on the node?
What is the solution to this? How can I find out the limit on the size or number of files that can be loaded into storage while still running efficiently?

Azure Batch
An Azure service that provides cloud-scale job scheduling and compute management.
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

Accepted answer
  1. PRADEEPCHEEKATLA-MSFT 84,936 Reputation points Microsoft Employee
    2022-04-26T09:32:16.193+00:00

    Hello @Samyak,

    Thanks for the question and using MS Q&A platform.

    A node in the "unusable" state cannot be used for task execution because of errors.

    Cause: As the error message says, the node's disk is full. When disk usage grows to that point, the node agent marks the node as unusable.

    Resolution: It is recommended to look at how the task utilizes the disk space. Alternatively, you can choose a larger VM SKU and re-try the same job.
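One way to see how tasks use the disk is to list the files on the node and sort them by size. The following is a minimal sketch, assuming the `azure-batch` Python SDK and an already authenticated `BatchServiceClient` passed in as `client`; `pool_id` and `node_id` are placeholders, and whether the listing still responds on a node that is already unusable is not guaranteed.

```python
def largest_files(node_files, top=5):
    """Return (size_in_bytes, name) pairs for the biggest files, largest first."""
    sized = [
        (f.properties.content_length, f.name)
        for f in node_files
        # Directories carry no size, so skip them.
        if not f.is_directory and f.properties is not None
    ]
    return sorted(sized, reverse=True)[:top]


def report_disk_usage(client, pool_id, node_id, top=5):
    """Print the largest files found on one compute node.

    `client` is assumed to be an azure.batch.BatchServiceClient.
    """
    files = client.file.list_from_compute_node(pool_id, node_id, recursive=True)
    for size, name in largest_files(files, top):
        print(f"{size / 1024 ** 2:10.1f} MiB  {name}")
```

`largest_files` only needs objects with `name`, `is_directory`, and `properties.content_length`, so it can be tried on stand-in objects before pointing it at a real pool.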

    Batch writes job files, task files, and shared files to the node's temporary drive. Some of these files are only written once when pool nodes are created, such as pool application packages or pool start task resource files. Even if they are only written once when the node is created, files that are too large can fill the temporary drive.

    Other files are written out for each task that is run on a node, such as stdout and stderr. If a large number of tasks run on the same node and/or the task files are too large, they could fill the temporary drive.
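As a rough check on whether a set of input files can fit on a node at all, the arithmetic can be sketched as below. The 16 GiB temporary disk is an assumption taken from the published size table for Standard_D2s_v3, the 20% reserve for the operating system and task output is an arbitrary illustrative margin, and the file sizes are hypothetical stand-ins for the roughly 12 files of 1 GB or more mentioned in the question.

```python
GIB = 1024 ** 3


def fits_on_temp_disk(file_sizes_bytes, temp_disk_gib, reserve_fraction=0.2):
    """Return (fits, total_gib, usable_gib) for files copied to one node.

    reserve_fraction is an assumed safety margin, not a Batch setting.
    """
    total_gib = sum(file_sizes_bytes) / GIB
    usable_gib = temp_disk_gib * (1 - reserve_fraction)
    return total_gib <= usable_gib, total_gib, usable_gib


# Twelve hypothetical 1.2 GiB inputs plus two small tutorial-sized files.
sizes = [int(1.2 * GIB)] * 12 + [5_000, 2_000]
fits, total, usable = fits_on_temp_disk(sizes, temp_disk_gib=16)
print(f"need {total:.1f} GiB, usable {usable:.1f} GiB, fits={fits}")
```

If the files do not fit, the options are the ones given in the resolution above: copy fewer or smaller files to the node per task, or pick a VM size with a larger temporary disk.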

    To recover an unusable node in VirtualMachineConfiguration pools, you can remove a node from the pool using the remove nodes API. Then, you can grow the pool again to replace the bad node with a fresh one. For CloudServiceConfiguration pools, you can re-image the node via the Batch re-image API. This will clean the entire disk. Re-image is not currently supported for VirtualMachineConfiguration pools.
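The remove-then-grow recovery described above could look roughly like this with the `azure-batch` Python SDK. This is a sketch under several assumptions: `client` is an authenticated `BatchServiceClient`, the pool has a fixed size (no autoscale formula), it uses dedicated nodes only, and no more than 100 nodes are removed in one request. `replace_unusable_nodes` and `unusable_node_ids` are hypothetical helper names, not part of the SDK.

```python
import time


def unusable_node_ids(nodes):
    """Return the IDs of nodes whose state is 'unusable'."""
    ids = []
    for node in nodes:
        # The SDK reports state as a string-valued enum; accept plain strings too.
        state = getattr(node.state, "value", node.state)
        if state == "unusable":
            ids.append(node.id)
    return ids


def replace_unusable_nodes(client, pool_id, poll_seconds=15):
    """Remove unusable nodes, then resize the pool back to its earlier target."""
    # Imported here so the helper above can be used without the SDK installed.
    import azure.batch.models as batchmodels

    original_target = client.pool.get(pool_id).target_dedicated_nodes
    bad_ids = unusable_node_ids(client.compute_node.list(pool_id))
    if not bad_ids:
        return []

    # Removing nodes also lowers the pool's target node count.
    client.pool.remove_nodes(
        pool_id, batchmodels.NodeRemoveParameter(node_list=bad_ids)
    )

    # The pool cannot be resized again until the removal has finished.
    while (client.pool.get(pool_id).allocation_state
           != batchmodels.AllocationState.steady):
        time.sleep(poll_seconds)

    # Grow the pool back so fresh nodes replace the removed ones.
    client.pool.resize(
        pool_id,
        batchmodels.PoolResizeParameter(target_dedicated_nodes=original_target),
    )
    return bad_ids
```

Replacing the node only clears the disk. If the same large files are copied to the new node on the next run, it will fill up again.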

    For more information, refer to the "Node in unusable state" and "Node disk full" sections of the Azure Batch documentation on node errors.

    Hope this helps. Please let us know if you have any further queries.

