Uploading Multiple Folders for Remote Job via Azure Machine Learning Python SDK (v2)

Diego STUCCHI 20 Reputation points
2024-11-14T14:50:40.34+00:00

I am working with the Python SDK (v2) of Azure Machine Learning.

I want to launch a training script on a serverless compute by using jobs. Typically, I create and launch a job using the Python command shown below.

The problem I’m facing is that my workspace has the source code, the experiment script, and the utility functions organized in separate folders. However, the command function in the SDK allows uploading only a single folder. I prefer not to restructure my entire codebase to fit AzureML's requirements. Is there a way to upload multiple folders?

from azure.ai.ml import MLClient, command, Input
from azure.identity import DefaultAzureCredential

# Authenticate against the workspace (fill in your own identifiers).
ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="...",
    resource_group_name="...",
    workspace_name="...",
)

job = command(
    inputs=dict(...),
    code='path/to/code/folder',  # only a single folder can be passed here
    command='python main.py --train-data ${{inputs.train_data}} --test-data ${{inputs.test_data}}',
    environment="...",
    experiment_name='...'
)

ml_client.create_or_update(job)

Accepted answer
  1. Pavankumar Purilla 1,645 Reputation points Microsoft Vendor
    2024-11-20T12:54:47.0166667+00:00

Hi Diego STUCCHI,
I hope you are doing great!
Thank you for your response and for editing the original answer. On behalf of Sina Salam, I'm posting the updated answer here in case you'd like to accept it.

The issue arises because the command function in the Azure Machine Learning Python SDK (v2) accepts only a single folder through the code argument, while your source code, experiment script, and utility functions live in separate folders. Unfortunately, Azure Machine Learning SDK v2 does not currently support uploading multiple folders directly through the command interface. One practical, low-overhead solution is to consolidate these directories into a single ZIP file and upload that archive as the code folder; Azure ML will automatically unzip it on the compute instance. This avoids altering or restructuring your codebase. Below is how you can modify your code:

    1. First, create a ZIP archive of your directories using the bash command: zip -r code_archive.zip path/to/source path/to/experiment path/to/utils
    2. Then, use the following Python script:
         from azure.ai.ml import command, Input
         job = command(
             inputs=dict(
                 train_data=Input(type="uri_file", path="path/to/train_data"),
                 test_data=Input(type="uri_file", path="path/to/test_data"),
             ),
             code="path/to/code_archive.zip",  # Use the ZIP file as the code path
             command='python path/to/experiment/main.py --train-data ${{inputs.train_data}} --test-data ${{inputs.test_data}}',
             environment="azureml:<your-environment-name>",
             experiment_name='my_experiment'
         )
         ml_client.create_or_update(job)
      
      This approach preserves your codebase structure and works within the current SDK constraints.
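    For completeness, the zip step can also be done from Python before submitting the job. The sketch below assumes the same illustrative folder names as the bash command above; build_code_archive is a hypothetical helper, not part of the SDK:

    ```python
    import zipfile
    from pathlib import Path

    def build_code_archive(folders, archive_path="code_archive.zip"):
        # Bundle several source folders into one ZIP. Each folder keeps its
        # own name as a top-level directory inside the archive, mirroring
        # `zip -r code_archive.zip path/to/source path/to/experiment ...`
        with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
            for folder in folders:
                folder = Path(folder)
                for file in sorted(folder.rglob("*")):
                    if file.is_file():
                        # Arcname is relative to the folder's parent, so the
                        # archive unpacks as source/, experiment/, utils/ ...
                        zf.write(file, file.relative_to(folder.parent))
        return archive_path
    ```

    You would then pass the returned path as the code argument, exactly as in the snippet above.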

    Check more details in the Documentation - https://learn.microsoft.com/en-us/azure/machine-learning/how-to-use-azureml-sdk and Azure ML Job Submission Best Practices - https://learn.microsoft.com/en-us/azure/machine-learning/how-to-submit-jobs

    Hope this helps. Do let us know if you have any further queries.


    If this answers your query, do click Accept Answer and Yes for was this answer helpful.


1 additional answer

  1. Sina Salam 13,371 Reputation points
    2024-11-14T20:54:22.88+00:00

    Hello Diego STUCCHI,

    Welcome to the Microsoft Q&A and thank you for posting your questions here.

    I understand that you would like to upload multiple folders for Remote Job via Azure Machine Learning Python SDK (v2).

    When working with the Azure Machine Learning Python SDK (v2) to upload multiple folders for a remote job, there are several methods to consider. Each method has its pros and cons, and it’s important to choose the one that best fits your needs.

    Methods to Upload Multiple Folders

    • Using a ZIP file
    • Using a custom Docker image
    • Using AzureML data assets
    • Using the code_paths parameter

    The best approach is the code_paths parameter.

    The code_paths parameter in the AzureML SDK v2 supports multiple folders directly, avoiding the hassle of compressing files or building custom containers. This is how you can implement it:

    from azure.ai.ml import command, Input
    job = command(
        inputs=dict(
            train_data=Input(type="uri_file", path="path/to/train_data"),
            test_data=Input(type="uri_file", path="path/to/test_data"),
        ),
        code_paths=[
            "path/to/source",
            "path/to/experiment",
            "path/to/utils"
        ],
        command='python path/to/experiment/main.py --train-data ${{inputs.train_data}} --test-data ${{inputs.test_data}}',
        environment="azureml:<your-environment-name>",
        experiment_name='my_experiment'
    )
    ml_client.create_or_update(job)
    

    Using a ZIP File.

    Should there be any issues with the code_paths parameter, another practical solution is to consolidate your directories into a ZIP file. This method works within the current SDK constraints and preserves your codebase structure. This is how you can do it:

    Use the following bash command to create a ZIP file containing all your folders:

    zip -r code_archive.zip path/to/source path/to/experiment path/to/utils

    Modify Your Python Script: Use the ZIP file as the code path in your script:

    from azure.ai.ml import command, Input
    job = command(
        inputs=dict(
            train_data=Input(type="uri_file", path="path/to/train_data"),
            test_data=Input(type="uri_file", path="path/to/test_data"),
        ),
        code="path/to/code_archive.zip",  # Use the ZIP file as the code path
        command='python path/to/experiment/main.py --train-data ${{inputs.train_data}} --test-data ${{inputs.test_data}}',
        environment="azureml:<your-environment-name>",
        experiment_name='my_experiment'
    )
    ml_client.create_or_update(job)
    

    Check more details in the Documentation - https://learn.microsoft.com/en-us/azure/machine-learning/how-to-use-azureml-sdk and Azure ML Job Submission Best Practices - https://learn.microsoft.com/en-us/azure/machine-learning/how-to-submit-jobs

    I hope this is helpful! Do not hesitate to let me know if you have any other questions.


    Please don't forget to close up the thread here by upvoting and accept it as an answer if it is helpful.

    1 person found this answer helpful.
