Unable to download output of job using mlclient - Azure python sdk v2

Tadikonda Tarun HYD DIWID23 20 Reputation points
2024-01-24T08:26:12.7+00:00

I am trying to download metrics.json from the outputs folder of a job (folder structure shown in the attached image).

`ml_client.jobs.download(name=job_name, output_name='outputs/metrics.json')`

The call above neither downloads the file nor raises an exception.

`ml_client.jobs.download(name=job_name)`

This call downloads all the files in the output. Am I missing something when trying to download a single file?

Azure Machine Learning

1 answer

  1. dupammi 8,615 Reputation points Microsoft External Staff
    2024-01-24T13:10:17.3933333+00:00

    Hi @Tadikonda Tarun HYD DIWID23, thank you for reaching out. I understand you are having trouble downloading a specific output file from your job using MLClient in the Azure Python SDK v2.

    This looks like specific behavior of the output_name parameter of ml_client.jobs.download. In some cases, passing output_name directly may not work as expected, e.g. if the job saves its results under a URL while the SDK expects a plain string in output_name.

    As a workaround, you can pass all=True and add some debugging or print statements to identify the exact relative or full path the API is trying to download from. Here is the example I used to reproduce this with ml_client.jobs.download:

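    # SDK v2 client (azure-ai-ml) and credential
    from azure.ai.ml import MLClient
    from azure.identity import DefaultAzureCredential
    # SDK v1 classes (azureml-core), used here only to look up the run and list its files
    from azureml.core import Workspace, Experiment, Run
    from datetime import datetime
    import os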
    
    # Replace with your own values
    subscription_id = 'YOUR_SUBSCRIPTION_ID'
    resource_group = 'YOUR_RESOURCE_GROUP'
    workspace_name = 'YOUR_WORKSPACE_NAME'
    experiment_name = 'YOUR_EXPERIMENT_NAME'
    run_id = 'YOUR_RUN_ID'  
    
    # Get the workspace
    ws = Workspace(subscription_id=subscription_id, resource_group=resource_group, workspace_name=workspace_name)
    
    # Get the experiment
    exp = Experiment(workspace=ws, name=experiment_name)
    
    # Get the run
    run = Run(exp, run_id=run_id)
    
    # Check if the run has completed successfully before downloading files
    if run.get_status() == "Completed":
        # Create a timestamped log directory
        log_dir = f"./logs/{datetime.now().strftime('%Y%m%d-%H%M%S')}"
        os.makedirs(log_dir, exist_ok=True)
        
        # Create MLClient using DefaultAzureCredential
        ml_client = MLClient(DefaultAzureCredential(), subscription_id, resource_group, workspace_name)
        
        # Debugging step 1: List output files and artifacts
        outputs = run.get_file_names()
        print("Output files and artifacts:")
        for output in outputs:
            print(output)
    
        # Debugging step 2: Print debug information before download
        print("Before download:")
        
        # Download all logs and named outputs of the job
        try:
            ml_client.jobs.download(name=run.id, download_path=log_dir, all=True)
            print(f"Files downloaded successfully to: {log_dir}")
        except Exception as e:
            print(f"Error during download: {e}")
    else:
        print("Run is not completed. Cannot download files.")
    
    

    With these additional debugging steps in my repro, together with the all=True parameter, I found that the SDK is trying to download from a URL, which is not the same thing as output_name: that parameter is a plain string, not a URL. For more details, please refer to joboperations-download.


    This approach downloads all the files, but it helps you identify the exact file paths. You can then pick the file you need from the downloaded files. I hope this helps. Thank you.
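
The filtering step described above could be sketched roughly as follows. This is only an illustration, not part of the original repro: `find_downloaded_file` and `download_one_file` are hypothetical helper names. The second helper assumes `run` is the SDK v1 `azureml.core.Run` object from the example and relies on its `download_file` method to fetch one file by relative path, so treat it as an option to verify against your installed SDK version.

```python
from pathlib import Path


def find_downloaded_file(download_dir, filename):
    """Return every file under download_dir whose name equals filename."""
    root = Path(download_dir)
    # rglob searches all subfolders, so the exact layout that
    # jobs.download() creates does not need to be known in advance.
    return sorted(p for p in root.rglob(filename) if p.is_file())


def download_one_file(run, remote_path, local_dir):
    """Fetch a single run file by its relative path, e.g. 'outputs/metrics.json'."""
    # get_file_names() is the same call used in the repro to list the run's files.
    if remote_path not in run.get_file_names():
        raise FileNotFoundError(f"{remote_path} is not among the run's files")
    target = Path(local_dir) / Path(remote_path).name
    target.parent.mkdir(parents=True, exist_ok=True)
    # Assumption: run is an azureml.core.Run (SDK v1), whose download_file()
    # takes the relative file name and a local output path.
    run.download_file(name=remote_path, output_file_path=str(target))
    return target
```

With the repro above, `find_downloaded_file(log_dir, "metrics.json")` would be called after `ml_client.jobs.download(...)` returns, while `download_one_file(run, "outputs/metrics.json", log_dir)` skips the full download altogether.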

