Specifying AzureML output destination in SDK v2

SH 56 Reputation points
2022-08-18T12:17:44.22+00:00

Hi. I have set up an AzureML pipeline with YAML components using the Python SDK (v2) with an attached blob store. However, it appears that the output destination is handled automatically by AzureML and so I can't specify where on the blob the pipeline writes its output. I want to configure the AzureML pipeline run using ADF, which involves moving some data to the blob, running the AzureML pipeline, and then moving some data from the blob to somewhere else. The trouble is that ADF doesn't get access to the AzureML output directory, and so it won't know where to look for the output file.

I have tried to pass the output directory as an input rather than an output so that I can explicitly state where this should go. The directory, however, gets mounted as read only (quite sensibly by design, I trust) so that doesn't work. So I'm kind of running out of options.

Is there any way for me to specify the output path for an Azure ML SDK v2 pipeline, in a similar way to how I would specify an input path? Alternatively, is there another way of solving this particular predicament of mine?

I have looked through the notebooks (e.g. https://github.com/Azure/azureml-examples/blob/8a4070f55593c9641083784283b773f4f20955dd/sdk/jobs/pipelines/1a_pipeline_with_components_from_yaml/pipeline_with_components_from_yaml.ipynb) and I can't find an example where people explicitly control the output destination (which seems odd).

Thoughts?

Azure Machine Learning

2 answers

  1. Steiner, Thomas 11 Reputation points
    2022-10-10T10:39:09.38+00:00

    Any updates on this?
    I have the same problem. It seems you can specify the path in the output like this:

        outputs={
            "output_path": Output(type="uri_folder", mode="rw_mount", path=<path>),
        }

    It doesn't throw an error, but the path is ignored anyway.
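
One thing that may be worth checking here, offered as an assumption rather than a confirmed cause of the ignored path: custom output locations in SDK v2 are normally written as datastore URIs of the form `azureml://datastores/<datastore_name>/paths/<relative_path>`. The sketch below is plain Python; `parse_datastore_uri` is a hypothetical helper (not part of the SDK) that checks whether a string has that form before it is passed as `path`.

```python
import re
from typing import Optional, Tuple

# Expected shape: azureml://datastores/<datastore_name>/paths/<relative_path>
_DATASTORE_URI = re.compile(
    r"^azureml://datastores/(?P<datastore>[^/]+)/paths/(?P<path>.+)$"
)


def parse_datastore_uri(uri: str) -> Optional[Tuple[str, str]]:
    """Return (datastore_name, relative_path), or None if the URI has another shape."""
    match = _DATASTORE_URI.match(uri)
    if match is None:
        return None
    return match.group("datastore"), match.group("path")
```

If the helper returns `None` for the value you pass as `path`, rewriting it as a datastore URI is a reasonable first thing to try.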

    2 people found this answer helpful.

  2. Dominik Leuzinger 0 Reputation points
    2023-04-21T11:41:33.82+00:00

    I also needed to set a custom path for the output of a pipeline artifact. I used a mixed approach: the components are defined in YAML, and the pipeline itself is created with SDK v2. The YAML definition of my score_component looks as follows:

    $schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
    type: command
    
    name: score_example_model
    display_name: Score
    description: Scores the data from the input file (parquet) with a registered model
    version: 0.0.1
    inputs:
      input_data:
        type: uri_folder
      model:
        type: mlflow_model
    outputs:
      output_dir: 
        type: uri_folder
    code: score_file.py
    environment: azureml:example-env:0.1.1
    command: >-
      python score_file.py
      --input_data ${{inputs.input_data}}
      --model ${{inputs.model}}
      --output_dir ${{outputs.output_dir}}
    

    As you can see, the component defines no output path, which I believe is good practice. Instead, you can assign an Output object with your custom path inside the pipeline function, like this:

    
        from azure.ai.ml import Output, dsl

        # Assumes `args` (parsed command-line arguments) and the components
        # `build_feat_component` and `score_component` (loaded from their YAML
        # definitions) are defined earlier in the script.

        ### Define your Pipeline ###
        @dsl.pipeline(
            compute=args.compute_target,
            description="Inference pipeline for batch scoring file input"
        )
        def example_inference_pipeline(
            inference_job_input_data,
            inference_job_feat_dict,
            inference_job_model
        ):

            build_feat_job = build_feat_component(
                input_data=inference_job_input_data,
                feat_dict_raw=inference_job_feat_dict,
                val_only=True
            )
            score_job = score_component(
                input_data=build_feat_job.outputs.processed_data,
                model=inference_job_model,
            )
            # Set custom output path for output_dir
            score_job.outputs.output_dir = Output(type="uri_folder", path=args.output_dir, mode="rw_mount")

            # Expose the scoring output as a pipeline-level output
            return {
                "inference_job_output_dir": score_job.outputs.output_dir
            }
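
Since the original question is about letting ADF find the output, it can help to derive the output folder from values that both ADF and the pipeline script already know. The following is a minimal sketch under stated assumptions: `build_output_uri` is a hypothetical helper, and the datastore name `workspaceblobstore`, the folder `scoring-output`, and the per-date layout are illustrative choices, not something taken from the answer above. The resulting string is the kind of value that could be supplied as `args.output_dir`.

```python
from datetime import date


def build_output_uri(datastore: str, base_folder: str, run_date: date) -> str:
    """Build a deterministic datastore URI for one pipeline run's output folder."""
    # Strip stray slashes so the folder joins cleanly with the date segment.
    folder = f"{base_folder.strip('/')}/{run_date.isoformat()}"
    return f"azureml://datastores/{datastore}/paths/{folder}/"


# Hypothetical values; replace with your own datastore name and folder layout.
output_uri = build_output_uri("workspaceblobstore", "scoring-output", date(2023, 4, 21))
print(output_uri)
```

ADF could then read from the matching `scoring-output/<date>/` folder, assuming the datastore points at the blob container that ADF has access to.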
    