Specifying AzureML output destination in SDK v2

Question

Hi. I have set up an AzureML pipeline with YAML components using the Python SDK (v2) with an attached blob store. However, it appears that the output destination is handled automatically by AzureML and so I can't specify where on the blob the pipeline writes its output. I want to configure the AzureML pipeline run using ADF, which involves moving some data to the blob, running the AzureML pipeline, and then moving some data from the blob to somewhere else. The trouble is that ADF doesn't get access to the AzureML output directory, and so it won't know where to look for the output file.

I have tried to pass the output directory as an input rather than an output so that I can explicitly state where this should go. The directory, however, gets mounted as read only (quite sensibly by design, I trust) so that doesn't work. So I'm kind of running out of options.

Is there any way for me to specify the output path for an Azure ML SDK v2 pipeline in a similar way to how I would specify an input path? Alternatively, is there another way of solving this particular predicament of mine?

I have looked through the notebooks (e.g. https://github.com/Azure/azureml-examples/blob/8a4070f55593c9641083784283b773f4f20955dd/sdk/jobs/pipelines/1a_pipeline_with_components_from_yaml/pipeline_with_components_from_yaml.ipynb) and I can't find an example where people explicitly control the output destination (which seems odd).

Thoughts?

Answer

Any updates on this?
I have the same problem. It seems you can specify the path in the output like this:
outputs={
"output_path": Output(type="uri_folder", mode="rw_mount", path=),
}
It doesn't throw an error, but the path is ignored anyway.

Answer

I also needed to set a custom path for a pipeline output. I used a mixed approach of YAML definitions and SDK v2: the components are defined in YAML, and the pipeline itself is created with SDK v2. The YAML definition of my score_component looks as follows:

$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
type: command

name: score_example_model
display_name: Score
description: Scores the data from the input file (parquet) with a registered model
version: 0.0.1
inputs:
  input_data:
    type: uri_folder
  model:
    type: mlflow_model
outputs:
  output_dir:
    type: uri_folder
code: score_file.py
environment: azureml:example-env:0.1.1
command: >-
  python score_file.py
  --input_data ${{inputs.input_data}}
  --model ${{inputs.model}}
  --output_dir ${{outputs.output_dir}}

As you can see, no output path is defined in the component itself, which I believe is good practice. Instead, you can explicitly assign an Output object with your custom path inside your pipeline function, like this:


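    from azure.ai.ml import Output, dsl

    # `args` (e.g. parsed command-line arguments), `build_feat_component` and
    # `score_component` (loaded from the YAML definitions) are defined elsewhere.

    ### Define your Pipeline ###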
    @dsl.pipeline(
        compute=args.compute_target,
        description="Inference pipeline for batch scoring file input"
    )
    def example_inference_pipeline(
        inference_job_input_data,
        inference_job_feat_dict,
        inference_job_model
    ):

        build_feat_job = build_feat_component(
            input_data=inference_job_input_data,
            feat_dict_raw=inference_job_feat_dict,
            val_only=True
        )
        score_job = score_component(
            input_data=build_feat_job.outputs.processed_data,
            model=inference_job_model,
        )
        # Set custom output path for output_dir
        score_job.outputs.output_dir = Output(type="uri_folder", path=args.output_dir, mode="rw_mount")

        return {
            "inference_job_output_dir": score_job.outputs.output_dir
        }
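
The snippet above does not show what args.output_dir contains. A minimal sketch of how such a path could be built, assuming it is a datastore URI of the form azureml://datastores/<datastore_name>/paths/<path> (the format Azure ML v2 uses to address a location on a registered datastore). The helper name datastore_uri, the datastore name "workspaceblobstore" and the folder "scoring/run-001" are illustrative assumptions, not taken from the original answer:

```python
def datastore_uri(datastore_name: str, relative_path: str) -> str:
    """Build an azureml:// URI pointing at a folder on a registered datastore.

    Hypothetical helper for illustration; it only formats a string and does
    not check that the datastore or the folder exists.
    """
    if not datastore_name:
        raise ValueError("datastore_name must not be empty")
    # Strip surrounding slashes so each segment is separated by exactly one "/".
    cleaned = relative_path.strip("/")
    suffix = f"{cleaned}/" if cleaned else ""
    return f"azureml://datastores/{datastore_name}/paths/{suffix}"


# Illustrative values only: replace them with your own datastore name and folder.
output_uri = datastore_uri("workspaceblobstore", "scoring/run-001")
print(output_uri)
```

Because the caller chooses the folder, an ADF pipeline could in principle be given the same datastore and relative path and read the results from there after the Azure ML run completes.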
