Setting a datastore at Job Level in an Azure ML Pipeline

BeatriceHammond-9974 10 Reputation points
2026-03-11T12:11:22.01+00:00

I can see the option to configure a default datastore at the pipeline level using the settings section. However, I need to specify different datastores for specific job outputs within the same pipeline while preserving Azure ML's auto-generated unique path structure (run IDs, timestamps, etc.).

What I've tried:

  1. Using path with a full datastore URI:

       outputs:
         model_output:
           type: uri_folder
           path: azureml://datastores/my-datastore/paths/my-folder
           mode: upload

     Result: Works, but writes directly to the specified path without auto-generating unique subfolders.
  2. Using the data.datastore syntax (from preview documentation):

       outputs:
         model_output:
           type: uri_folder
           data:
             datastore: azureml:my-datastore
           mode: upload

     Result: Validation error - the data field is not recognized in the current schema.

Question: Is there a way to specify a custom datastore for individual job outputs while still having Azure ML auto-generate unique paths (similar to using name with the default datastore)? If not currently supported, are there any recommended workarounds?

Azure Machine Learning

2 answers

  1. Manas Mohanty 16,110 Reputation points Microsoft External Staff Moderator
    2026-03-25T20:43:23.62+00:00

    Hi BeatriceHammond-9974,

    Hope you found the insights in the answer below helpful.

    To emphasize: "You cannot set a datastore per individual output while keeping auto-generated paths. What is supported is overriding the default datastore at the job (or component) level."

    You can use different datastores to save the outputs of different jobs:

    jobs:
      job_a:
        settings:
          default_datastore: azureml:datastoreA
      job_b:
        settings:
          default_datastore: azureml:datastoreB
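
    For context, a fuller pipeline sketch following this per-job settings pattern might look like the following. This is a sketch, not a verified pipeline: the component file names, job names, and datastore names are placeholders, and it assumes per-job default_datastore overrides are honored as described above.

    ```yaml
    $schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
    type: pipeline
    settings:
      default_datastore: azureml:workspaceblobstore   # pipeline-wide default
    jobs:
      job_a:
        component: ./train.yml                        # hypothetical component
        settings:
          default_datastore: azureml:datastoreA       # override for this job's outputs
        outputs:
          model_output:
            type: uri_folder                          # no path -> auto-generated run-based folder in datastoreA
      job_b:
        component: ./eval.yml                         # hypothetical component
        settings:
          default_datastore: azureml:datastoreB       # this job's outputs land in datastoreB
    ```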

    Reference: https://learn.microsoft.com/en-us/azure/machine-learning/how-to-read-write-data-v2?view=azureml-api-2&tabs=python#paths

    Thank you for your inputs.


  2. SRILAKSHMI C 16,785 Reputation points Microsoft External Staff Moderator
    2026-03-11T15:32:47.81+00:00

    Hello BeatriceHammond-9974,

    Welcome to Microsoft Q&A, and thank you for providing the detailed explanation of your scenario.

    In Azure Machine Learning pipelines (v2 schema), the automatic generation of unique output paths such as the /run-id/outputs/ structure is only triggered when no explicit path is defined for the output. When you specify a full datastore URI like:

    path: azureml://datastores/my-datastore/paths/my-folder
    

    Azure ML treats that as a fixed destination, so the service writes directly to that location and does not generate the automatic run-based subfolder structure.
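
    One way to work around the fixed-destination behavior (this is an illustration, not an official feature) is to generate your own unique subfolder under the datastore path before submitting the job, mimicking the run-scoped folders Azure ML would otherwise create. The datastore and folder names below are placeholders:

    ```python
    # Sketch: build a unique azureml:// output path with a timestamp plus a
    # random suffix, so each run writes to its own subfolder even though the
    # path is specified explicitly. Stdlib only; names are placeholders.
    import uuid
    from datetime import datetime, timezone

    def unique_output_path(datastore: str, base_folder: str) -> str:
        """Return an azureml:// URI ending in a timestamp-random subfolder."""
        stamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
        run_suffix = uuid.uuid4().hex[:8]
        return f"azureml://datastores/{datastore}/paths/{base_folder}/{stamp}-{run_suffix}"

    path = unique_output_path("my-datastore", "my-folder")
    print(path)
    ```

    The generated string can then be substituted into the output's path field at submission time (for example, by templating the YAML or setting the path from the SDK). Note this reproduces uniqueness but not Azure ML's exact run-ID naming.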

    You also attempted the data.datastore syntax:

       data:
         datastore: azureml:my-datastore

    However, this field is not supported in the current stable YAML schema, which is why you encountered a validation error. At the moment, there is no supported way to attach a different datastore to a single output while still allowing Azure ML to generate its automatic run-based folder structure.

    Recommended workarounds

    1. Override the default datastore at the job level

    One common approach is to override the datastore at the job or component level. In your pipeline YAML you can configure:

       jobs:
         my_job:
           settings:
             default_datastore: azureml:my-datastore

    Then define your outputs without specifying a path:

       outputs:
         model_output:
           type: uri_folder
           mode: upload

    With this configuration, Azure ML still generates the automatic run-based folder structure, but it does so inside the specified custom datastore.

    If your pipeline needs outputs stored in multiple different datastores, you can split those outputs into separate jobs, where each job sets its own default_datastore.

    2. Post-run copy or transfer step

    Another commonly used pattern is:

    1. Let Azure ML write outputs to the default datastore so the service generates the run-specific path automatically.

    2. Add a final pipeline step that copies or moves the artifacts from the generated path to the desired datastore and folder structure.

    This approach preserves the native Azure ML run organization while still allowing you to store the final artifacts in different datastores.
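
    A copy step of this kind might be sketched as the following command job. This is an assumption-laden illustration: the job name, upstream job reference, environment, and datastore path are all placeholders, and the actual copy command would depend on your environment's tooling.

    ```yaml
    copy_step:
      type: command
      command: cp -r ${{inputs.src}}/. ${{outputs.dest}}   # simple recursive copy
      environment: azureml:AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest  # any env with a shell
      inputs:
        src:
          type: uri_folder
          path: ${{parent.jobs.train_job.outputs.model_output}}  # hypothetical upstream job
      outputs:
        dest:
          type: uri_folder
          path: azureml://datastores/my-datastore/paths/final-artifacts  # fixed destination is fine here
          mode: upload
    ```

    Because the upstream output already carries the run-specific path, a fixed destination on the copy step does not lose that organization; you could also append a run identifier to the destination folder if you need the copies kept separate per run.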

    Currently, auto-generated run folders are created only when path is omitted, and the pipeline schema does not support specifying a datastore at the individual output level.

    To use different datastores, you typically either override the datastore at the job level or copy artifacts after the run.

    Please refer to these resources:

    Define CLI v2 components (shows settings.default_datastore): https://docs.microsoft.com/azure/machine-learning/reference-yaml-component-command

    Pipelines v2 overview (jobs.settings section): https://docs.microsoft.com/azure/machine-learning/how-to-create-component-pipelines-cli

    Datastore concepts (what “default_datastore” affects): https://docs.microsoft.com/azure/machine-learning/concept-data?view=azureml-api-2

    DataTransferStep for copying between datastores: https://docs.microsoft.com/python/api/azureml-pipeline-steps/azureml.pipeline.steps.datatransferstep

    I hope this helps. Do let me know if you have any further queries.

    Thank you!

