Hello BeatriceHammond-9974,
Welcome to Microsoft Q&A, and thank you for the detailed explanation of your scenario.
In Azure Machine Learning pipelines (v2 schema), the automatic generation of unique output paths such as the /run-id/outputs/ structure is only triggered when no explicit path is defined for the output. When you specify a full datastore URI like:
path: azureml://datastores/my-datastore/paths/my-folder
Azure ML treats that as a fixed destination, so the service writes directly to that location and does not generate the automatic run-based subfolder structure.
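For illustration, a minimal sketch of an output declared with an explicit path (the output name `model_output` is a placeholder):

```yaml
outputs:
  model_output:
    type: uri_folder
    # Fixed destination: Azure ML writes here directly and does not
    # append the automatic /<run-id>/<output-name>/ subfolder.
    path: azureml://datastores/my-datastore/paths/my-folder
```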
You also attempted the `data.datastore` syntax, but that field is not supported in the current stable YAML schema, which is why you encountered a validation error. At the moment, there is no supported way to attach a different datastore to a single output property while still letting Azure ML generate its automatic run-based folder structure.
Recommended workarounds
1. Override the default datastore at the job level
One common approach is to override the default datastore at the pipeline level. In your pipeline YAML, set `settings.default_datastore`:
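A minimal sketch of the pipeline-level override (the datastore name `my_custom_datastore`, the job name `train_step`, and the component reference are placeholders):

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
type: pipeline
settings:
  # Outputs without an explicit path are written to this datastore
  default_datastore: azureml:my_custom_datastore
jobs:
  train_step:
    type: command
    component: azureml:train_component@latest
```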
Then define your outputs without specifying a `path`:
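A sketch of an output declared without a path (names are placeholders):

```yaml
outputs:
  model_output:
    # No `path` here: Azure ML auto-generates the run-based folder
    # structure inside the pipeline's default datastore.
    type: uri_folder
```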
With this configuration, Azure ML will still generate the automatic run-based folder structure, but it will do so inside the specified custom datastore.
If your pipeline needs outputs stored in multiple different datastores, you can split those outputs into separate jobs, where each job sets its own default_datastore.
2. Post-run copy or transfer step
Another commonly used pattern is:
- Let Azure ML write outputs to the default datastore so the service generates the run-specific path automatically.
- Add a final pipeline step that copies or moves the artifacts from the generated path to the desired datastore and folder structure.
This approach preserves the native Azure ML run organization while still allowing you to store the final artifacts in different datastores.
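As a rough sketch of such a copy step (the job name, the upstream binding, the environment, and the target path are placeholders, not verified against your pipeline):

```yaml
jobs:
  copy_to_target:
    type: command
    # Any environment with a basic shell will do
    environment: azureml:my-env@latest
    command: cp -r ${{inputs.src}}/. ${{outputs.dst}}/
    inputs:
      # Bind to the auto-generated output of the earlier step
      src: ${{parent.jobs.train_step.outputs.model_output}}
    outputs:
      dst:
        type: uri_folder
        path: azureml://datastores/my-other-datastore/paths/final-artifacts
```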
To summarize: auto-generated run folders are created only when `path` is omitted, and the pipeline schema does not support specifying a datastore at the individual output level.
To use different datastores, you typically either override the datastore at the job level or copy artifacts after the run.
Please refer to these resources:
- Define CLI v2 components (shows settings.default_datastore): https://docs.microsoft.com/azure/machine-learning/reference-yaml-component-command
- Pipelines v2 overview (jobs.settings section): https://docs.microsoft.com/azure/machine-learning/how-to-create-component-pipelines-cli
- Datastore concepts (what "default_datastore" affects): https://docs.microsoft.com/azure/machine-learning/concept-data?view=azureml-api-2
- DataTransferStep for copying between datastores: https://docs.microsoft.com/python/api/azureml-pipeline-steps/azureml.pipeline.steps.datatransferstep
I hope this helps. Do let me know if you have any further queries.
Thank you!