Providing data paths to Azure Machine Learning Pipeline activity in ADF

Linus Östlund 20 Reputation points
2024-08-27T13:53:01.64+00:00

I am trying to orchestrate AML pipelines from ADF. I have a pipeline set up within AML, which is auto-generated, together with an auto-generated scoring script (provided by AML):

User's image

I use the "Azure Machine Learning Pipeline" activity from ADF. It is correctly set up, but running the ADF pipeline gives me an error within AML:

UserError: Data set node ed3df179 references parameter dataset_param which doesn't have a specified value or a default value.

This makes sense, since I haven't configured any paths for my data. The ADF pipeline activity has some options:

User's image

Searching on my own, I find vague descriptions of these Data Path Assignments. From this doc page I can find some properties:

Properties
dataPathAssignments: Dictionary used for changing datapaths in Azure Machine learning. Enables the switching of datapaths

My data exists within the default workspace blob in AML. My question is, how do I provide my AML pipeline with data from ADF using Machine Learning Data Path Assignments?

Regards,
L

Azure Machine Learning
An Azure machine learning service for building and deploying models.
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

1 answer

  1. Sina Salam 9,881 Reputation points
    2024-08-27T16:46:03.8366667+00:00

    Hello Linus Östlund,

    Welcome to the Microsoft Q&A and thank you for posting your questions here.

    From your description, I understand that you want to provide data paths to the Azure Machine Learning Pipeline activity in Azure Data Factory (ADF).

    To resolve this, you will need to use Machine Learning Data Path Assignments to provide your Azure Machine Learning (AML) pipeline with data from Azure Data Factory (ADF).

    • Create a DataPath object in your AML pipeline script that references the data in your default workspace blob store. For example, in Python:
       from azureml.core import Workspace
       from azureml.data.datapath import DataPath

       # Load the workspace from a local config.json
       ws = Workspace.from_config()
       # The default datastore is the workspace blob store
       datastore = ws.get_default_datastore()
       # Point at the data location inside that datastore
       data_path = DataPath(datastore=datastore, path_on_datastore='path/to/your/data')
    
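    • Then use a PipelineParameter to make the data path configurable. The parameter name ("data_path_param" here) is the key that ADF will refer to. For example:
       from azureml.pipeline.core import PipelineParameter

       # Wrap the DataPath in a named parameter so it can be overridden per run
       data_path_param = PipelineParameter(name="data_path_param", default_value=data_path)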
    
    • After the above, use the dataPathAssignments property to assign the data path parameter in your ADF pipeline. In the ADF UI, open the settings of the "Azure Machine Learning Pipeline" activity, then under dataPathAssignments specify the parameter name and the corresponding data path. The key must match the name of the pipeline parameter defined in AML.
       {
         "dataPathAssignments": {
           "data_path_param": {
             "DataStoreName": "your_datastore_name",
             "RelativePath": "path/to/your/data"
           }
         }
       }
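
If you generate the ADF activity definition from a script, the assignment above can be sketched as a small Python helper. This is only a minimal sketch, not an official API: the function name build_data_path_assignments is hypothetical, and the values "data_path_param", "workspaceblobstore" and "path/to/your/data" are placeholder assumptions that you should replace with the parameter name, datastore name and path from your own workspace.

```python
import json


def build_data_path_assignments(param_name, datastore_name, relative_path):
    """Return a dataPathAssignments dictionary for one pipeline parameter.

    Hypothetical helper for illustration; it only assembles the JSON shape
    shown above and checks that no field is left empty.
    """
    for label, value in (
        ("param_name", param_name),
        ("datastore_name", datastore_name),
        ("relative_path", relative_path),
    ):
        if not value:
            raise ValueError(f"{label} must be a non-empty string")
    return {
        param_name: {
            "DataStoreName": datastore_name,
            "RelativePath": relative_path,
        }
    }


# Placeholder values (assumptions): use your own parameter name, datastore and path.
assignments = build_data_path_assignments(
    param_name="data_path_param",
    datastore_name="workspaceblobstore",
    relative_path="path/to/your/data",
)

# Print the fragment in the same shape as the activity setting above.
print(json.dumps({"dataPathAssignments": assignments}, indent=2))
```

The printed JSON can then be pasted into the dataPathAssignments setting of the activity. Keeping the parameter name in one place helps avoid a mismatch between the AML pipeline parameter and the ADF key.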
    

    Finally, run your pipeline. The data path assignment will pass the configured data path to the AML pipeline. For more details and steps, check the additional resources on the right side of this page and this notebook: https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-showcasing-datapath-and-pipelineparameter.ipynb

    I hope this is helpful! Do not hesitate to let me know if you have any other questions.

    ** Please don't forget to close the thread by upvoting and accepting this as an answer if it is helpful ** so that others in the community facing similar issues can easily find the solution.

    Best Regards,

    Sina Salam

