What does "model" parameter do in RunFunction class (v2 SDK)?

Rahul Kurian Jacob 40 Reputation points
2024-09-26T18:05:55.57+00:00

I am using parallel_run_function from azure.ai.ml.parallel as follows:

from azure.ai.ml import Input
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml.parallel import parallel_run_function, RunFunction

parallel_job = parallel_run_function(
    # Other configs
    inputs=dict(
        model=Input(type=AssetTypes.CUSTOM_MODEL)
    ),
    task=RunFunction(
        code="./parallel_code/",
        entry_script="parallel_entry_script.py",
        program_arguments="--model_dir ${{inputs.model}}",
        model="${{inputs.model}}",
        append_row_to="${{outputs.job_output_file}}",
        environment=environment.id,
    ),
)

My question is: what does the model parameter do here? I thought it would work like BatchDeployment, where passing the model parameter as a Model object or a model asset ID string will result in the model being available through the AZUREML_MODEL_DIR environment variable. I tested this, and it is NOT the case.

I even passed a non-existent asset (e.g. model="${{inputs.non_exist_model}}"), but this does not raise any error, unlike other parameters such as program_arguments.

I could not find anything in the docs or in the https://github.com/Azure/azureml-examples GitHub repo. The only line in the class documentation is "The model of the parallel task", which does not explain how or where the model can be retrieved during the job.

Azure Machine Learning

Accepted answer
  1. santoshkc 8,775 Reputation points Microsoft Vendor
    2024-09-27T15:29:45.69+00:00

    Hi @Rahul Kurian Jacob,

    Thank you for reaching out to Microsoft Q&A forum!

The model parameter in the RunFunction class records a model asset for the parallel task, but it does not automatically make the model available the way BatchDeployment does (i.e., it does not populate AZUREML_MODEL_DIR). The fact that a non-existent reference raises no error suggests it is treated as metadata only. Since there is no built-in way to access the model during job execution, you need to load it explicitly in your entry script, for example via the path you already pass through program_arguments.
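Since nothing is surfaced through AZUREML_MODEL_DIR, the entry script has to pick up the path passed via program_arguments itself. Here is a minimal sketch using the standard init()/run() shape of a parallel-job entry script; the model.pkl filename and the plain string standing in for real deserialization are placeholders:

```python
# parallel_entry_script.py -- minimal sketch of explicit model loading.
# The --model_dir value arrives via program_arguments
# ("--model_dir ${{inputs.model}}"), not via AZUREML_MODEL_DIR.
import argparse
import os

model = None  # loaded once per worker process in init()


def init():
    """Called once per worker before any mini-batch is processed."""
    global model
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_dir", type=str, required=True)
    args, _ = parser.parse_known_args()
    # Placeholder: build the artifact path and treat it as the "model".
    # Replace with your real deserialization (joblib, torch, etc.).
    model = os.path.join(args.model_dir, "model.pkl")


def run(mini_batch):
    """Called once per mini-batch; returns one result per input item."""
    return [f"{item} scored with {model}" for item in mini_batch]
```

Because --model_dir is parsed with parse_known_args, any extra framework-supplied arguments are ignored rather than causing a failure.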

    To run parallel jobs in Azure Machine Learning, you can use the Azure CLI or Python SDK. This process involves splitting a task into mini-batches, distributing them across multiple compute nodes, and configuring inputs, data division, and compute resources.

    1. Setup: Ensure you have an Azure ML account, workspace, and necessary SDKs installed.
    2. Define Parallel Job: Create a parallel job step in your pipeline, specifying input data, instance count, mini-batch size, and error handling settings.
    3. Automation: Utilize optional settings for automatic error handling and resource monitoring.
    4. Pipeline Integration: Incorporate the parallel job as a step within your pipeline, binding inputs and outputs to coordinate with other steps.
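The steps above can be sketched as a parallel job defined in pipeline YAML for the CLI path. The asset names, paths, and counts below are illustrative, and the field layout follows the parallel job schema described in the linked article:

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
type: pipeline
jobs:
  parallel_score:
    type: parallel
    compute: azureml:cpu-cluster        # illustrative compute target
    inputs:
      model:
        type: custom_model
        path: azureml:my-model:1        # illustrative model asset
      scoring_data:
        type: uri_folder
        path: azureml:my-data:1         # illustrative data asset
    input_data: ${{inputs.scoring_data}}
    mini_batch_size: "10"
    resources:
      instance_count: 2
    error_threshold: -1                 # -1 ignores mini-batch failures
    task:
      type: run_function
      code: ./parallel_code/
      entry_script: parallel_entry_script.py
      program_arguments: --model_dir ${{inputs.model}}
```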

    This approach can significantly reduce execution time and improve efficiency in tasks like model training and batch inferencing.

    Please look into: Use parallel jobs in pipelines.

    I hope this helps. Thank you.

