How to tag job output data assets directly

Question

Hi,

I am following the documentation at https://learn.microsoft.com/en-us/azure/machine-learning/how-to-read-write-data-v2?view=azureml-api-2&tabs=python (section "Write data from your Azure Machine Learning job to Azure Storage") to output a named data asset from a job.

Therefore my job output definition roughly looks like this:

outputs = {
    "output_data": Output(
        type=data_type,
        path=output_path,
        mode=output_mode,
        name=my_name,
        # HOW to add tags? tags={"mytag": "test"} did not work
    )
}

However, I now want to set tags (and perhaps properties in the future) on the named data asset automatically.

What is the best way of doing this?

Thanks a lot

Accepted Answer

I am not an expert, but this is what I found on different forums. Start by running your job to produce the output:


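from azure.ai.ml import MLClient, Output, command
from azure.identity import DefaultAzureCredential

# Define the job output; setting `name` registers it as a named data asset
outputs = {
    "output_data": Output(
        type="uri_folder",  # replace with your data_type
        path="azureml://datastores/workspaceblobstore/paths/output-path",  # replace with your output_path
        mode="rw_mount",  # replace with your output_mode ("rw_mount" or "upload" for outputs)
        name="my_output_data",
    )
}

# Create a job (replace this part with your actual job definition).
# Job is an abstract base class, so build a concrete job, for example with command().
job = command(
    # command=..., environment=..., compute=..., and other job properties
    outputs=outputs,
)

# Get a handle to your MLClient (reads the workspace details from config.json)
ml_client = MLClient.from_config(credential=DefaultAzureCredential())

# Submit the job
returned_job = ml_client.jobs.create_or_update(job)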
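
Note that create_or_update returns as soon as the job is submitted, so the output data asset may not exist yet at this point. A minimal sketch of blocking until the run has finished, assuming the jobs.stream and jobs.get operations of the v2 SDK; the helper name wait_for_job is made up for illustration and is not part of the SDK:

```python
def wait_for_job(ml_client, job_name):
    """Block until the job has finished and return its final status.

    `wait_for_job` is a hypothetical helper. `ml_client` is expected to be an
    azure.ai.ml.MLClient (or any object exposing the same `jobs` interface).
    """
    # jobs.stream() tails the job logs and returns once the job has ended.
    ml_client.jobs.stream(job_name)

    # Re-fetch the job to read its final status, e.g. "Completed" or "Failed".
    status = ml_client.jobs.get(job_name).status
    if status != "Completed":
        raise RuntimeError(f"Job {job_name} ended with status {status!r}")
    return status
```

It would be called as wait_for_job(ml_client, returned_job.name) right after the submission.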

After your job completes, you can fetch the output data asset with the ml_client.data.get method, set the desired tags on it, and save the change with the ml_client.data.create_or_update method (the data operations do not expose a separate update method). Note that data.get needs either a version or a label:

# Get the data asset name from the job outputs
output_data_name = returned_job.outputs["output_data"].name

# Fetch the latest version of the data asset
data_asset = ml_client.data.get(name=output_data_name, label="latest")

# Set the tags on the data asset
data_asset.tags = {"mytag": "test"}

# Save the change to the workspace
ml_client.data.create_or_update(data_asset)
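
Assigning data_asset.tags directly replaces whatever tags the asset already carries. If you would rather keep the existing tags, here is a hedged sketch of a small helper that merges the new tags in. The name add_tags is hypothetical, and the sketch assumes that data.get accepts label="latest" and that data.create_or_update persists tag changes on an existing asset version:

```python
def add_tags(ml_client, asset_name, new_tags, version=None):
    """Merge `new_tags` into the tags of a registered data asset.

    `add_tags` is a hypothetical helper, not part of the SDK. `ml_client` is
    expected to be an azure.ai.ml.MLClient (or expose the same `data` interface).
    """
    # data.get() needs a version or a label; default to the latest version.
    if version is None:
        asset = ml_client.data.get(name=asset_name, label="latest")
    else:
        asset = ml_client.data.get(name=asset_name, version=version)

    # Keep existing tags; new values win when a key is present in both.
    merged = dict(asset.tags or {})
    merged.update(new_tags)
    asset.tags = merged

    # Assumption: submitting the same name/version again updates its tags.
    return ml_client.data.create_or_update(asset)
```

For the job above this could be used as add_tags(ml_client, "my_output_data", {"mytag": "test"}) once the job has completed.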
