az ml data

Note

This reference is part of the ml extension for the Azure CLI (version 2.15.0 or higher). The extension will automatically install the first time you run an az ml data command. Learn more about extensions.

Manage Azure ML data assets.

Azure ML data assets are references to file(s) in your storage services or public URLs along with any corresponding metadata. They are not copies of your data. You can use these data assets to access relevant data during model training and mount or download the referenced data to your compute target.

Commands

az ml data archive

Archive a data asset.

az ml data create

Create a data asset.

az ml data list

List data assets in a workspace.

az ml data restore

Restore an archived data asset.

az ml data show

Show details for a data asset.

az ml data update

Update a data asset.

az ml data archive

Archive a data asset.

Archiving a data asset hides it from list queries (az ml data list) by default. You can still reference and use an archived data asset in your workflows. You can archive either a data asset container or a specific data asset version. Archiving a data asset container archives all versions of the data asset under that name. You can restore an archived data asset using az ml data restore. If the entire data asset container is archived, you cannot restore individual versions of the data asset; you must restore the data asset container.

az ml data archive --name
                   --resource-group
                   --workspace-name
                   [--label]
                   [--version]

Examples

Archive a data asset container (archives all versions of that data asset)

az ml data archive --name my-data --resource-group my-resource-group --workspace-name my-workspace

Archive a specific data asset version

az ml data archive --name my-data --version 1 --resource-group my-resource-group --workspace-name my-workspace
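
The archive behavior described above can be checked end to end. The following is a minimal sketch rather than an official example: it wraps three commands from this reference in a hypothetical shell function, archive_roundtrip, which archives one version, lists only the archived versions under that name, and then restores the version. The function name and its positional arguments are assumptions made for illustration; every flag comes from this reference.

```shell
# Hypothetical helper (not part of the Azure CLI).
# Usage: archive_roundtrip NAME VERSION RESOURCE_GROUP WORKSPACE
archive_roundtrip() {
  # Archive a single version; other versions of the asset stay active.
  az ml data archive --name "$1" --version "$2" \
    --resource-group "$3" --workspace-name "$4"

  # Archived versions are hidden by default, so request them explicitly.
  az ml data list --name "$1" --archived-only \
    --resource-group "$3" --workspace-name "$4"

  # Bring the version back into the default listing.
  az ml data restore --name "$1" --version "$2" \
    --resource-group "$3" --workspace-name "$4"
}
```

For example: archive_roundtrip my-data 1 my-resource-group my-workspace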

Required Parameters

--name -n

Name of the data asset.

--resource-group -g

Name of resource group. You can configure the default group using az configure --defaults group=<name>.

--workspace-name -w

Name of the Azure ML workspace. You can configure the default workspace using az configure --defaults workspace=<name>.
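
Both required values can be stored as defaults, as the two descriptions above note. A minimal sketch, assuming you want to set both in one call; the wrapper function set_ml_defaults is hypothetical, and the names in the comments are placeholders.

```shell
# Hypothetical helper (not part of the Azure CLI): store a default
# resource group and a default workspace in one call.
# Usage: set_ml_defaults RESOURCE_GROUP WORKSPACE
set_ml_defaults() {
  az configure --defaults group="$1" workspace="$2"
}

# Once defaults are stored, --resource-group and --workspace-name
# can be omitted from later commands, for example:
#   set_ml_defaults my-resource-group my-workspace
#   az ml data archive --name my-data --version 1
```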

Optional Parameters

--label -l

Label of the data asset.

--version -v

Version of the data asset.

az ml data create

Create a data asset.

Data assets can be defined from files on your local machine or as references to files in cloud storage. The created data asset will be tracked in the workspace under the specified name and version.

To create a data asset from file(s) on your local machine, specify the 'path' field in your YAML config. Azure ML will upload these file(s) to the blob container that backs the workspace's default datastore (named 'workspaceblobstore'). The created data asset will then point to that uploaded data.

To create a data asset that references file(s) in cloud storage, specify the 'path' to the file(s) in storage in your YAML config.

You can also create a data asset directly from a storage URL or public URL. To do so, specify the URL in the 'path' field in your YAML config.
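
To make the 'path' field concrete, the sketch below writes a small YAML config and defines a helper that passes it to az ml data create. It is an illustration under assumptions, not an authoritative template: the file name data.yml, the folder ./sample-data, the asset name, and the helper function register_data_asset are all hypothetical, and the field names should be checked against the YAML reference for data.

```shell
# Write a minimal data asset spec. The 'path' value here is a local folder;
# a storage URL or public URL could be placed in the same field instead.
cat > data.yml <<'EOF'
name: my-data
version: "1"
description: Sample data asset created from a local folder.
type: uri_folder
path: ./sample-data
EOF

# Hypothetical helper (not part of the Azure CLI).
# Usage: register_data_asset SPEC_FILE RESOURCE_GROUP WORKSPACE
register_data_asset() {
  az ml data create --file "$1" --resource-group "$2" --workspace-name "$3"
}
```

For example: register_data_asset data.yml my-resource-group my-workspace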

az ml data create --resource-group
                  --workspace-name
                  [--datastore]
                  [--description]
                  [--file]
                  [--name]
                  [--path]
                  [--set]
                  [--skip-validation]
                  [--type {mltable, uri_file, uri_folder}]
                  [--version]

Examples

Create a data asset from a YAML specification file

az ml data create --file data.yml --resource-group my-resource-group --workspace-name my-workspace

Create a data asset without using a YAML specification file

az ml data create --name my-data --version 1 --path ./my-data.csv --resource-group my-resource-group --workspace-name my-workspace
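
The command-line form also accepts a remote path, since --path can be local or remote. A minimal sketch, assuming the data already exists in storage that the workspace can reach; the helper name create_from_remote_path is hypothetical, and the azureml://datastores/<datastore>/paths/<path> form used in the usage note is an assumption to verify against your own storage setup.

```shell
# Hypothetical helper (not part of the Azure CLI).
# Usage: create_from_remote_path NAME VERSION TYPE REMOTE_PATH RESOURCE_GROUP WORKSPACE
create_from_remote_path() {
  # --type accepts mltable, uri_file, or uri_folder.
  az ml data create --name "$1" --version "$2" --type "$3" --path "$4" \
    --resource-group "$5" --workspace-name "$6"
}
```

For example: create_from_remote_path my-data 2 uri_folder azureml://datastores/workspaceblobstore/paths/sample-data/ my-resource-group my-workspace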

Required Parameters

--resource-group -g

Name of resource group. You can configure the default group using az configure --defaults group=<name>.

--workspace-name -w

Name of the Azure ML workspace. You can configure the default workspace using az configure --defaults workspace=<name>.

Optional Parameters

--datastore

The datastore to upload the local artifact to.

--description -d

Description of the data asset.

--file -f

Local path to the YAML file containing the Azure ML data specification. The YAML reference docs for data can be found at: https://aka.ms/ml-cli-v2-data-yaml-reference.

--name -n

Name of the data asset.

--path -p

Path to the data asset. Can be local or remote.

--set

Update an object by specifying a property path and value to set. Example: --set property1.property2=<value>.

--skip-validation

Skip validation of MLTable metadata when type is MLTable.

--type -t

Type of the data asset.

Accepted values: mltable, uri_file, uri_folder

--version -v

Version of the data asset.

az ml data list

List data assets in a workspace.

az ml data list --resource-group
                --workspace-name
                [--archived-only]
                [--include-archived]
                [--max-results]
                [--name]

Examples

List all the data assets in a workspace

az ml data list --resource-group my-resource-group --workspace-name my-workspace

List all the data asset versions for the specified name in a workspace

az ml data list --name my-data --resource-group my-resource-group --workspace-name my-workspace

List all the data assets in a workspace, using the --query argument to run a JMESPath query on the command results

az ml data list --query "[].{Name:name}" --output table --resource-group my-resource-group --workspace-name my-workspace
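
The optional flags can be combined with --query. The sketch below defines a hypothetical helper, list_versions, that lists every version of one data asset, archived or active, as a two-column table. It assumes each returned item exposes name and version properties; adjust the JMESPath expression if your output differs.

```shell
# Hypothetical helper (not part of the Azure CLI).
# Usage: list_versions NAME RESOURCE_GROUP WORKSPACE
list_versions() {
  # --include-archived returns archived and active versions together.
  az ml data list --name "$1" --include-archived \
    --query "[].{Name:name, Version:version}" --output table \
    --resource-group "$2" --workspace-name "$3"
}
```

For example: list_versions my-data my-resource-group my-workspace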

Required Parameters

--resource-group -g

Name of resource group. You can configure the default group using az configure --defaults group=<name>.

--workspace-name -w

Name of the Azure ML workspace. You can configure the default workspace using az configure --defaults workspace=<name>.

Optional Parameters

--archived-only

List archived data assets only.

--include-archived

List archived data assets and active data assets.

--max-results -r

Max number of results to return.

--name -n

Name of the data asset. If provided, all the data versions under this name will be returned.

az ml data restore

Restore an archived data asset.

When an archived data asset is restored, it is no longer hidden from list queries (az ml data list). If an entire data asset container is archived, you can restore that archived container. This restores all versions of the data asset under that name. You cannot restore only a specific data asset version if the entire data asset container is archived; you must restore the entire container. If only an individual data asset version was archived, you can restore that specific version.

az ml data restore --name
                   --resource-group
                   --workspace-name
                   [--label]
                   [--version]

Examples

Restore an archived data asset container (restores all versions of that data asset)

az ml data restore --name my-data --resource-group my-resource-group --workspace-name my-workspace

Restore a specific archived data asset version

az ml data restore --name my-data --version 1 --resource-group my-resource-group --workspace-name my-workspace

Required Parameters

--name -n

Name of the data asset.

--resource-group -g

Name of resource group. You can configure the default group using az configure --defaults group=<name>.

--workspace-name -w

Name of the Azure ML workspace. You can configure the default workspace using az configure --defaults workspace=<name>.

Optional Parameters

--label -l

Label of the data asset.

--version -v

Version of the data asset.

az ml data show

Show details for a data asset.

az ml data show --name
                --resource-group
                --workspace-name
                [--label]
                [--version]

Examples

Show details for a data asset with the specified name and version

az ml data show --name my-data --version 1 --resource-group my-resource-group --workspace-name my-workspace
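
A common follow-up is to extract a single field from the result. A minimal sketch, assuming the returned object has a path property holding the location the data asset points to; the helper name data_asset_path is hypothetical, and --query and --output are the general Azure CLI arguments already used in the az ml data list examples.

```shell
# Hypothetical helper (not part of the Azure CLI).
# Usage: data_asset_path NAME VERSION RESOURCE_GROUP WORKSPACE
data_asset_path() {
  # --output tsv prints the selected value without JSON quoting.
  az ml data show --name "$1" --version "$2" \
    --query path --output tsv \
    --resource-group "$3" --workspace-name "$4"
}
```

For example: data_asset_path my-data 1 my-resource-group my-workspace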

Required Parameters

--name -n

Name of the data asset.

--resource-group -g

Name of resource group. You can configure the default group using az configure --defaults group=<name>.

--workspace-name -w

Name of the Azure ML workspace. You can configure the default workspace using az configure --defaults workspace=<name>.

Optional Parameters

--label -l

Label of the data asset.

--version -v

Version of the data asset.

az ml data update

Update a data asset.

Only the 'description' and 'tags' properties can be updated.

az ml data update --name
                  --resource-group
                  --version
                  --workspace-name
                  [--add]
                  [--force-string]
                  [--label]
                  [--remove]
                  [--set]
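
Since only 'description' and 'tags' can be updated, the sketch below shows one plausible use of --set for each. It is illustrative only: the helper names update_description and set_tag, the tag key, and the values in the usage note are hypothetical, and the tags.<key>=<value> path assumes tags are stored as a key-value map.

```shell
# Hypothetical helpers (not part of the Azure CLI).

# Usage: update_description NAME VERSION DESCRIPTION RESOURCE_GROUP WORKSPACE
update_description() {
  az ml data update --name "$1" --version "$2" --set description="$3" \
    --resource-group "$4" --workspace-name "$5"
}

# Usage: set_tag NAME VERSION TAG_KEY TAG_VALUE RESOURCE_GROUP WORKSPACE
set_tag() {
  az ml data update --name "$1" --version "$2" --set "tags.$3=$4" \
    --resource-group "$5" --workspace-name "$6"
}
```

For example: update_description my-data 1 "Cleaned training data" my-resource-group my-workspace, or set_tag my-data 1 stage validated my-resource-group my-workspace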

Required Parameters

--name -n

Name of the data asset.

--resource-group -g

Name of resource group. You can configure the default group using az configure --defaults group=<name>.

--version -v

Version of the data asset.

--workspace-name -w

Name of the Azure ML workspace. You can configure the default workspace using az configure --defaults workspace=<name>.

Optional Parameters

--add

Add an object to a list of objects by specifying a path and key-value pairs. Example: --add property.listProperty <key=value, string or JSON string>.

--force-string

When using 'set' or 'add', preserve string literals instead of attempting to convert to JSON.

--label -l

Label of the data asset.

--remove

Remove a property or an element from a list. Example: --remove property.list OR --remove propertyToRemove.

--set

Update an object by specifying a property path and value to set. Example: --set property1.property2=<value>.