CLI (v2) data YAML schema

APPLIES TO: Azure CLI ml extension v2 (current)

The source JSON schema can be found at https://azuremlschemas.azureedge.net/latest/data.schema.json.

Note

The YAML syntax detailed in this document is based on the JSON schema for the latest version of the ML CLI v2 extension. This syntax is guaranteed only to work with the latest version of the ML CLI v2 extension. You can find the schemas for older extension versions at https://azuremlschemasprod.azureedge.net/.

YAML syntax

Key Type Description Allowed values Default value
$schema string The YAML schema. If you use the Azure Machine Learning Visual Studio Code extension to author the YAML file, you can invoke schema and resource completions if you include $schema at the top of your file.
name string Required. The data asset name.
version string The dataset version. If omitted, Azure Machine Learning autogenerates a version.
description string The data asset description.
tags object The datastore tag dictionary.
type string The data asset type. Specify uri_file for data that points to a single file source, or uri_folder for data that points to a folder source. uri_file, uri_folder uri_folder
path string Either a local path to the data source file or folder, or the URI of a cloud path to the data source file or folder. Ensure that the source provided here is compatible with the type specified.

Supported URI types are azureml, https, wasbs, abfss, and adl. To use the azureml:// URI format, see Core yaml syntax.

Remarks

The az ml data commands can be used for managing Azure Machine Learning data assets.

Examples

Examples are available in the examples GitHub repository. Several are shown:

YAML: datastore file

$schema: https://azuremlschemas.azureedge.net/latest/data.schema.json
name: cloud-file-example
description: Data asset created from file in cloud.
type: uri_file
path: azureml://datastores/workspaceblobstore/paths/example-data/titanic.csv

YAML: datastore folder

$schema: https://azuremlschemas.azureedge.net/latest/data.schema.json
name: cloud-folder-example
description: Data asset created from folder in cloud.
type: uri_folder
path: azureml://datastores/workspaceblobstore/paths/example-data/

YAML: https file

$schema: https://azuremlschemas.azureedge.net/latest/data.schema.json
name: cloud-file-https-example
description: Data asset created from a file in cloud using https URL.
type: uri_file
path: https://account-name.blob.core.windows.net/container-name/example-data/titanic.csv

YAML: https folder

$schema: https://azuremlschemas.azureedge.net/latest/data.schema.json
name: cloud-folder-https-example
description: Dataset created from folder in cloud using https URL.
type: uri_folder
path: https://account-name.blob.core.windows.net/container-name/example-data/

YAML: wasbs file

$schema: https://azuremlschemas.azureedge.net/latest/data.schema.json
name: cloud-file-wasbs-example
description: Data asset created from a file in cloud using wasbs URL.
type: uri_file
path: wasbs://account-name.blob.core.windows.net/container-name/example-data/titanic.csv

YAML: wasbs folder

$schema: https://azuremlschemas.azureedge.net/latest/data.schema.json
name: cloud-folder-wasbs-example
description: Data asset created from folder in cloud using wasbs URL.
type: uri_folder
path: wasbs://account-name.blob.core.windows.net/container-name/example-data/

YAML: local file

$schema: https://azuremlschemas.azureedge.net/latest/data.schema.json
name: local-file-example-titanic
description: Data asset created from local file.
type: uri_file
path: sample-data/titanic.csv

YAML: local folder

$schema: https://azuremlschemas.azureedge.net/latest/data.schema.json
name: local-folder-example-titanic
description: Dataset created from local folder.
type: uri_folder
path: sample-data/

Next steps