CLI (v2) command component YAML schema

08/29/2024

APPLIES TO: Azure CLI ml extension v2 (current)

The source JSON schema can be found at https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json.

Note

The YAML syntax detailed in this document is based on the JSON schema for the latest version of the ML CLI v2 extension. This syntax is guaranteed only to work with the latest version of the ML CLI v2 extension. You can find the schemas for older extension versions at https://azuremlschemasprod.azureedge.net/.

YAML syntax

Key	Type	Description	Allowed values	Default value
`$schema`	string	The YAML schema. If you use the Azure Machine Learning VS Code extension to author the YAML file, including `$schema` at the top of your file enables you to invoke schema and resource completions.
`type`	const	The type of component.	`command`	`command`
`name`	string	Required. Name of the component. Must start with a lowercase letter. Allowed characters are lowercase letters, numbers, and underscores (`_`). Maximum length is 255 characters.
`version`	string	Version of the component. If omitted, Azure Machine Learning will autogenerate a version.
`display_name`	string	Display name of the component in the studio UI. Can be non-unique within the workspace.
`description`	string	Description of the component.
`tags`	object	Dictionary of tags for the component.
`is_deterministic`	boolean	This option determines if the component will produce the same output for the same input data. You should usually set this to `false` for components that load data from external sources, such as importing data from a URL. This is because the data at the URL might change over time.		`true`
`command`	string	Required. The command to execute.
`code`	string	Local path to the source code directory to be uploaded and used for the component.
`environment`	string or object	Required. The environment to use for the component. This value can be either a reference to an existing versioned environment in the workspace or an inline environment specification. To reference an existing custom environment, use the `azureml:<environment-name>:<environment-version>` syntax. To reference a curated environment, use the `azureml://registries/azureml/environments/<curated-environment-name>/versions/<version-number>` syntax. For more information on how to reference environments, see How to Manage Environments. To define an environment inline, follow the Environment schema. Exclude the `name` and `version` properties, as they aren't supported for inline environments.
`distribution`	object	The distribution configuration for distributed training scenarios. One of MpiConfiguration, PyTorchConfiguration, or TensorFlowConfiguration.
`resources.instance_count`	integer	The number of nodes to use for the job.		`1`
`inputs`	object	Dictionary of component inputs. The key is a name for the input within the context of the component and the value is the component input definition. Inputs can be referenced in the `command` using the `${{ inputs.<input_name> }}` expression.
`inputs.<input_name>`	object	The component input definition. See Component input for the set of configurable properties.
`outputs`	object	Dictionary of component outputs. The key is a name for the output within the context of the component and the value is the component output definition. Outputs can be referenced in the `command` using the `${{ outputs.<output_name> }}` expression.
`outputs.<output_name>`	object	The component output definition. See Component output for the set of configurable properties.
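
As a minimal sketch of the two ways to set `environment` described in the table, the fragment below references a registered custom environment and shows the inline alternative as a comment. The environment name `my-sklearn-env` and version `3` are hypothetical placeholders, not values from the schema; the inline form reuses the `image: python` setting from the hello world example in this article.

```yaml
# Option 1: reference an existing custom environment registered in the workspace.
# "my-sklearn-env" and version "3" are hypothetical placeholders.
environment: azureml:my-sklearn-env:3

# Option 2 (use instead of option 1): define the environment inline.
# Inline definitions omit `name` and `version`.
# environment:
#   image: python

# Set to false when the component loads data from an external source,
# because identical inputs might not produce identical outputs.
is_deterministic: false
```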

Distribution configurations

MpiConfiguration

Key	Type	Description	Allowed values
`type`	const	Required. Distribution type.	`mpi`
`process_count_per_instance`	integer	Required. The number of processes per node to launch for the job.

PyTorchConfiguration

Key	Type	Description	Allowed values	Default value
`type`	const	Required. Distribution type.	`pytorch`
`process_count_per_instance`	integer	The number of processes per node to launch for the job.		`1`

TensorFlowConfiguration

Key	Type	Description	Allowed values	Default value
`type`	const	Required. Distribution type.	`tensorflow`
`worker_count`	integer	The number of workers to launch for the job.		Defaults to `resources.instance_count`.
`parameter_server_count`	integer	The number of parameter servers to launch for the job.		`0`
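
The following fragment is a sketch of how the `distribution` and `resources.instance_count` keys from the tables above might be combined in a component file. The numeric values are illustrative assumptions, and the other required keys (`name`, `command`, `environment`) are left out for brevity.

```yaml
# Fragment of a command component; values are illustrative only.
distribution:
  type: pytorch
  process_count_per_instance: 2   # processes launched on each node
resources:
  instance_count: 2               # number of nodes used for the job

# MPI variant: process_count_per_instance is required.
# distribution:
#   type: mpi
#   process_count_per_instance: 4

# TensorFlow variant: worker_count defaults to resources.instance_count.
# distribution:
#   type: tensorflow
#   worker_count: 2
#   parameter_server_count: 1
```

Only one `distribution` block applies per component, which is why the MPI and TensorFlow variants are shown as comments.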

Component input

Key	Type	Description	Allowed values	Default value
`type`	string	Required. The type of component input. Learn more about data access.	`number`, `integer`, `boolean`, `string`, `uri_file`, `uri_folder`, `mltable`, `mlflow_model`
`description`	string	Description of the input.
`default`	number, integer, boolean, or string	The default value for the input.
`optional`	boolean	Whether the input is optional. If set to `true`, wrap the part of the command that references this input in `$[[]]`.		`false`
`min`	integer or number	The minimum accepted value for the input. This field can only be specified if `type` field is `number` or `integer`.
`max`	integer or number	The maximum accepted value for the input. This field can only be specified if `type` field is `number` or `integer`.
`enum`	array	The list of allowed values for the input. Only applicable if `type` field is `string`.
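
To illustrate how the input properties in the table fit together, here is a short sketch of an `inputs` section. All input names and values (`batch_size`, `optimizer`, `validation_data`) are hypothetical; only the property keys and type names come from the table.

```yaml
inputs:
  batch_size:                # hypothetical input name
    type: integer
    description: Number of samples per training step.
    default: 32
    min: 1                   # min/max apply only to number or integer inputs
    max: 1024
  optimizer:                 # hypothetical input name
    type: string
    default: adam
    enum:                    # enum applies only to string inputs
      - adam
      - sgd
  validation_data:           # hypothetical input name
    type: uri_folder
    optional: true           # reference it inside $[[ ]] in the command
```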

Component output

Key	Type	Description	Allowed values	Default value
`type`	string	Required. The type of component output.	`uri_file`, `uri_folder`, `mltable`, `mlflow_model`
`description`	string	Description of the output.
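
A minimal sketch of an `outputs` section and how its entries can be referenced from `command` is shown below. The output names, the script `train.py`, and its flags are assumptions made for illustration.

```yaml
outputs:
  trained_model:             # hypothetical output name
    type: mlflow_model
    description: Model saved in MLflow format.
  metrics_file:              # hypothetical output name
    type: uri_file
# train.py and its flags are placeholders for your own script.
command: >-
  python train.py
  --model_dir ${{outputs.trained_model}}
  --metrics ${{outputs.metrics_file}}
```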

Remarks

The az ml component commands can be used for managing Azure Machine Learning components.

Examples

Command component examples are available in the examples GitHub repository. Selected examples are shown below.

YAML: Hello world command component

$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
type: command

name: hello_python_world
display_name: Hello_Python_World
version: 1

code: ./src

environment: 
  image: python

command: >-
  python hello.py

YAML: Component with different input types

$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
name: train_data_component_cli
display_name: train_data
description: An example train component
tags:
  author: azureml-sdk-team
type: command
inputs:
  training_data: 
    type: uri_folder
  max_epocs:
    type: integer
    optional: true
  learning_rate: 
    type: number
    default: 0.01
    optional: true
  learning_rate_schedule: 
    type: string
    default: time-based
    optional: true
outputs:
  model_output:
    type: uri_folder
code: ./train_src
environment: azureml://registries/azureml/environments/sklearn-1.5/labels/latest
command: >-
  python train.py 
  --training_data ${{inputs.training_data}} 
  $[[--max_epocs ${{inputs.max_epocs}}]]
  $[[--learning_rate ${{inputs.learning_rate}}]]
  $[[--learning_rate_schedule ${{inputs.learning_rate_schedule}}]]
  --model_output ${{outputs.model_output}}

Define optional inputs in command line

When an input is set as optional = true, wrap the part of the command line that uses it in $[[]]. For example, $[[--input1 ${{inputs.input1}}]]. As a result, the command line at runtime can contain different inputs depending on which values are provided.

If you specify only the required training_data and model_output parameters, the command line looks like this:

python train.py --training_data some_input_path --learning_rate 0.01 --learning_rate_schedule time-based --model_output some_output_path

If no value is specified at runtime, learning_rate and learning_rate_schedule use their default values. Because max_epocs has no default, its $[[]] block is dropped from the command line.

If values are provided for all inputs and outputs at runtime, the command line looks like this:

python train.py --training_data some_input_path --max_epocs 10 --learning_rate 0.01 --learning_rate_schedule time-based --model_output some_output_path

Common errors and recommendations

The following table lists common errors you might see when you define a component, along with recommendations for resolving them.

Key	Errors	Recommendation
command	1. Only optional inputs can be in `$[[]]`. 2. Using `\` to make a new line isn't supported in command. 3. Inputs or outputs aren't found.	1. Check that all the inputs or outputs used in command are already defined in the `inputs` and `outputs` sections, and use the correct format for optional inputs `$[[]]` or required ones `${{}}`. 2. Don't use `\` to make a new line.
environment	1. No definition exists for environment `{envName}` version `{envVersion}`. 2. No environment exists for name `{envName}`, version `{envVersion}`. 3. Couldn't find asset with ID `{envAssetId}`.	1. Make sure the environment name and version you refer to in the component definition exist. 2. You need to specify the version if you refer to a registered environment.
inputs/outputs	1. Input or output names conflict with system reserved parameters. 2. Duplicated names of inputs or outputs.	1. Don't use any of these reserved parameters as an input or output name: `path`, `ld_library_path`, `user`, `logname`, `home`, `pwd`, `shell`. 2. Make sure names of inputs and outputs aren't duplicated.
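
The recommendations above can be sketched in a single fragment. It uses a folded block scalar (`>-`), as the examples earlier in this article do, instead of `\` for line breaks, wraps only the optional input in `$[[]]`, and avoids the reserved name `path`. The script `score.py`, its flags, and the input and output names are hypothetical.

```yaml
inputs:
  data_path:                 # hypothetical name; avoids the reserved name `path`
    type: uri_folder
  sample_rate:               # hypothetical name
    type: number
    optional: true
outputs:
  report:                    # hypothetical name
    type: uri_file
# Multi-line command written with a folded scalar rather than `\`.
command: >-
  python score.py
  --data ${{inputs.data_path}}
  $[[--sample_rate ${{inputs.sample_rate}}]]
  --report ${{outputs.report}}
```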
