CLI (v2) command component YAML schema

APPLIES TO: Azure CLI ml extension v2 (current)

The source JSON schema can be found at https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json.

Note

The YAML syntax detailed in this document is based on the JSON schema for the latest version of the ML CLI v2 extension. This syntax is guaranteed only to work with the latest version of the ML CLI v2 extension. You can find the schemas for older extension versions at https://azuremlschemasprod.azureedge.net/.

YAML syntax

| Key | Type | Description | Allowed values | Default value |
| --- | --- | --- | --- | --- |
| $schema | string | The YAML schema. If you use the Azure Machine Learning VS Code extension to author the YAML file, including $schema at the top of your file enables you to invoke schema and resource completions. | | |
| type | const | The type of component. | command | command |
| name | string | Required. Name of the component. Must start with a lowercase letter. Allowed characters are lowercase letters, numbers, and underscores (_). Maximum length is 255 characters. | | |
| version | string | Version of the component. If omitted, Azure Machine Learning autogenerates a version. | | |
| display_name | string | Display name of the component in the studio UI. Can be non-unique within the workspace. | | |
| description | string | Description of the component. | | |
| tags | object | Dictionary of tags for the component. | | |
| is_deterministic | boolean | Determines whether the component produces the same output for the same input data. You should usually set this to false for components that load data from external sources, such as importing data from a URL, because the data at the URL might change over time. | | true |
| command | string | Required. The command to execute. | | |
| code | string | Local path to the source code directory to be uploaded and used for the component. | | |
| environment | string or object | Required. The environment to use for the component. This value can be either a reference to an existing versioned environment in the workspace or an inline environment specification. To reference an existing environment, use the azureml:<environment-name>:<environment-version> syntax. To define an environment inline, follow the Environment schema. Exclude the name and version properties as they aren't supported for inline environments. | | |
| distribution | object | The distribution configuration for distributed training scenarios. One of MpiConfiguration, PyTorchConfiguration, or TensorFlowConfiguration. | | |
| resources.instance_count | integer | The number of nodes to use for the job. | | 1 |
| inputs | object | Dictionary of component inputs. The key is a name for the input within the context of the component and the value is the component input definition. Inputs can be referenced in the command using the ${{ inputs.<input_name> }} expression. | | |
| inputs.<input_name> | object | The component input definition. See Component input for the set of configurable properties. | | |
| outputs | object | Dictionary of component outputs. The key is a name for the output within the context of the component and the value is the component output definition. Outputs can be referenced in the command using the ${{ outputs.<output_name> }} expression. | | |
| outputs.<output_name> | object | The component output definition. See Component output for the set of configurable properties. | | |
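
The two forms of the environment key described above can be sketched as follows. This is a minimal illustration rather than a complete component: my-sklearn-env and version 1 are hypothetical placeholders for an environment already registered in your workspace, and the inline form assumes that a plain python image is enough for your code. The two fragments are alternatives, so they're shown as separate YAML documents.

```yaml
# Form 1: reference an existing, versioned environment in the workspace.
# "my-sklearn-env" and "1" are hypothetical; substitute your own name and version.
environment: azureml:my-sklearn-env:1
---
# Form 2: define the environment inline, following the Environment schema.
# Don't set name or version here; they aren't supported for inline environments.
environment:
  image: python
```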

Distribution configurations

MpiConfiguration

| Key | Type | Description | Allowed values |
| --- | --- | --- | --- |
| type | const | Required. Distribution type. | mpi |
| process_count_per_instance | integer | Required. The number of processes per node to launch for the job. | |

PyTorchConfiguration

| Key | Type | Description | Allowed values | Default value |
| --- | --- | --- | --- | --- |
| type | const | Required. Distribution type. | pytorch | |
| process_count_per_instance | integer | The number of processes per node to launch for the job. | | 1 |

TensorFlowConfiguration

| Key | Type | Description | Allowed values | Default value |
| --- | --- | --- | --- | --- |
| type | const | Required. Distribution type. | tensorflow | |
| worker_count | integer | The number of workers to launch for the job. | | Defaults to resources.instance_count. |
| parameter_server_count | integer | The number of parameter servers to launch for the job. | | 0 |
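
As a rough sketch of how the three distribution configurations fit into a component, the fragments below pair each one with resources.instance_count. The node and process counts are arbitrary values chosen for illustration, and each fragment is an alternative, shown as its own YAML document.

```yaml
# MPI: process_count_per_instance is required.
# With these illustrative values the job uses 2 nodes with 4 processes on each.
distribution:
  type: mpi
  process_count_per_instance: 4
resources:
  instance_count: 2
---
# PyTorch: process_count_per_instance is optional and defaults to 1.
distribution:
  type: pytorch
  process_count_per_instance: 4
resources:
  instance_count: 2
---
# TensorFlow: worker_count defaults to resources.instance_count,
# and parameter_server_count defaults to 0, so both could be omitted here.
distribution:
  type: tensorflow
  worker_count: 2
  parameter_server_count: 0
resources:
  instance_count: 2
```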

Component input

| Key | Type | Description | Allowed values | Default value |
| --- | --- | --- | --- | --- |
| type | string | Required. The type of component input. Learn more about data access. | number, integer, boolean, string, uri_file, uri_folder, mltable, mlflow_model | |
| description | string | Description of the input. | | |
| default | number, integer, boolean, or string | The default value for the input. | | |
| optional | boolean | Whether the input is optional. If set to true, you must wrap the part of the command that uses the input in $[[]]. | | false |
| min | integer or number | The minimum accepted value for the input. This field can be specified only if the type field is number or integer. | | |
| max | integer or number | The maximum accepted value for the input. This field can be specified only if the type field is number or integer. | | |
| enum | array | The list of allowed values for the input. Applicable only if the type field is string. | | |
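
The input properties above might be combined as in the following sketch. The input names echo the training example later in this article, while the bounds, the default of 10, and the step-based enum value are hypothetical values chosen only to show the shape of each field.

```yaml
inputs:
  training_data:
    type: uri_folder
    description: Folder that contains the training data.
  max_epocs:
    type: integer
    description: Maximum number of training epochs.
    default: 10        # hypothetical default
    min: 1             # min and max apply only to number or integer inputs
    max: 100
  learning_rate_schedule:
    type: string
    default: time-based
    optional: true     # optional inputs must be wrapped in $[[]] in the command
    enum:              # enum applies only to string inputs
      - time-based
      - step-based     # hypothetical second allowed value
```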

Component output

| Key | Type | Description | Allowed values | Default value |
| --- | --- | --- | --- | --- |
| type | string | Required. The type of component output. | uri_file, uri_folder, mltable, mlflow_model | |
| description | string | Description of the output. | | |

Remarks

The az ml component commands can be used for managing Azure Machine Learning components.

Examples

Command component examples are available in the examples GitHub repository. Several are shown below.

YAML: Hello world command component

$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
type: command

name: hello_python_world
display_name: Hello_Python_World
version: 1

code: ./src

environment: 
  image: python

command: >-
  python hello.py

YAML: Component with different input types

$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
name: train_data_component_cli
display_name: train_data
description: An example train component
tags:
  author: azureml-sdk-team
version: 9
type: command
inputs:
  training_data: 
    type: uri_folder
  max_epocs:
    type: integer
    optional: true
  learning_rate: 
    type: number
    default: 0.01
    optional: true
  learning_rate_schedule: 
    type: string
    default: time-based
    optional: true
outputs:
  model_output:
    type: uri_folder
code: ./train_src
environment: azureml://registries/azureml/environments/sklearn-1.0/labels/latest
command: >-
  python train.py 
  --training_data ${{inputs.training_data}} 
  $[[--max_epocs ${{inputs.max_epocs}}]]
  $[[--learning_rate ${{inputs.learning_rate}}]]
  $[[--learning_rate_schedule ${{inputs.learning_rate_schedule}}]]
  --model_output ${{outputs.model_output}}

Define optional inputs in command line

When an input is set as optional: true, you need to wrap the part of the command line that uses it in $[[]], for example $[[--input1 ${{inputs.input1}}]]. The resulting command line at runtime depends on which inputs receive values.

  • If you specify only the required training_data and model_output parameters, the command line looks like:
python train.py --training_data some_input_path --learning_rate 0.01 --learning_rate_schedule time-based --model_output some_output_path

If no value is specified at runtime, learning_rate and learning_rate_schedule use their default values. max_epocs has no default, so its $[[]] block is dropped from the command line.

  • If values are provided for all inputs and outputs at runtime, the command line looks like:
python train.py --training_data some_input_path --max_epocs 10 --learning_rate 0.01 --learning_rate_schedule time-based --model_output some_output_path
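
One way the runtime values in the second case might be supplied is through a pipeline job step that uses the component. The fragment below is a sketch under several assumptions that aren't part of this article: the component YAML is saved as ./train.yml, a compute target named cpu-cluster exists in the workspace, and the training data sits in a local ./data folder. Check the pipeline job YAML schema before relying on it.

```yaml
type: pipeline
jobs:
  train:
    type: command
    component: ./train.yml          # hypothetical path to the component YAML shown earlier
    compute: azureml:cpu-cluster    # hypothetical compute target name
    inputs:
      training_data:                # required input
        type: uri_folder
        path: ./data                # hypothetical local data folder
      max_epocs: 10                 # optional input with no default; set it to keep its $[[]] block
      # learning_rate and learning_rate_schedule are left unset,
      # so their defaults (0.01 and time-based) are used.
```

With these values, the command line would be expected to include --max_epocs 10 along with the defaulted learning_rate and learning_rate_schedule arguments, matching the second case above.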

Common errors and recommendations

The following are some common errors, with recommended fixes, that you might encounter when you define a component.

| Key | Errors | Recommendation |
| --- | --- | --- |
| command | 1. Only optional inputs can be in $[[]]. 2. Using \ to make a new line isn't supported in command. 3. Inputs or outputs aren't found. | 1. Check that all the inputs or outputs used in command are already defined in the inputs and outputs sections, and use the correct format for optional inputs ($[[]]) or required ones (${{}}). 2. Don't use \ to make a new line. |
| environment | 1. No definition exists for environment {envName} version {envVersion}. 2. No environment exists for name {envName}, version {envVersion}. 3. Couldn't find asset with ID {envAssetId}. | 1. Make sure the environment name and version that you refer to in the component definition exist. 2. You need to specify the version if you refer to a registered environment. |
| inputs/outputs | 1. Inputs/outputs names conflict with system reserved parameters. 2. Duplicated names of inputs or outputs. | 1. Don't use any of these reserved parameters as your inputs/outputs name: path, ld_library_path, user, logname, home, pwd, shell. 2. Make sure names of inputs and outputs aren't duplicated. |
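
The recommendations above can be pulled together in a small component sketch. Everything named here is hypothetical and chosen only for illustration: the component name, the score.py script, the ./score_src folder, the my-sklearn-env environment, and the input and output names.

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
type: command
name: score_data_component              # hypothetical component name
code: ./score_src                       # hypothetical source folder
# Reference a registered environment with an explicit version.
environment: azureml:my-sklearn-env:1   # hypothetical environment name and version
inputs:
  input_data:                           # avoids reserved names such as path, user, home
    type: uri_folder
  threshold:
    type: number
    optional: true
outputs:
  scored_data:                          # distinct from every input name
    type: uri_folder
# A folded block scalar (>-) joins the lines with spaces, so no \ is needed.
# Required inputs and outputs use ${{}}; only the optional input is wrapped in $[[]].
command: >-
  python score.py
  --input_data ${{inputs.input_data}}
  $[[--threshold ${{inputs.threshold}}]]
  --scored_data ${{outputs.scored_data}}
```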

Next steps