ml Package

Packages

automl

Contains automated machine learning classes for Azure Machine Learning SDKv2.

Main areas include managing AutoML tasks.

constants

This package defines constants used in Azure Machine Learning SDKv2.

data_transfer
dsl
entities

Contains entities and SDK objects for Azure Machine Learning SDKv2.

Main areas include managing compute targets, creating/managing workspaces and jobs, and submitting/accessing models, runs, and run output/logging.

identity

Contains Identity Configuration for Azure Machine Learning SDKv2.

operations

Contains supported operations for Azure Machine Learning SDKv2.

Operations are classes that contain the logic to interact with backend services, usually as auto-generated operation calls.

parallel
sweep

Modules

exceptions

Contains the exception module for Azure Machine Learning SDKv2.

This includes enums and classes for exceptions.

Classes

Input

Initialize an Input object.

MLClient

A client class to interact with Azure ML services.

Use this client to manage Azure ML resources such as workspaces, jobs, models, and so on.
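For illustration, a minimal sketch of constructing a client with DefaultAzureCredential from azure-identity; the subscription, resource group, and workspace values below are placeholders for your own.


   from azure.ai.ml import MLClient
   from azure.identity import DefaultAzureCredential

   # Placeholder identifiers; replace with your own subscription, resource group, and workspace.
   ml_client = MLClient(
       credential=DefaultAzureCredential(),
       subscription_id="<subscription-id>",
       resource_group_name="<resource-group>",
       workspace_name="<workspace-name>",
   )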

MpiDistribution

MPI distribution configuration.

Output

Define an output.

PyTorchDistribution

PyTorch distribution configuration.

RayDistribution

Note

This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.

Ray distribution configuration.

TensorFlowDistribution

TensorFlow distribution configuration.

Functions

command

Creates a Command object which can be used inside a dsl.pipeline function or as a standalone Command job.

command(*, name: str | None = None, description: str | None = None, tags: Dict | None = None, properties: Dict | None = None, display_name: str | None = None, command: str | None = None, experiment_name: str | None = None, environment: str | Environment | None = None, environment_variables: Dict | None = None, distribution: Dict | MpiDistribution | TensorFlowDistribution | PyTorchDistribution | RayDistribution | DistributionConfiguration | None = None, compute: str | None = None, inputs: Dict | None = None, outputs: Dict | None = None, instance_count: int | None = None, instance_type: str | None = None, locations: List[str] | None = None, docker_args: str | None = None, shm_size: str | None = None, timeout: int | None = None, code: PathLike | str | None = None, identity: ManagedIdentityConfiguration | AmlTokenConfiguration | UserIdentityConfiguration | None = None, is_deterministic: bool = True, services: Dict[str, JobService | JupyterLabJobService | SshJobService | TensorBoardJobService | VsCodeJobService] | None = None, job_tier: str | None = None, priority: str | None = None, **kwargs: Any) -> Command

Keyword-Only Parameters

Name Description
name

The name of the Command job or component.

description

The description of the Command. Defaults to None.

tags

Tag dictionary. Tags can be added, removed, and updated. Defaults to None.

properties

The job property dictionary. Defaults to None.

display_name

The display name of the job. Defaults to a randomly generated name.

command

The command to be executed. Defaults to None.

experiment_name

The name of the experiment that the job will be created under. Defaults to the current directory name.

environment

The environment that the job will run in.

environment_variables

A dictionary of environment variable names and values. These environment variables are set on the process where user script is being executed. Defaults to None.

distribution

The configuration for distributed jobs. Defaults to None.

compute

The compute target the job will run on. Defaults to the default compute.

inputs
Optional[dict[str, Union[Input, str, bool, int, float, Enum]]]

A mapping of input names to input data sources used in the job. Defaults to None.

outputs

A mapping of output names to output data sources used in the job. Defaults to None.

instance_count

The number of instances or nodes to be used by the compute target. Defaults to 1.

instance_type

The type of VM to be used by the compute target.

locations

The list of locations where the job will run.

docker_args

Extra arguments to pass to the Docker run command. This would override any parameters that have already been set by the system, or in this section. This parameter is only supported for Azure ML compute types. Defaults to None.

shm_size

The size of the Docker container's shared memory block. This should be in the format of (number)(unit) where the number has to be greater than 0 and the unit can be one of b(bytes), k(kilobytes), m(megabytes), or g(gigabytes).

timeout

The number, in seconds, after which the job will be cancelled.

code

The source code to run the job. Can be a local path or "http:", "https:", or "azureml:" url pointing to a remote location.

identity

The identity that the command job will use while running on compute.

is_deterministic

Specifies whether the Command will return the same output given the same input. Defaults to True. When True, if a Command Component is deterministic and has been run before in the current workspace with the same input and settings, it will reuse results from a previously submitted job when used as a node or step in a pipeline. In that scenario, no compute resources will be used.

default value: True
services

The interactive services for the node. Defaults to None. This is an experimental parameter, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.

job_tier

The job tier. Accepted values are "Spot", "Basic", "Standard", or "Premium".

priority

The priority of the job on the compute. Accepted values are "low", "medium", and "high". Defaults to "medium".

Returns

Type Description
Command

A Command object.

Examples

Creating a Command Job using the command() builder method.


   from azure.ai.ml import Input, Output, command

   train_func = command(
       environment="AzureML-sklearn-1.0-ubuntu20.04-py38-cpu:33",
       command='echo "hello world"',
       distribution={"type": "Pytorch", "process_count_per_instance": 2},
       inputs={
           "training_data": Input(type="uri_folder"),
           "max_epochs": 20,
           "learning_rate": 1.8,
           "learning_rate_schedule": "time-based",
       },
       outputs={"model_output": Output(type="uri_folder")},
   )

load_batch_deployment

Construct a batch deployment object from a YAML file.

load_batch_deployment(source: str | PathLike | IO, *, relative_origin: str | None = None, **kwargs: Any) -> BatchDeployment

Parameters

Name Description
source
Required
Union[PathLike, str, TextIOWrapper]

The local YAML source of a batch deployment object. Must be either a path to a local file, or an already-open file. If the source is a path, it will be opened and read. An exception is raised if the file does not exist. If the source is an open file, the file will be read directly, and an exception is raised if the file is not readable.

Keyword-Only Parameters

Name Description
relative_origin
str

The origin to be used when deducing the relative locations of files referenced in the parsed yaml. Defaults to the inputted source's directory if it is a file or file path input. Defaults to "./" if the source is a stream input with no name value.

params_override

Fields to overwrite on top of the yaml file. Format is [{"field1": "value1"}, {"field2": "value2"}]

Returns

Type Description
BatchDeployment

Constructed batch deployment object.
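
Examples

Loading a BatchDeployment from a YAML config file (an illustrative sketch; the file path below is a placeholder, not a shipped sample).


   from azure.ai.ml import load_batch_deployment

   # Hypothetical path to your own batch deployment YAML definition.
   deployment = load_batch_deployment(source="./configs/batch_deployment.yml")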

load_batch_endpoint

Construct a batch endpoint object from a YAML file.

load_batch_endpoint(source: str | PathLike | IO, relative_origin: str | None = None, **kwargs: Any) -> BatchEndpoint

Parameters

Name Description
source
Required
Union[PathLike, str, TextIOWrapper]

The local YAML source of a batch endpoint object. Must be either a path to a local file, or an already-open file. If the source is a path, it will be opened and read. An exception is raised if the file does not exist. If the source is an open file, the file will be read directly, and an exception is raised if the file is not readable.

relative_origin
str

The origin to be used when deducing the relative locations of files referenced in the parsed yaml. Defaults to the inputted source's directory if it is a file or file path input. Defaults to "./" if the source is a stream input with no name value.

default value: None

Keyword-Only Parameters

Name Description
params_override

Fields to overwrite on top of the yaml file. Format is [{"field1": "value1"}, {"field2": "value2"}]

Returns

Type Description
BatchEndpoint

Constructed batch endpoint object.
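
Examples

Loading a BatchEndpoint from a YAML config file (an illustrative sketch; the file path below is a placeholder, not a shipped sample).


   from azure.ai.ml import load_batch_endpoint

   # Hypothetical path to your own batch endpoint YAML definition.
   endpoint = load_batch_endpoint(source="./configs/batch_endpoint.yml")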

load_component

Load a component from a local or remote source into a component function.

load_component(source: PathLike | str | IO | None = None, *, relative_origin: str | None = None, **kwargs: Any) -> CommandComponent | ParallelComponent | PipelineComponent

Parameters

Name Description
source
Union[PathLike, str, TextIOWrapper]

The local YAML source of a component. Must be either a path to a local file, or an already-open file. If the source is a path, it will be opened and read. An exception is raised if the file does not exist. If the source is an open file, the file will be read directly, and an exception is raised if the file is not readable.

default value: None

Keyword-Only Parameters

Name Description
relative_origin
str

The origin to be used when deducing the relative locations of files referenced in the parsed yaml. Defaults to the inputted source's directory if it is a file or file path input. Defaults to "./" if the source is a stream input with no name value.

params_override

Fields to overwrite on top of the yaml file. Format is [{"field1": "value1"}, {"field2": "value2"}]

Returns

Type Description
Union[CommandComponent, ParallelComponent, PipelineComponent]

A Component object.

Examples

Loading a Component object from a YAML file, overriding its version to "1.0.2", and registering it remotely.


   from azure.ai.ml import load_component

   component = load_component(
       source="./sdk/ml/azure-ai-ml/tests/test_configs/components/helloworld_component.yml",
       params_override=[{"version": "1.0.2"}],
   )
   registered_component = ml_client.components.create_or_update(component)

load_compute

Construct a compute object from a yaml file.

load_compute(source: str | PathLike | IO, *, relative_origin: str | None = None, params_override: List[Dict[str, str]] | None = None, **kwargs: Any) -> Compute

Parameters

Name Description
source
Required
Union[PathLike, str, TextIOWrapper]

The local YAML source of a compute. Must be either a path to a local file, or an already-open file. If the source is a path, it will be opened and read. An exception is raised if the file does not exist. If the source is an open file, the file will be read directly, and an exception is raised if the file is not readable.

Keyword-Only Parameters

Name Description
relative_origin

The origin to be used when deducing the relative locations of files referenced in the parsed yaml. Defaults to the inputted source's directory if it is a file or file path input. Defaults to "./" if the source is a stream input with no name value.

params_override

Fields to overwrite on top of the yaml file. Format is [{"field1": "value1"}, {"field2": "value2"}]

Returns

Type Description
Compute

Loaded compute object.

Examples

Loading a Compute object from a YAML file and overriding its description.


   from azure.ai.ml import load_compute

   compute = load_compute(
       "../tests/test_configs/compute/compute-vm.yaml",
       params_override=[{"description": "loaded from compute-vm.yaml"}],
   )

load_data

Construct a data object from a YAML file.

load_data(source: str | PathLike | IO, *, relative_origin: str | None = None, **kwargs: Any) -> Data

Parameters

Name Description
source
Required
Union[PathLike, str, TextIOWrapper]

The local YAML source of a data object. Must be either a path to a local file, or an already-open file. If the source is a path, it will be opened and read. An exception is raised if the file does not exist. If the source is an open file, the file will be read directly, and an exception is raised if the file is not readable.

Keyword-Only Parameters

Name Description
relative_origin
str

The origin to be used when deducing the relative locations of files referenced in the parsed yaml. Defaults to the inputted source's directory if it is a file or file path input. Defaults to "./" if the source is a stream input with no name value.

params_override

Fields to overwrite on top of the yaml file. Format is [{"field1": "value1"}, {"field2": "value2"}]

Returns

Type Description
Data

Constructed Data or DataImport object.

Exceptions

Type Description

Raised if Data cannot be successfully validated. Details will be provided in the error message.
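
Examples

Loading a Data asset from a YAML config file and overriding its version (an illustrative sketch; the file path below is a placeholder, not a shipped sample).


   from azure.ai.ml import load_data

   # Hypothetical path to your own data asset YAML definition.
   data_asset = load_data(
       source="./configs/data_asset.yml",
       params_override=[{"version": "2"}],
   )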

load_datastore

Construct a datastore object from a yaml file.

load_datastore(source: str | PathLike | IO, *, relative_origin: str | None = None, **kwargs: Any) -> Datastore

Parameters

Name Description
source
Required
Union[PathLike, str, TextIOWrapper]

The local YAML source of a datastore. Must be either a path to a local file, or an already-open file. If the source is a path, it will be opened and read. An exception is raised if the file does not exist. If the source is an open file, the file will be read directly, and an exception is raised if the file is not readable.

Keyword-Only Parameters

Name Description
relative_origin
str

The origin to be used when deducing the relative locations of files referenced in the parsed yaml. Defaults to the inputted source's directory if it is a file or file path input. Defaults to "./" if the source is a stream input with no name value.

params_override

Fields to overwrite on top of the yaml file. Format is [{"field1": "value1"}, {"field2": "value2"}]

Returns

Type Description
Datastore

Loaded datastore object.

Exceptions

Type Description

Raised if Datastore cannot be successfully validated. Details will be provided in the error message.
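
Examples

Loading a Datastore from a YAML config file (an illustrative sketch; the file path below is a placeholder, not a shipped sample).


   from azure.ai.ml import load_datastore

   # Hypothetical path to your own datastore YAML definition.
   datastore = load_datastore(source="./configs/blob_datastore.yml")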

load_environment

Construct an environment object from a YAML file.

load_environment(source: str | PathLike | IO, *, relative_origin: str | None = None, **kwargs: Any) -> Environment

Parameters

Name Description
source
Required
Union[PathLike, str, TextIOWrapper]

The local YAML source of an environment. Must be either a path to a local file, or an already-open file. If the source is a path, it will be opened and read. An exception is raised if the file does not exist. If the source is an open file, the file will be read directly, and an exception is raised if the file is not readable.

Keyword-Only Parameters

Name Description
relative_origin
str

The origin to be used when deducing the relative locations of files referenced in the parsed yaml. Defaults to the inputted source's directory if it is a file or file path input. Defaults to "./" if the source is a stream input with no name value.

params_override

Fields to overwrite on top of the yaml file. Format is [{"field1": "value1"}, {"field2": "value2"}]

Returns

Type Description
Environment

Constructed environment object.

Exceptions

Type Description

Raised if Environment cannot be successfully validated. Details will be provided in the error message.
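
Examples

Loading an Environment from a YAML config file and overriding its version (an illustrative sketch; the file path below is a placeholder, not a shipped sample).


   from azure.ai.ml import load_environment

   # Hypothetical path to your own environment YAML definition.
   environment = load_environment(
       source="./configs/environment.yml",
       params_override=[{"version": "2"}],
   )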

load_feature_set

Construct a FeatureSet object from a YAML file.

load_feature_set(source: str | PathLike | IO, *, relative_origin: str | None = None, **kwargs: Any) -> FeatureSet

Parameters

Name Description
source
Required
Union[PathLike, str, TextIOWrapper]

The local YAML source of a FeatureSet object. Must be either a path to a local file, or an already-open file. If the source is a path, it will be opened and read. An exception is raised if the file does not exist. If the source is an open file, the file will be read directly, and an exception is raised if the file is not readable.

Keyword-Only Parameters

Name Description
relative_origin
str

The origin to be used when deducing the relative locations of files referenced in the parsed yaml. Defaults to the inputted source's directory if it is a file or file path input. Defaults to "./" if the source is a stream input with no name value.

params_override

Fields to overwrite on top of the yaml file. Format is [{"field1": "value1"}, {"field2": "value2"}]

Returns

Type Description
FeatureSet

Constructed FeatureSet object.

Exceptions

Type Description

Raised if FeatureSet cannot be successfully validated. Details will be provided in the error message.
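
Examples

Loading a FeatureSet from a YAML config file (an illustrative sketch; the file path below is a placeholder, not a shipped sample).


   from azure.ai.ml import load_feature_set

   # Hypothetical path to your own feature set YAML definition.
   feature_set = load_feature_set(source="./configs/feature_set.yml")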

load_feature_store

Load a feature store object from a yaml file.

load_feature_store(source: str | PathLike | IO, *, relative_origin: str | None = None, **kwargs: Any) -> FeatureStore

Parameters

Name Description
source
Required
Union[PathLike, str, TextIOWrapper]

The local YAML source of a feature store. Must be either a path to a local file, or an already-open file. If the source is a path, it will be opened and read. An exception is raised if the file does not exist. If the source is an open file, the file will be read directly, and an exception is raised if the file is not readable.

Keyword-Only Parameters

Name Description
relative_origin
str

The origin to be used when deducing the relative locations of files referenced in the parsed yaml. Defaults to the inputted source's directory if it is a file or file path input. Defaults to "./" if the source is a stream input with no name value.

params_override

Fields to overwrite on top of the yaml file. Format is [{"field1": "value1"}, {"field2": "value2"}]

Returns

Type Description
FeatureStore

Loaded feature store object.
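
Examples

Loading a FeatureStore from a YAML config file (an illustrative sketch; the file path below is a placeholder, not a shipped sample).


   from azure.ai.ml import load_feature_store

   # Hypothetical path to your own feature store YAML definition.
   feature_store = load_feature_store(source="./configs/feature_store.yml")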

load_feature_store_entity

Construct a FeatureStoreEntity object from a YAML file.

load_feature_store_entity(source: str | PathLike | IO, *, relative_origin: str | None = None, **kwargs: Any) -> FeatureStoreEntity

Parameters

Name Description
source
Required
Union[PathLike, str, TextIOWrapper]

The local YAML source of a FeatureStoreEntity object. Must be either a path to a local file, or an already-open file. If the source is a path, it will be opened and read. An exception is raised if the file does not exist. If the source is an open file, the file will be read directly, and an exception is raised if the file is not readable.

Keyword-Only Parameters

Name Description
relative_origin
str

The origin to be used when deducing the relative locations of files referenced in the parsed yaml. Defaults to the inputted source's directory if it is a file or file path input. Defaults to "./" if the source is a stream input with no name value.

params_override

Fields to overwrite on top of the yaml file. Format is [{"field1": "value1"}, {"field2": "value2"}]

Returns

Type Description
FeatureStoreEntity

Constructed FeatureStoreEntity object.

Exceptions

Type Description

Raised if FeatureStoreEntity cannot be successfully validated. Details will be provided in the error message.
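
Examples

Loading a FeatureStoreEntity from a YAML config file (an illustrative sketch; the file path below is a placeholder, not a shipped sample).


   from azure.ai.ml import load_feature_store_entity

   # Hypothetical path to your own feature store entity YAML definition.
   entity = load_feature_store_entity(source="./configs/feature_store_entity.yml")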

load_job

Constructs a Job object from a YAML file.

load_job(source: str | PathLike | IO, *, relative_origin: str | None = None, **kwargs: Any) -> Job

Parameters

Name Description
source
Required
Union[PathLike, str, TextIOWrapper]

A path to a local YAML file or an already-open file object containing a job configuration. If the source is a path, it will be opened and read. If the source is an open file, the file will be read directly.

Keyword-Only Parameters

Name Description
relative_origin

The root directory for the YAML. This directory will be used as the origin for deducing the relative locations of files referenced in the parsed YAML. Defaults to the same directory as source if source is a file or file path input. Defaults to "./" if the source is a stream input with no name value.

params_override

Parameter fields to overwrite values in the YAML file.

Returns

Type Description
Job

A loaded Job object.

Exceptions

Type Description

Raised if Job cannot be successfully validated. Details will be provided in the error message.

Examples

Loading a Job from a YAML config file.


   from azure.ai.ml import load_job

   job = load_job(source="./sdk/ml/azure-ai-ml/tests/test_configs/command_job/command_job_test_local_env.yml")

load_model

Constructs a Model object from a YAML file.

load_model(source: str | PathLike | IO, *, relative_origin: str | None = None, **kwargs: Any) -> Model

Parameters

Name Description
source
Required
Union[PathLike, str, TextIOWrapper]

A path to a local YAML file or an already-open file object containing a model configuration. If the source is a path, it will be opened and read. If the source is an open file, the file will be read directly.

Keyword-Only Parameters

Name Description
relative_origin

The root directory for the YAML. This directory will be used as the origin for deducing the relative locations of files referenced in the parsed YAML. Defaults to the same directory as source if source is a file or file path input. Defaults to "./" if the source is a stream input with no name value.

params_override

Parameter fields to overwrite values in the YAML file.

Returns

Type Description
Model

A loaded Model object.

Exceptions

Type Description

Raised if Model cannot be successfully validated. Details will be provided in the error message.

Examples

Loading a Model from a YAML config file, overriding the name and version parameters.


   from azure.ai.ml import load_model

   model = load_model(
       source="./sdk/ml/azure-ai-ml/tests/test_configs/model/model_with_stage.yml",
       params_override=[{"name": "new_model_name"}, {"version": "1"}],
   )

load_model_package

Note

This is an experimental method, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.

Constructs a ModelPackage object from a YAML file.

load_model_package(source: str | PathLike | IO, *, relative_origin: str | None = None, **kwargs: Any) -> ModelPackage

Parameters

Name Description
source
Required
Union[PathLike, str, TextIOWrapper]

A path to a local YAML file or an already-open file object containing a model package configuration. If the source is a path, it will be opened and read. If the source is an open file, the file will be read directly.

Keyword-Only Parameters

Name Description
relative_origin

The root directory for the YAML. This directory will be used as the origin for deducing the relative locations of files referenced in the parsed YAML. Defaults to the same directory as source if source is a file or file path input. Defaults to "./" if the source is a stream input with no name value.

params_override

Parameter fields to overwrite values in the YAML file.

Returns

Type Description
ModelPackage

A loaded ModelPackage object.

Exceptions

Type Description

Raised if ModelPackage cannot be successfully validated. Details will be provided in the error message.

Examples

Loading a ModelPackage from a YAML config file.


   from azure.ai.ml import load_model_package

   model_package = load_model_package(
       "./sdk/ml/azure-ai-ml/tests/test_configs/model_package/model_package_simple.yml"
   )

load_online_deployment

Construct an online deployment object from a YAML file.

load_online_deployment(source: str | PathLike | IO, *, relative_origin: str | None = None, **kwargs: Any) -> OnlineDeployment

Parameters

Name Description
source
Required
Union[PathLike, str, TextIOWrapper]

The local YAML source of an online deployment object. Must be either a path to a local file, or an already-open file. If the source is a path, it will be opened and read. An exception is raised if the file does not exist. If the source is an open file, the file will be read directly, and an exception is raised if the file is not readable.

Keyword-Only Parameters

Name Description
relative_origin
str

The origin to be used when deducing the relative locations of files referenced in the parsed yaml. Defaults to the inputted source's directory if it is a file or file path input. Defaults to "./" if the source is a stream input with no name value.

params_override

Fields to overwrite on top of the yaml file. Format is [{"field1": "value1"}, {"field2": "value2"}]

Returns

Type Description
OnlineDeployment

Constructed online deployment object.

Exceptions

Type Description

Raised if Online Deployment cannot be successfully validated. Details will be provided in the error message.
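
Examples

Loading an OnlineDeployment from a YAML config file (an illustrative sketch; the file path below is a placeholder, not a shipped sample).


   from azure.ai.ml import load_online_deployment

   # Hypothetical path to your own online deployment YAML definition.
   deployment = load_online_deployment(source="./configs/online_deployment.yml")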

load_online_endpoint

Construct an online endpoint object from a YAML file.

load_online_endpoint(source: str | PathLike | IO, *, relative_origin: str | None = None, **kwargs: Any) -> OnlineEndpoint

Parameters

Name Description
source
Required
Union[PathLike, str, TextIOWrapper]

The local YAML source of an online endpoint object. Must be either a path to a local file, or an already-open file. If the source is a path, it will be opened and read. An exception is raised if the file does not exist. If the source is an open file, the file will be read directly, and an exception is raised if the file is not readable.

Keyword-Only Parameters

Name Description
relative_origin
str

The origin to be used when deducing the relative locations of files referenced in the parsed yaml. Defaults to the inputted source's directory if it is a file or file path input. Defaults to "./" if the source is a stream input with no name value.

params_override

Fields to overwrite on top of the yaml file. Format is [{"field1": "value1"}, {"field2": "value2"}]

Returns

Type Description
OnlineEndpoint

Constructed online endpoint object.

Exceptions

Type Description

Raised if Online Endpoint cannot be successfully validated. Details will be provided in the error message.
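
Examples

Loading an OnlineEndpoint from a YAML config file (an illustrative sketch; the file path below is a placeholder, not a shipped sample).


   from azure.ai.ml import load_online_endpoint

   # Hypothetical path to your own online endpoint YAML definition.
   endpoint = load_online_endpoint(source="./configs/online_endpoint.yml")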

load_registry

Load a registry object from a yaml file.

load_registry(source: str | PathLike | IO, *, relative_origin: str | None = None, **kwargs: Any) -> Registry

Parameters

Name Description
source
Required
Union[PathLike, str, TextIOWrapper]

The local YAML source of a registry. Must be either a path to a local file, or an already-open file. If the source is a path, it will be opened and read. An exception is raised if the file does not exist. If the source is an open file, the file will be read directly, and an exception is raised if the file is not readable.

Keyword-Only Parameters

Name Description
relative_origin
str

The origin to be used when deducing the relative locations of files referenced in the parsed yaml. Defaults to the inputted source's directory if it is a file or file path input. Defaults to "./" if the source is a stream input with no name value.

params_override

Fields to overwrite on top of the yaml file. Format is [{"field1": "value1"}, {"field2": "value2"}]

Returns

Type Description
Registry

Loaded registry object.
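
Examples

Loading a Registry from a YAML config file and overriding its description (an illustrative sketch; the file path below is a placeholder, not a shipped sample).


   from azure.ai.ml import load_registry

   # Hypothetical path to your own registry YAML definition.
   registry = load_registry(
       source="./configs/registry.yml",
       params_override=[{"description": "loaded from registry.yml"}],
   )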

load_workspace

Load a workspace object from a yaml file.

load_workspace(source: str | PathLike | IO, *, relative_origin: str | None = None, **kwargs: Any) -> Workspace

Parameters

Name Description
source
Required
Union[PathLike, str, TextIOWrapper]

The local YAML source of a workspace. Must be either a path to a local file, or an already-open file. If the source is a path, it will be opened and read. An exception is raised if the file does not exist. If the source is an open file, the file will be read directly, and an exception is raised if the file is not readable.

Keyword-Only Parameters

Name Description
relative_origin
str

The origin to be used when deducing the relative locations of files referenced in the parsed yaml. Defaults to the inputted source's directory if it is a file or file path input. Defaults to "./" if the source is a stream input with no name value.

params_override

Fields to overwrite on top of the yaml file. Format is [{"field1": "value1"}, {"field2": "value2"}]

Returns

Type Description
Workspace

Loaded workspace object.

Examples

Loading a Workspace from a YAML config file.


   from azure.ai.ml import load_workspace

   ws = load_workspace(
       "../tests/test_configs/workspace/workspace_min.yaml",
       params_override=[{"description": "loaded from workspace_min.yaml"}],
   )

load_workspace_connection

Construct a workspace connection object from a YAML file.

load_workspace_connection(source: str | PathLike | IO, *, relative_origin: str | None = None, **kwargs: Any) -> WorkspaceConnection

Parameters

Name Description
source
Required
Union[PathLike, str, TextIOWrapper]

The local YAML source of a workspace connection object. Must be either a path to a local file, or an already-open file. If the source is a path, it will be opened and read. An exception is raised if the file does not exist. If the source is an open file, the file will be read directly, and an exception is raised if the file is not readable.

Keyword-Only Parameters

Name Description
relative_origin
str

The origin to be used when deducing the relative locations of files referenced in the parsed yaml. Defaults to the inputted source's directory if it is a file or file path input. Defaults to "./" if the source is a stream input with no name value.

params_override

Fields to overwrite on top of the yaml file. Format is [{"field1": "value1"}, {"field2": "value2"}]

Returns

Type Description
WorkspaceConnection

Constructed workspace connection object.

Examples

Loading a Workspace Connection from a YAML config file.


   from azure.ai.ml import load_workspace_connection

   wps_connection = load_workspace_connection(
       source="../tests/test_configs/workspace_connection/snowflake_user_pwd.yaml"
   )

load_workspace_hub

Note

This is an experimental method, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.

Load a WorkspaceHub object from a yaml file.

load_workspace_hub(source: str | PathLike | IO, *, relative_origin: str | None = None, **kwargs: Any) -> WorkspaceHub

Parameters

Name Description
source
Required
Union[PathLike, str, TextIOWrapper]

The local YAML source of a WorkspaceHub. Must be either a path to a local file, or an already-open file. If the source is a path, it will be opened and read. An exception is raised if the file does not exist. If the source is an open file, the file will be read directly, and an exception is raised if the file is not readable.

Keyword-Only Parameters

Name Description
relative_origin
str

The origin to be used when deducing the relative locations of files referenced in the parsed yaml. Defaults to the inputted source's directory if it is a file or file path input. Defaults to "./" if the source is a stream input with no name value.

params_override

Fields to overwrite on top of the yaml file. Format is [{"field1": "value1"}, {"field2": "value2"}]

Returns

Type Description
WorkspaceHub

Loaded WorkspaceHub object.

Examples

Loading a Workspace Hub from a YAML config file.


   from azure.ai.ml import load_workspace_hub

   hub = load_workspace_hub(
       "../tests/test_configs/workspace/workspacehub_min.yaml",
       params_override=[{"description": "loaded from workspacehub_min.yaml"}],
   )

spark

Creates a Spark object which can be used inside a dsl.pipeline function or as a standalone Spark job.

spark(*, experiment_name: str | None = None, name: str | None = None, display_name: str | None = None, description: str | None = None, tags: Dict | None = None, code: PathLike | str | None = None, entry: Dict[str, str] | SparkJobEntry | None = None, py_files: List[str] | None = None, jars: List[str] | None = None, files: List[str] | None = None, archives: List[str] | None = None, identity: Dict[str, str] | ManagedIdentity | AmlToken | UserIdentity | None = None, driver_cores: int | None = None, driver_memory: str | None = None, executor_cores: int | None = None, executor_memory: str | None = None, executor_instances: int | None = None, dynamic_allocation_enabled: bool | None = None, dynamic_allocation_min_executors: int | None = None, dynamic_allocation_max_executors: int | None = None, conf: Dict[str, str] | None = None, environment: str | Environment | None = None, inputs: Dict | None = None, outputs: Dict | None = None, args: str | None = None, compute: str | None = None, resources: Dict | SparkResourceConfiguration | None = None, **kwargs: Any) -> Spark

Keyword-Only Parameters

Name Description
experiment_name

The name of the experiment the job will be created under.

name

The name of the job.

display_name

The job display name.

description

The description of the job. Defaults to None.

tags

The dictionary of tags for the job. Tags can be added, removed, and updated. Defaults to None.

code

The source code to run the job. Can be a local path or "http:", "https:", or "azureml:" url pointing to a remote location.

entry

The file or class entry point.

py_files

The list of .zip, .egg or .py files to place on the PYTHONPATH for Python apps. Defaults to None.

jars

The list of .JAR files to include on the driver and executor classpaths. Defaults to None.

files

The list of files to be placed in the working directory of each executor. Defaults to None.

archives

The list of archives to be extracted into the working directory of each executor. Defaults to None.

identity

The identity that the Spark job will use while running on compute.

driver_cores

The number of cores to use for the driver process, only in cluster mode.

driver_memory

The amount of memory to use for the driver process, formatted as strings with a size unit suffix ("k", "m", "g" or "t") (e.g. "512m", "2g").

executor_cores

The number of cores to use on each executor.

executor_memory

The amount of memory to use per executor process, formatted as strings with a size unit suffix ("k", "m", "g" or "t") (e.g. "512m", "2g").

executor_instances

The initial number of executors.

dynamic_allocation_enabled

Whether to use dynamic resource allocation, which scales the number of executors registered with this application up and down based on the workload.

dynamic_allocation_min_executors

The lower bound for the number of executors if dynamic allocation is enabled.

dynamic_allocation_max_executors

The upper bound for the number of executors if dynamic allocation is enabled.

conf

A dictionary of pre-defined Spark configuration keys and values. Defaults to None.

environment

The Azure ML environment to run the job in.

inputs

A mapping of input names to input data used in the job. Defaults to None.

outputs

A mapping of output names to output data used in the job. Defaults to None.

args

The arguments for the job.

compute

The compute resource the job runs on.

resources

The compute resource configuration for the job.

Returns

Type Description
Spark

A Spark object.

Examples

Building a Spark pipeline using the DSL pipeline decorator.


   from azure.ai.ml import Input, Output, dsl, spark
   from azure.ai.ml.constants import AssetTypes, InputOutputModes

   # define the spark task
   first_step = spark(
       code="/src",
       entry={"file": "add_greeting_column.py"},
       py_files=["utils.zip"],
       files=["my_files.txt"],
       driver_cores=2,
       driver_memory="1g",
       executor_cores=1,
       executor_memory="1g",
       executor_instances=1,
       inputs=dict(
           file_input=Input(path="/dataset/iris.csv", type=AssetTypes.URI_FILE, mode=InputOutputModes.DIRECT)
       ),
       args="--file_input ${{inputs.file_input}}",
       resources={"instance_type": "standard_e4s_v3", "runtime_version": "3.2.0"},
   )

   second_step = spark(
       code="/src",
       entry={"file": "count_by_row.py"},
       jars=["scala_project.jar"],
       files=["my_files.txt"],
       driver_cores=2,
       driver_memory="1g",
       executor_cores=1,
       executor_memory="1g",
       executor_instances=1,
       inputs=dict(
           file_input=Input(path="/dataset/iris.csv", type=AssetTypes.URI_FILE, mode=InputOutputModes.DIRECT)
       ),
       outputs=dict(output=Output(type="uri_folder", mode=InputOutputModes.DIRECT)),
       args="--file_input ${{inputs.file_input}} --output ${{outputs.output}}",
       resources={"instance_type": "standard_e4s_v3", "runtime_version": "3.2.0"},
   )

   # Define pipeline
   @dsl.pipeline(description="submit a pipeline with spark job")
   def spark_pipeline_from_builder(data):
       add_greeting_column = first_step(file_input=data)
       count_by_row = second_step(file_input=data)
       return {"output": count_by_row.outputs.output}

   pipeline = spark_pipeline_from_builder(
       data=Input(path="/dataset/iris.csv", type=AssetTypes.URI_FILE, mode=InputOutputModes.DIRECT),
   )