Run Class
Defines the base class for all Azure Machine Learning experiment runs.
A run represents a single trial of an experiment. Runs are used to monitor the asynchronous execution of a trial, log metrics and store output of the trial, and to analyze results and access artifacts generated by the trial.
Run objects are created when you submit a script to train a model in many different scenarios in Azure Machine Learning, including HyperDrive runs, Pipeline runs, and AutoML runs. A Run object is also created when you submit or start_logging with the Experiment class.
To get started with experiments and runs, see the Azure Machine Learning documentation on submitting experiments and training runs.
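The following is a minimal interactive sketch; it assumes a local workspace configuration file (config.json), and the experiment name and metric are placeholders:
from azureml.core import Workspace, Experiment

ws = Workspace.from_config()                           # assumes config.json is present locally
exp = Experiment(workspace=ws, name="my-experiment")   # placeholder experiment name

run = exp.start_logging()                              # interactive run, e.g. from a notebook
run.log("seed", 42)                                    # placeholder metric
run.complete()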
Initialize the Run object.
- Inheritance
-
azureml._run_impl.run_base._RunBase
Run
Constructor
Run(experiment, run_id, outputs=None, **kwargs)
Parameters
Name | Description |
---|---|
experiment
Required
|
The containing experiment. |
run_id
Required
|
The ID for the run. |
outputs
|
The outputs to be tracked. Default value: None
|
_run_dto
Required
|
azureml._restclient.models.run_dto.RunDto
Internal use only. |
kwargs
Required
|
A dictionary of additional configuration parameters. |
Remarks
A run represents a single trial of an experiment. A Run object is used to monitor the asynchronous execution of a trial, log metrics and store output of the trial, and to analyze results and access artifacts generated by the trial.
Run is used inside of your experimentation code to log metrics and artifacts to the Run History service.
Run is used outside of your experiments to monitor progress and to query and analyze the metrics and results that were generated.
The functionality of Run includes:
Storing and retrieving metrics and data
Uploading and downloading files
Using tags as well as the child hierarchy for easy lookup of past runs
Registering stored model files as a model that can be operationalized
Storing, modifying, and retrieving properties of a run
Loading the current run from a remote environment with the get_context method
Efficiently snapshotting a file or directory for reproducibility
This class works with the Experiment in these scenarios:
Creating a run by executing code using submit
Creating a run interactively in a notebook using start_logging
Logging metrics and uploading artifacts in your experiment, such as when using log
Reading metrics and downloading artifacts when analyzing experimental results, such as when using get_metrics
To submit a run, create a configuration object that describes how the experiment is run. Here are examples of the different configuration objects you can use (a minimal submission sketch follows this list):
azureml.train.automl.automlconfig.AutoMLConfig
azureml.train.hyperdrive.HyperDriveConfig
azureml.pipeline.core.Pipeline
azureml.pipeline.core.PublishedPipeline
azureml.pipeline.core.PipelineEndpoint
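For example, a minimal submission sketch using ScriptRunConfig; the experiment object and the script name are assumed to exist:
from azureml.core import ScriptRunConfig

# "experiment" is an existing Experiment object; "train.py" is a placeholder script.
config = ScriptRunConfig(source_directory='.', script='train.py')
run = experiment.submit(config)
run.wait_for_completion(show_output=True)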
The following metrics can be added to a run while training an experiment.
Scalar
Log a numerical or string value to the run with the given name using log. Logging a metric to a run causes that metric to be stored in the run record in the experiment. You can log the same metric multiple times within a run, the result being considered a vector of that metric.
Example:
run.log("accuracy", 0.95)
List
Log a list of values to the run with the given name using log_list.
Example:
run.log_list("accuracies", [0.6, 0.7, 0.87])
Row
Using log_row creates a metric with multiple columns as described in kwargs. Each named parameter generates a column with the value specified. log_row can be called once to log an arbitrary tuple, or multiple times in a loop to generate a complete table.
Example:
run.log_row("Y over X", x=1, y=0.4)
Table
Log a dictionary object to the run with the given name using log_table.
Example:
run.log_table("Y over X", {"x":[1, 2, 3], "y":[0.6, 0.7, 0.89]})
Image
Log an image to the run record. Use log_image to log an image file or a matplotlib plot to the run. These images will be visible and comparable in the run record.
Example:
run.log_image("ROC", path)
Methods
add_properties |
Add immutable properties to the run. Tags and properties (both dict[str, str]) differ in their mutability. Properties are immutable, so they create a permanent record for auditing purposes. Tags are mutable. For more information about working with tags and properties, see Tag and find runs. |
add_type_provider |
Extensibility hook for custom Run types stored in Run History. |
cancel |
Mark the run as canceled. If there is an associated job with a set cancel_uri field, terminate that job as well. |
child_run |
Create a child run. |
clean |
Remove the files corresponding to the current run on the target specified in the run configuration. |
complete |
Wait for the task queue to be processed, then mark the run as completed. This is typically used in interactive notebook scenarios. |
create_children |
Create one or many child runs. |
download_file |
Download an associated file from storage. |
download_files |
Download files from a given storage prefix (folder name) or the entire container if prefix is unspecified. |
fail |
Mark the run as failed. Optionally set the Error property of the run with a message or exception passed to error_details. |
flush |
Wait for task queue to be processed. |
get |
Get the run for this workspace with its run ID. |
get_all_logs |
Download all logs for the run to a directory. |
get_children |
Get all children for the current run selected by specified filters. |
get_context |
Return current service context. Use this method to retrieve the current service context for logging metrics and uploading files. If allow_offline is True (the default), actions against the Run object are printed to standard out. |
get_detailed_status |
Fetch the latest status of the run. If the status of the run is "Queued", it will show the details. |
get_details |
Get the definition, status information, current log files, and other details of the run. |
get_details_with_logs |
Return run status including log file content. |
get_environment |
Get the environment definition that was used by this run. |
get_file_names |
List the files that are stored in association with the run. |
get_metrics |
Retrieve the metrics logged to the run. If recursive is True (False by default), metrics are also fetched for runs in the given run's subtree. |
get_properties |
Fetch the latest properties of the run from the service. |
get_secret |
Get the secret value from the context of a run. Get the secret value for the name provided. The secret name references a value stored in Azure Key Vault associated with your workspace. For an example of working with secrets, see Use secrets in training runs. |
get_secrets |
Get the secret values for a given list of secret names. Get a dictionary of found and not found secrets for the list of names provided. Each secret name references a value stored in Azure Key Vault associated with your workspace. For an example of working with secrets, see Use secrets in training runs. |
get_snapshot_id |
Get the latest snapshot ID. |
get_status |
Fetch the latest status of the run. Common values returned include "Running", "Completed", and "Failed". |
get_submitted_run |
DEPRECATED. Use get_context. Get the submitted run for this experiment. |
get_tags |
Fetch the latest set of mutable tags on the run from the service. |
list |
Get a list of runs in an experiment specified by optional filters. |
list_by_compute |
Get a list of runs in a compute specified by optional filters. |
log |
Log a metric value to the run with the given name. |
log_accuracy_table |
Log an accuracy table to the artifact store. The accuracy table metric is a multi-use, non-scalar metric that can be used to produce multiple types of line charts that vary continuously over the space of predicted probabilities, such as ROC, precision-recall, and lift curves. See the log_accuracy_table method below for the full description of how the table is calculated and its invariants. |
log_confusion_matrix |
Log a confusion matrix to the artifact store. This logs a wrapper around the sklearn confusion matrix. The metric data contains the class labels and a 2D list for the matrix itself. See the following link for more details on how the metric is computed: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html |
log_image |
Log an image metric to the run record. |
log_list |
Log a list of metric values to the run with the given name. |
log_predictions |
Log predictions to the artifact store. This logs a metric score that can be used to compare the distributions of true target values to the distribution of predicted values for a regression task. The predictions are binned and standard deviations are calculated for error bars on a line chart. |
log_residuals |
Log residuals to the artifact store. This logs the data needed to display a histogram of residuals for a regression task. The residuals are predicted - actual. There should be one more edge than the number of counts. Please see the numpy histogram documentation for examples of using counts and edges to represent a histogram. https://docs.scipy.org/doc/numpy/reference/generated/numpy.histogram.html |
log_row |
Log a row metric to the run with the given name. |
log_table |
Log a table metric to the run with the given name. |
register_model |
Register a model for operationalization. |
remove_tags |
Delete the list of mutable tags on this run. |
restore_snapshot |
Restore a snapshot as a ZIP file. Returns the path to the ZIP. |
set_tags |
Add or modify a set of tags on the run. Tags not passed in the dictionary are left untouched. You can also add simple string tags. When these tags appear in the tag dictionary as keys, they have a value of None. For more information, see Tag and find runs. |
start |
Mark the run as started. This is typically used in advanced scenarios when the run has been created by another actor. |
submit_child |
Submit an experiment and return the active child run. |
tag |
Tag the run with a string key and optional string value. |
take_snapshot |
Save a snapshot of the input file or folder. |
upload_file |
Upload a file to the run record. |
upload_files |
Upload files to the run record. |
upload_folder |
Upload the specified folder to the given prefix name. |
wait_for_completion |
Wait for the completion of this run. Returns the status object after the wait. |
add_properties
Add immutable properties to the run.
Tags and properties (both dict[str, str]) differ in their mutability. Properties are immutable, so they create a permanent record for auditing purposes. Tags are mutable. For more information about working with tags and properties, see Tag and find runs.
add_properties(properties)
Parameters
Name | Description |
---|---|
properties
Required
|
The hidden properties stored in the run object. |
add_type_provider
Extensibility hook for custom Run types stored in Run History.
static add_type_provider(runtype, run_factory)
Parameters
Name | Description |
---|---|
runtype
Required
|
The value of Run.type for which the factory will be invoked. Examples include 'hyperdrive' or 'azureml.scriptrun', but can be extended with custom types. |
run_factory
Required
|
function
A function with signature (Experiment, RunDto) -> Run to be invoked when listing runs. |
cancel
Mark the run as canceled.
If there is an associated job with a set cancel_uri field, terminate that job as well.
cancel()
child_run
Create a child run.
child_run(name=None, run_id=None, outputs=None)
Parameters
Name | Description |
---|---|
name
|
An optional name for the child run, typically specified for a "part". Default value: None
|
run_id
|
An optional run ID for the child, otherwise it is auto-generated. Typically this parameter is not set. Default value: None
|
outputs
|
Optional outputs directory to track for the child. Default value: None
|
Returns
Type | Description |
---|---|
The child run. |
Remarks
This is used to isolate part of a run into a subsection. This can be done for identifiable "parts" of a run that are interesting to separate, or to capture independent metrics across an iteration of a subprocess.
If an output directory is set for the child run, the contents of that directory will be uploaded to the child run record when the child is completed.
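Example (a minimal sketch; the child name and metric are illustrative):
# "run" is an existing Run, for example from Run.get_context().
with run.child_run(name="data-prep") as child:   # the with-block scopes work to the child run
    child.log("rows_cleaned", 1042)              # illustrative metric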
clean
Remove the files corresponding to the current run on the target specified in the run configuration.
clean()
Returns
Type | Description |
---|---|
A list of files deleted. |
complete
Wait for the task queue to be processed, then mark the run as completed. This is typically used in interactive notebook scenarios.
complete(_set_status=True)
Parameters
Name | Description |
---|---|
_set_status
|
Indicates whether to send the status event for tracking. Default value: True
|
create_children
Create one or many child runs.
create_children(count=None, tag_key=None, tag_values=None)
Parameters
Name | Description |
---|---|
count
|
An optional number of children to create. Default value: None
|
tag_key
|
An optional key to populate the Tags entry in all created children. Default value: None
|
tag_values
|
An optional list of values that will map onto Tags[tag_key] for the list of runs created. Default value: None
|
Returns
Type | Description |
---|---|
The list of child runs. |
Remarks
Either parameter count
OR parameters tag_key
AND tag_values
must be specified.
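Example (a sketch with an illustrative tag key and values):
# Create three children, each with Tags["fold"] set to one of the values below.
children = run.create_children(tag_key="fold", tag_values=["0", "1", "2"])
for child in children:
    child.log("fold_started", 1)   # illustrative metric logged to each child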
download_file
Download an associated file from storage.
download_file(name, output_file_path=None, _validate_checksum=False)
Parameters
Name | Description |
---|---|
name
Required
|
The name of the artifact to be downloaded. |
output_file_path
Required
|
The local path where to store the artifact. |
download_files
Download files from a given storage prefix (folder name) or the entire container if prefix is unspecified.
download_files(prefix=None, output_directory=None, output_paths=None, batch_size=100, append_prefix=True, timeout_seconds=None)
Parameters
Name | Description |
---|---|
prefix
Required
|
The filepath prefix within the container from which to download all artifacts. |
output_directory
Required
|
An optional directory that all artifact paths use as a prefix. |
output_paths
Required
|
[str]
Optional filepaths in which to store the downloaded artifacts. Should be unique and match length of paths. |
batch_size
Required
|
The number of files to download per batch. The default is 100 files. |
append_prefix
Required
|
An optional flag indicating whether to append the specified prefix to the final output file path. If False, the prefix is removed from the output file path. |
timeout_seconds
Required
|
The timeout for downloading files. |
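Example (a sketch; the artifact names and local paths are illustrative):
# Download a single named artifact to a local path.
run.download_file(name="outputs/model.pkl", output_file_path="model.pkl")

# Download everything stored under the "outputs/" prefix to a local folder.
run.download_files(prefix="outputs/", output_directory="downloaded_outputs")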
fail
Mark the run as failed.
Optionally set the Error property of the run with a message or exception passed to error_details.
fail(error_details=None, error_code=None, _set_status=True)
Parameters
Name | Description |
---|---|
error_details
|
str or
BaseException
Optional details of the error. Default value: None
|
error_code
|
Optional error code of the error for the error classification. Default value: None
|
_set_status
|
Indicates whether to send the status event for tracking. Default value: True
|
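Example (a sketch for an interactive run; train() is a placeholder for your own training code):
run = experiment.start_logging()
try:
    train()                          # placeholder training code
    run.complete()
except Exception as ex:
    run.fail(error_details=ex)       # records the exception in the run's Error property
    raise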
flush
Wait for task queue to be processed.
flush(timeout_seconds=300)
Parameters
Name | Description |
---|---|
timeout_seconds
|
How long to wait (in seconds) for task queue to be processed. Default value: 300
|
get
Get the run for this workspace with its run ID.
static get(workspace, run_id)
Parameters
Name | Description |
---|---|
workspace
Required
|
The containing workspace. |
run_id
Required
|
The run ID. |
Returns
Type | Description |
---|---|
The submitted run. |
get_all_logs
Download all logs for the run to a directory.
get_all_logs(destination=None)
Parameters
Name | Description |
---|---|
destination
|
The destination path to store logs. If unspecified, a directory named as the run ID is created in the project directory. Default value: None
|
Returns
Type | Description |
---|---|
A list of names of logs downloaded. |
get_children
Get all children for the current run selected by specified filters.
get_children(recursive=False, tags=None, properties=None, type=None, status=None, _rehydrate_runs=True)
Parameters
Name | Description |
---|---|
recursive
|
Indicates whether to recurse through all descendants. Default value: False
|
tags
|
If specified, returns runs matching specified "tag" or {"tag": "value"}. Default value: None
|
properties
|
If specified, returns runs matching specified "property" or {"property": "value"}. Default value: None
|
type
|
If specified, returns runs matching this type. Default value: None
|
status
|
If specified, returns runs with status specified "status". Default value: None
|
_rehydrate_runs
|
Indicates whether to instantiate a run of the original type or the base Run. Default value: True
|
Returns
Type | Description |
---|---|
A list of Run objects. |
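Example (a sketch; the metric name is illustrative):
# Compare a metric across completed child runs.
for child in run.get_children(status="Completed"):
    metrics = child.get_metrics()
    print(child.id, metrics.get("accuracy"))   # illustrative metric name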
get_context
Return current service context.
Use this method to retrieve the current service context for logging metrics and uploading files. If allow_offline is True (the default), actions against the Run object are printed to standard out.
get_context(allow_offline=True, used_for_context_manager=False, **kwargs)
Parameters
Name | Description |
---|---|
cls
Required
|
Indicates class method. |
allow_offline
|
Allow the service context to fall back to offline mode so that the training script can be tested locally without submitting a job with the SDK. True by default. Default value: True
|
kwargs
Required
|
A dictionary of additional parameters. |
used_for_context_manager
|
Default value: False
|
Returns
Type | Description |
---|---|
The submitted run. |
Remarks
This function is commonly used to retrieve the authenticated Run object inside of a script to be submitted for execution via experiment.submit(). This run object is both an authenticated context to communicate with Azure Machine Learning services and a conceptual container within which metrics, files (artifacts), and models are contained.
run = Run.get_context() # allow_offline=True by default, so can be run locally as well
...
run.log("Accuracy", 0.98)
run.log_row("Performance", epoch=e, error=err)
get_detailed_status
Fetch the latest status of the run. If the status of the run is "Queued", it will show the details.
get_detailed_status()
Returns
Type | Description |
---|---|
The latest status and details |
Remarks
status: The run's current status. Same value as that returned from get_status().
details: The detailed information for current status.
run = experiment.submit(config)
details = run.get_detailed_status()
# details = {
# 'status': 'Queued',
# 'details': 'Run requested 1 node(s). Run is in pending status.',
# }
get_details
Get the definition, status information, current log files, and other details of the run.
get_details()
Returns
Type | Description |
---|---|
Return the details for the run |
Remarks
The returned dictionary contains the following key-value pairs:
runId: ID of this run.
target: The compute target for the run.
status: The run's current status. Same value as that returned from get_status().
startTimeUtc: UTC time of when this run was started, in ISO8601.
endTimeUtc: UTC time of when this run was finished (either Completed or Failed), in ISO8601.
This key does not exist if the run is still in progress.
properties: Immutable key-value pairs associated with the run. Default properties include the run's snapshot ID and information about the git repository from which the run was created (if any). Additional properties can be added to a run using add_properties.
inputDatasets: Input datasets associated with the run.
outputDatasets: Output datasets associated with the run.
logFiles: Log files associated with the run.
submittedBy: The user who submitted the run.
run = experiment.start_logging()
details = run.get_details()
# details = {
# 'runId': '5c24aa28-6e4a-4572-96a0-fb522d26fe2d',
# 'target': 'sdk',
# 'status': 'Running',
# 'startTimeUtc': '2019-01-01T13:08:01.713777Z',
#   'endTimeUtc': '2019-01-01T17:15:55.986253Z',
# 'properties': {
# 'azureml.git.repository_uri': 'https://example.com/my/git/repo',
# 'azureml.git.branch': 'master',
# 'azureml.git.commit': '7dc972657c2168927a02c3bc2b161e0f370365d7',
# 'azureml.git.dirty': 'True',
# 'mlflow.source.git.repoURL': 'https://example.com/my/git/repo',
# 'mlflow.source.git.branch': 'master',
# 'mlflow.source.git.commit': '7dc972657c2168927a02c3bc2b161e0f370365d7',
# 'ContentSnapshotId': 'b4689489-ce2f-4db5-b6d7-6ad11e77079c'
# },
# 'inputDatasets': [{
# 'dataset': {'id': 'cdebf245-701d-4a68-8055-41f9cf44f298'},
# 'consumptionDetails': {
# 'type': 'RunInput',
# 'inputName': 'training-data',
# 'mechanism': 'Mount',
# 'pathOnCompute': '/mnt/datasets/train'
# }
# }],
# 'outputDatasets': [{
# 'dataset': {'id': 'd04e8a19-1caa-4b1f-b318-4cbff9af9615'},
# 'outputType': 'RunOutput',
# 'outputDetails': {
# 'outputName': 'training-result'
# }
# }],
# 'runDefinition': {},
# 'logFiles': {},
# 'submittedBy': 'Alan Turing'
# }
get_details_with_logs
Return run status including log file content.
get_details_with_logs()
Returns
Type | Description |
---|---|
Returns the status for the run with log file contents. |
get_environment
Get the environment definition that was used by this run.
get_environment()
Returns
Type | Description |
---|---|
Return the environment object. |
get_file_names
List the files that are stored in association with the run.
get_file_names()
Returns
Type | Description |
---|---|
The list of paths for existing artifacts |
get_metrics
Retrieve the metrics logged to the run.
If recursive is True (False by default), metrics are also fetched for runs in the given run's subtree.
get_metrics(name=None, recursive=False, run_type=None, populate=False)
Parameters
Name | Description |
---|---|
name
|
The name of the metric. Default value: None
|
recursive
|
Indicates whether to recurse through all descendants. Default value: False
|
run_type
|
Default value: None
|
populate
|
Indicates whether to fetch the contents of external data linked to the metric. Default value: False
|
Returns
Type | Description |
---|---|
A dictionary containing the user's metrics. |
Remarks
run = experiment.start_logging() # run id: 123
run.log("A", 1)
with run.child_run() as child: # run id: 456
child.log("A", 2)
metrics = run.get_metrics()
# metrics = { 'A': 1 }
metrics = run.get_metrics(recursive=True)
# metrics = { '123': { 'A': 1 }, '456': { 'A': 2 } } note key is runId
get_properties
Fetch the latest properties of the run from the service.
get_properties()
Returns
Type | Description |
---|---|
The properties of the run. |
Remarks
Properties are immutable system-generated information such as duration, date of execution, user, and custom properties added with the add_properties method. For more information, see Tag and find runs.
When submitting a job to Azure Machine Learning, if source files are stored in a local git repository then information about the repo is stored as properties. These git properties are added when creating a run or calling Experiment.submit. For more information about git properties see Git integration for Azure Machine Learning.
get_secret
Get the secret value from the context of a run.
Get the secret value for the name provided. The secret name references a value stored in Azure Key Vault associated with your workspace. For an example of working with secrets, see Use secrets in training runs.
get_secret(name)
Parameters
Name | Description |
---|---|
name
Required
|
The secret name for which to return a secret. |
Returns
Type | Description |
---|---|
The secret value. |
get_secrets
Get the secret values for a given list of secret names.
Get a dictionary of found and not found secrets for the list of names provided. Each secret name references a value stored in Azure Key Vault associated with your workspace. For an example of working with secrets, see Use secrets in training runs.
get_secrets(secrets)
Parameters
Name | Description |
---|---|
secrets
Required
|
A list of secret names for which to return secret values. |
Returns
Type | Description |
---|---|
Returns a dictionary of found and not found secrets. |
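Example (a sketch; the secret names are placeholders that must already exist in the Key Vault associated with the workspace):
from azureml.core import Run

run = Run.get_context()
db_password = run.get_secret(name="db-password")                    # placeholder name
secrets = run.get_secrets(secrets=["db-password", "storage-key"])   # placeholder names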
get_snapshot_id
Get the latest snapshot ID.
get_snapshot_id()
Returns
Type | Description |
---|---|
The most recent snapshot ID. |
get_status
Fetch the latest status of the run.
Common values returned include "Running", "Completed", and "Failed".
get_status()
Returns
Type | Description |
---|---|
The latest status. |
Remarks
NotStarted - This is a temporary state client-side Run objects are in before cloud submission.
Starting - The Run has started being processed in the cloud. The caller has a run ID at this point.
Provisioning - Returned when on-demand compute is being created for a given job submission.
Preparing - The run environment is being prepared:
docker image build
conda environment setup
Queued - The job is queued in the compute target. For example, in BatchAI the job is in queued state while waiting for all the requested nodes to be ready.
Running - The job started to run in the compute target.
Finalizing - User code has completed and the run is in post-processing stages.
CancelRequested - Cancellation has been requested for the job.
Completed - The run completed successfully. This includes both the user code and run post-processing stages.
Failed - The run failed. Usually the Error property on a run will provide details as to why.
Canceled - Follows a cancellation request and indicates that the run is now successfully cancelled.
NotResponding - For runs that have Heartbeats enabled, no heartbeat has been recently sent.
run = experiment.submit(config)
while run.get_status() not in ['Completed', 'Failed']: # For example purposes only, not exhaustive
print('Run {} not in terminal state'.format(run.id))
time.sleep(10)
get_submitted_run
DEPRECATED. Use get_context.
Get the submitted run for this experiment.
get_submitted_run(**kwargs)
Returns
Type | Description |
---|---|
The submitted run. |
get_tags
Fetch the latest set of mutable tags on the run from the service.
get_tags()
Returns
Type | Description |
---|---|
The tags stored on the run object. |
list
Get a list of runs in an experiment specified by optional filters.
static list(experiment, type=None, tags=None, properties=None, status=None, include_children=False, _rehydrate_runs=True)
Parameters
Name | Description |
---|---|
experiment
Required
|
The containing experiment. |
type
|
If specified, returns runs matching specified type. Default value: None
|
tags
|
If specified, returns runs matching specified "tag" or {"tag": "value"}. Default value: None
|
properties
|
If specified, returns runs matching specified "property" or {"property": "value"}. Default value: None
|
status
|
If specified, returns runs with status specified "status". Default value: None
|
include_children
|
If set to true, fetch all the runs, not only top-level ones. Default value: False
|
_rehydrate_runs
|
If set to True (by default), will use the registered provider to reinstantiate an object for that type instead of the base Run. Default value: True
|
Returns
Type | Description |
---|---|
A list of runs. |
Remarks
The following code example shows some uses of the list
method.
favorite_completed_runs = Run.list(experiment, status='Completed', tags='favorite')
all_distinct_runs = Run.list(experiment)
and_their_children = Run.list(experiment, include_children=True)
only_script_runs = Run.list(experiment, type=ScriptRun.RUN_TYPE)
list_by_compute
Get a list of runs in a compute specified by optional filters.
static list_by_compute(compute, type=None, tags=None, properties=None, status=None)
Parameters
Name | Description |
---|---|
compute
Required
|
The containing compute. |
type
|
If specified, returns runs matching specified type. Default value: None
|
tags
|
If specified, returns runs matching specified "tag" or {"tag": "value"}. Default value: None
|
properties
|
If specified, returns runs matching specified "property" or {"property": "value"}. Default value: None
|
status
|
If specified, returns runs with status specified "status". Only allowed values are "Running" and "Queued". Default value: None
|
Returns
Type | Description |
---|---|
generator
|
A generator of azureml._restclient.models.RunDto objects. |
log
Log a metric value to the run with the given name.
log(name, value, description='', step=None)
Parameters
Name | Description |
---|---|
name
Required
|
The name of metric. |
value
Required
|
The value to be posted to the service. |
description
Required
|
An optional metric description. |
step
|
An optional axis to specify value order within a metric. Default value: None
|
Remarks
Logging a metric to a run causes that metric to be stored in the run record in the experiment. You can log the same metric multiple times within a run, the result being considered a vector of that metric. If step is specified for a metric it must be specified for all values.
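Example (a sketch; the loss values are illustrative):
# Logging the same metric at successive steps produces an ordered series.
for epoch, loss in enumerate([0.9, 0.6, 0.4]):
    run.log("loss", loss, step=epoch)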
log_accuracy_table
Log an accuracy table to the artifact store.
The accuracy table metric is a multi-use, non-scalar metric that can be used to produce multiple types of line charts that vary continuously over the space of predicted probabilities. Examples of these charts are ROC, precision-recall, and lift curves.
The calculation of the accuracy table is similar to the calculation of an ROC curve. An ROC curve stores true positive rates and false positive rates at many different probability thresholds. The accuracy table stores the raw number of true positives, false positives, true negatives, and false negatives at many probability thresholds.
There are two methods used for selecting thresholds: "probability" and "percentile." They differ in how they sample from the space of predicted probabilities.
Probability thresholds are uniformly spaced thresholds between 0 and 1. If NUM_POINTS is 5 the probability thresholds would be [0.0, 0.25, 0.5, 0.75, 1.0].
Percentile thresholds are spaced according to the distribution of predicted probabilities. Each threshold corresponds to the percentile of the data at a probability threshold. For example, if NUM_POINTS is 5, then the first threshold would be at the 0th percentile, the second at the 25th percentile, the third at the 50th, and so on.
The probability tables and percentile tables are both 3D lists where the first dimension represents the class label, the second dimension represents the sample at one threshold (scales with NUM_POINTS), and the third dimension always has 4 values: TP, FP, TN, FN, and always in that order.
The confusion values (TP, FP, TN, FN) are computed with the one vs. rest strategy. See the following link for more details: https://en.wikipedia.org/wiki/Multiclass_classification
N = # of samples in validation dataset (200 in example) M = # thresholds = # samples taken from the probability space (5 in example) C = # classes in full dataset (3 in example)
Some invariants of the accuracy table:
- TP + FP + TN + FN = N for all thresholds for all classes
- TP + FN is the same at all thresholds for any class
- TN + FP is the same at all thresholds for any class
- Probability tables and percentile tables have shape [C, M, 4]
Note: M can be any value and controls the resolution of the charts. This is independent of the dataset, is defined when calculating metrics, and trades off storage space, computation time, and resolution.
Class labels should be strings, confusion values should be integers, and thresholds should be floats.
log_accuracy_table(name, value, description='')
Parameters
Name | Description |
---|---|
name
Required
|
The name of the accuracy table. |
value
Required
|
JSON containing name, version, and data properties. |
description
Required
|
An optional metric description. |
Remarks
Example of a valid JSON value:
{
"schema_type": "accuracy_table",
"schema_version": "1.0.1",
"data": {
"probability_tables": [
[
[82, 118, 0, 0],
[75, 31, 87, 7],
[66, 9, 109, 16],
[46, 2, 116, 36],
[0, 0, 118, 82]
],
[
[60, 140, 0, 0],
[56, 20, 120, 4],
[47, 4, 136, 13],
[28, 0, 140, 32],
[0, 0, 140, 60]
],
[
[58, 142, 0, 0],
[53, 29, 113, 5],
[40, 10, 132, 18],
[24, 1, 141, 34],
[0, 0, 142, 58]
]
],
"percentile_tables": [
[
[82, 118, 0, 0],
[82, 67, 51, 0],
[75, 26, 92, 7],
[48, 3, 115, 34],
[3, 0, 118, 79]
],
[
[60, 140, 0, 0],
[60, 89, 51, 0],
[60, 41, 99, 0],
[46, 5, 135, 14],
[3, 0, 140, 57]
],
[
[58, 142, 0, 0],
[56, 93, 49, 2],
[54, 47, 95, 4],
[41, 10, 132, 17],
[3, 0, 142, 55]
]
],
"probability_thresholds": [0.0, 0.25, 0.5, 0.75, 1.0],
"percentile_thresholds": [0.0, 0.01, 0.24, 0.98, 1.0],
"class_labels": ["0", "1", "2"]
}
}
log_confusion_matrix
Log a confusion matrix to the artifact store.
This logs a wrapper around the sklearn confusion matrix. The metric data contains the class labels and a 2D list for the matrix itself. See the following link for more details on how the metric is computed: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html
log_confusion_matrix(name, value, description='')
Parameters
Name | Description |
---|---|
name
Required
|
The name of the confusion matrix. |
value
Required
|
JSON containing name, version, and data properties. |
description
Required
|
An optional metric description. |
Remarks
Example of a valid JSON value:
{
"schema_type": "confusion_matrix",
"schema_version": "1.0.0",
"data": {
"class_labels": ["0", "1", "2", "3"],
"matrix": [
[3, 0, 1, 0],
[0, 1, 0, 1],
[0, 0, 1, 0],
[0, 0, 0, 1]
]
}
}
log_image
Log an image metric to the run record.
log_image(name, path=None, plot=None, description='')
Parameters
Name | Description |
---|---|
name
Required
|
The name of the metric. |
path
Required
|
The path or stream of the image. |
plot
Required
|
matplotlib.pyplot
The plot to log as an image. |
description
Required
|
An optional metric description. |
Remarks
Use this method to log an image file or a matplotlib plot to the run. These images will be visible and comparable in the run record.
log_list
Log a list of metric values to the run with the given name.
log_list(name, value, description='')
Parameters
Name | Description |
---|---|
name
Required
|
The name of metric. |
value
Required
|
The values of the metric. |
description
Required
|
An optional metric description. |
log_predictions
Log predictions to the artifact store.
This logs a metric score that can be used to compare the distributions of true target values to the distribution of predicted values for a regression task.
The predictions are binned and standard deviations are calculated for error bars on a line chart.
log_predictions(name, value, description='')
Parameters
Name | Description |
---|---|
name
Required
|
The name of the predictions. |
value
Required
|
JSON containing name, version, and data properties. |
description
Required
|
An optional metric description. |
Remarks
Example of a valid JSON value:
{
"schema_type": "predictions",
"schema_version": "1.0.0",
"data": {
"bin_averages": [0.25, 0.75],
"bin_errors": [0.013, 0.042],
"bin_counts": [56, 34],
"bin_edges": [0.0, 0.5, 1.0]
}
}
log_residuals
Log residuals to the artifact store.
This logs the data needed to display a histogram of residuals for a regression task. The residuals are predicted - actual.
There should be one more edge than the number of counts. Please see the numpy histogram documentation for examples of using counts and edges to represent a histogram. https://docs.scipy.org/doc/numpy/reference/generated/numpy.histogram.html
log_residuals(name, value, description='')
Parameters
Name | Description |
---|---|
name
Required
|
The name of the residuals. |
value
Required
|
JSON containing name, version, and data properties. |
description
Required
|
An optional metric description. |
Remarks
Example of a valid JSON value:
{
"schema_type": "residuals",
"schema_version": "1.0.0",
"data": {
"bin_edges": [50, 100, 200, 300, 350],
"bin_counts": [0.88, 20, 30, 50.99]
}
}
log_row
Log a row metric to the run with the given name.
log_row(name, description=None, **kwargs)
Parameters
Name | Description |
---|---|
name
Required
|
The name of metric. |
description
|
An optional metric description. Default value: None
|
kwargs
Required
|
A dictionary of additional parameters. In this case, the columns of the metric. |
Remarks
Using log_row creates a table metric with columns as described in kwargs. Each named parameter generates a column with the value specified. log_row can be called once to log an arbitrary tuple, or multiple times in a loop to generate a complete table.
citrus = ['orange', 'lemon', 'lime']
sizes = [ 10, 7, 3]
for index in range(len(citrus)):
run.log_row("citrus", fruit = citrus[index], size=sizes[index])
log_table
Log a table metric to the run with the given name.
log_table(name, value, description='')
Parameters
Name | Description |
---|---|
name
Required
|
The name of metric. |
value
Required
|
The table value of the metric, a dictionary where keys are columns to be posted to the service. |
description
Required
|
An optional metric description. |
register_model
Register a model for operationalization.
register_model(model_name, model_path=None, tags=None, properties=None, model_framework=None, model_framework_version=None, description=None, datasets=None, sample_input_dataset=None, sample_output_dataset=None, resource_configuration=None, **kwargs)
Parameters
Name | Description |
---|---|
model_name
Required
|
The name of the model. |
model_path
|
The relative cloud path to the model, for example, "outputs/modelname". Default value: None
|
tags
|
A dictionary of key value tags to assign to the model. Default value: None
|
properties
|
A dictionary of key value properties to assign to the model. These properties cannot be changed after model creation, however new key value pairs can be added. Default value: None
|
model_framework
|
The framework of the model to register. Currently supported frameworks: TensorFlow, ScikitLearn, Onnx, Custom, Multi. Default value: None
|
model_framework_version
|
The framework version of the registered model. Default value: None
|
description
|
An optional description of the model. Default value: None
|
datasets
|
A list of tuples where the first element describes the dataset-model relationship and the second element is the dataset. Default value: None
|
sample_input_dataset
|
Optional. Sample input dataset for the registered model Default value: None
|
sample_output_dataset
|
Optional. Sample output dataset for the registered model Default value: None
|
resource_configuration
|
Optional. Resource configuration to run the registered model Default value: None
|
kwargs
Required
|
Optional parameters. |
Returns
Type | Description |
---|---|
The registered model. |
Remarks
model = best_run.register_model(model_name = 'best_model', model_path = 'outputs/model.pkl')
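A slightly fuller sketch is shown below; the framework name comes from the supported list above, and the version, tags, and model path are placeholders. The model file must already be among the run's uploaded files (for example, written to ./outputs during training).
model = run.register_model(
    model_name='best_model',
    model_path='outputs/model.pkl',
    model_framework='ScikitLearn',        # one of the supported framework names above
    model_framework_version='0.24.2',     # placeholder version
    tags={'stage': 'candidate'})          # placeholder tag
print(model.name, model.version)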
remove_tags
Delete the list of mutable tags on this run.
remove_tags(tags)
Parameters
Name | Description |
---|---|
tags
Required
|
A list of tags to remove. |
Returns
Type | Description |
---|---|
The tags stored on the run object |
restore_snapshot
Restore a snapshot as a ZIP file. Returns the path to the ZIP.
restore_snapshot(snapshot_id=None, path=None)
Parameters
Name | Description |
---|---|
snapshot_id
|
The snapshot ID to restore. The latest is used if not specified. Default value: None
|
path
|
The path where the downloaded ZIP is saved. Default value: None
|
Returns
Type | Description |
---|---|
The path. |
set_tags
Add or modify a set of tags on the run. Tags not passed in the dictionary are left untouched.
You can also add simple string tags. When these tags appear in the tag dictionary as keys, they have a value of None. For more information, see Tag and find runs.
set_tags(tags)
Parameters
Name | Description |
---|---|
tags
Required
|
The tags stored in the run object. |
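Example (a sketch; the tag keys and values are illustrative):
# Add or update several tags at once; tags not listed here are left untouched.
run.set_tags({'quality': 'good', 'owner': 'data-science-team'})

# Later, remove tags that are no longer wanted.
run.remove_tags(['owner'])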
start
Mark the run as started.
This is typically used in advanced scenarios when the run has been created by another actor.
start()
submit_child
Submit an experiment and return the active child run.
submit_child(config, tags=None, **kwargs)
Parameters
Name | Description |
---|---|
config
Required
|
The config to be submitted. |
tags
|
Tags to be added to the submitted run, e.g., {"tag": "value"}. Default value: None
|
kwargs
Required
|
Additional parameters used in submit function for configurations. |
Returns
Type | Description |
---|---|
A run object. |
Remarks
Submit is an asynchronous call to the Azure Machine Learning platform to execute a trial on local or remote hardware. Depending on the configuration, submit will automatically prepare your execution environments, execute your code, and capture your source code and results into the experiment's run history.
To submit an experiment you first need to create a configuration object describing how the experiment is to be run. The configuration depends on the type of trial required.
An example of how to submit a child experiment from your local machine using ScriptRunConfig is as follows:
from azureml.core import ScriptRunConfig
# run a trial from the train.py code in your current directory
config = ScriptRunConfig(source_directory='.', script='train.py',
run_config=RunConfiguration())
run = parent_run.submit_child(config)
# get the url to view the progress of the experiment and then wait
# until the trial is complete
print(run.get_portal_url())
run.wait_for_completion()
For details on how to configure a run, see submit.
tag
Tag the run with a string key and optional string value.
tag(key, value=None)
Parameters
Name | Description |
---|---|
key
Required
|
The tag key |
value
|
An optional value for the tag. Default value: None
|
Remarks
Tags and properties on a run are both dictionaries of string -> string. The difference between them is mutability: Tags can be set, updated, and deleted while Properties can only be added. This makes Properties more appropriate for system/workflow related behavior triggers, while Tags are generally user-facing and meaningful for the consumers of the experiment.
run = experiment.start_logging()
run.tag('DeploymentCandidate')
run.tag('modifiedBy', 'Master CI')
run.tag('modifiedBy', 'release pipeline') # Careful, tags are mutable
run.add_properties({'BuildId': os.environ.get('VSTS_BUILD_ID')}) # Properties are not
tags = run.get_tags()
# tags = { 'DeploymentCandidate': None, 'modifiedBy': 'release pipeline' }
take_snapshot
Save a snapshot of the input file or folder.
take_snapshot(file_or_folder_path)
Parameters
Name | Description |
---|---|
file_or_folder_path
Required
|
The file or folder containing the run source code. |
Returns
Type | Description |
---|---|
Returns the snapshot ID. |
Remarks
Snapshots are intended to be the source code used to execute the experiment run. These are stored with the run so that the run trial can be replicated in the future.
Note
Snapshots are automatically taken when submit is called. Typically, the take_snapshot method is only required for interactive (notebook) runs.
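Example (a sketch; './src' is a placeholder source folder):
run = experiment.start_logging()
snapshot_id = run.take_snapshot('./src')   # snapshot the folder so the trial can be reproduced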
upload_file
Upload a file to the run record.
upload_file(name, path_or_stream, datastore_name=None)
Parameters
Name | Description |
---|---|
name
Required
|
The name of the file to upload. |
path_or_stream
Required
|
The relative local path or stream to the file to upload. |
datastore_name
Required
|
Optional DataStore name |
Returns
Type | Description |
---|---|
Remarks
run = experiment.start_logging()
run.upload_file(name='important_file', path_or_stream="path/on/disk/file.txt")
Note
Runs automatically capture files in the specified output directory, which defaults to "./outputs" for most run types. Use upload_file only when additional files need to be uploaded or an output directory is not specified.
upload_files
Upload files to the run record.
upload_files(names, paths, return_artifacts=False, timeout_seconds=None, datastore_name=None)
Parameters
Name | Description |
---|---|
names
Required
|
The names of the files to upload. If set, paths must also be set. |
paths
Required
|
The relative local paths to the files to upload. If set, names is required. |
return_artifacts
Required
|
Indicates that an artifact object should be returned for each file uploaded. |
timeout_seconds
Required
|
The timeout for uploading files. |
datastore_name
Required
|
Optional DataStore name |
Remarks
upload_files has the same effect as upload_file on separate files; however, there are performance and resource utilization benefits when using upload_files.
import os
run = experiment.start_logging()
file_name_1 = 'important_file_1'
file_name_2 = 'important_file_2'
run.upload_files(names=[file_name_1, file_name_2],
paths=['path/on/disk/file_1.txt', 'other/path/on/disk/file_2.txt'])
run.download_file(file_name_1, 'file_1.txt')
os.mkdir("path") # The path must exist
run.download_file(file_name_2, 'path/file_2.txt')
Note
Runs automatically capture files in the specified output directory, which defaults to "./outputs" for most run types. Use upload_files only when additional files need to be uploaded or an output directory is not specified.
upload_folder
Upload the specified folder to the given prefix name.
upload_folder(name, path, datastore_name=None)
Parameters
Name | Description |
---|---|
name
Required
|
The name of the folder of files to upload. |
path
Required
|
The relative local path to the folder to upload. |
datastore_name
Required
|
Optional DataStore name |
Remarks
run = experiment.start_logging()
run.upload_folder(name='important_files', path='path/on/disk')
run.download_file('important_files/existing_file.txt', 'local_file.txt')
Note
Runs automatically capture files in the specified output directory, which defaults to "./outputs" for most run types. Use upload_folder only when additional files need to be uploaded or an output directory is not specified.
wait_for_completion
Wait for the completion of this run. Returns the status object after the wait.
wait_for_completion(show_output=False, wait_post_processing=False, raise_on_error=True)
Parameters
Name | Description |
---|---|
show_output
|
Indicates whether to show the run output on sys.stdout. Default value: False
|
wait_post_processing
|
Indicates whether to wait for the post processing to complete after the run completes. Default value: False
|
raise_on_error
|
Indicates whether an Error is raised when the Run is in a failed state. Default value: True
|
Returns
Type | Description |
---|---|
The status object. |
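Example (a sketch):
# Block until the run reaches a terminal state, streaming output to stdout.
final_status = run.wait_for_completion(show_output=True, raise_on_error=True)
print(final_status)   # the returned status object for the finished run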
Attributes
description
Return the run description.
The optional description of the run is a user-specified string useful for describing a run.
Returns
Type | Description |
---|---|
The run description. |
display_name
Return the run display name.
The optional display name of the run is a user-specified string useful for later identification of the run.
Returns
Type | Description |
---|---|
The run display name. |
experiment
Get experiment containing the run.
Returns
Type | Description |
---|---|
Retrieves the experiment corresponding to the run. |
id
Get run ID.
The ID of the run is an identifier unique across the containing experiment.
Returns
Type | Description |
---|---|
The run ID. |
name
DEPRECATED. Use display_name.
The optional name of the run is a user-specified string useful for later identification of the run.
Returns
Type | Description |
---|---|
The run name. |
number
Get run number.
A monotonically increasing number representing the order of runs within an experiment.
Returns
Type | Description |
---|---|
The run number. |
parent
Fetch the parent run for this run from the service.
Runs can have an optional parent, resulting in a potential tree hierarchy of runs. To log metrics to a parent run, use the log method of the parent object, for example, run.parent.log().
Returns
Type | Description |
---|---|
The parent run, or None if one is not set. |
properties
Return the immutable properties of this run.
Returns
Type | Description |
---|---|
The locally cached properties of the run. |
Remarks
Properties include immutable system-generated information such as duration, date of execution, user, etc.
status
Return the run object's status.
tags
Return the set of mutable tags on this run.
Returns
Type | Description |
---|---|
The tags stored on the run object. |
type
Get run type.
Indicates how the run was created or configured.
Returns
Type | Description |
---|---|
The run type. |