CLI (v2) feature set YAML schema
APPLIES TO: Azure CLI ml extension v2 (current)
Note
The YAML syntax detailed in this document is based on the JSON schema for the latest version of the ML CLI v2 extension. This syntax is guaranteed only to work with the latest version of the ML CLI v2 extension. You can find the schemas for older extension versions at https://azuremlschemasprod.azureedge.net/.
YAML syntax
Key | Type | Description | Allowed values | Default value |
---|---|---|---|---|
$schema | string | The YAML schema. If you use the Azure Machine Learning VS Code extension to author the YAML file, including $schema at the top of your file enables you to invoke schema and resource completions. | ||
name | string | Required. Feature set name. | ||
version | string | Required. Feature set version. | ||
description | string | Feature set description. | ||
specification | object | Required. Feature set specification. | ||
specification.path | string | Required. Path to the local feature set spec folder. | ||
entities | object (list of string) | Required. The entities that this feature set is associated with. | ||
stage | string | Feature set stage. | Development, Production, Archived | Development |
tags | object | Dictionary of tags for the feature set. | ||
materialization_settings | object | Feature set materialization setting. | ||
materialization_settings.offline_enabled | boolean | Whether materializing feature values to offline storage is enabled. | True, False | |
materialization_settings.schedule | object | The materialization schedule. See CLI (v2) schedule YAML schema. | ||
materialization_settings.schedule.frequency | string | Required if schedule is configured. Enum to describe the frequency of a recurrence schedule. | Day, Hour, Minute, Week, Month | Day |
materialization_settings.schedule.interval | integer | Required if schedule is configured. The interval between recurrent jobs. | ||
materialization_settings.schedule.time_zone | string | The schedule trigger time zone. | UTC | |
materialization_settings.schedule.start_time | string | The schedule trigger time. | ||
materialization_settings.notification | object | The materialization notification setting. | ||
materialization_settings.notification.email_on | object (list of string) | Required if notification is configured. The email notification is sent when the job status matches this setting. | JobFailed, JobCompleted, JobCancelled | |
materialization_settings.notification.emails | object (list of string) | Required if notification is configured. The email addresses the notification is sent to. | ||
materialization_settings.resource | object | The Azure Machine Learning Spark compute resource used for the materialization job. | ||
materialization_settings.resource.instance_type | string | Azure Machine Learning Spark compute instance type. | Standard_E4s_v3, Standard_E8s_v3, Standard_E16s_v3, Standard_E32s_v3, Standard_E64s_v3. Refer to Interactive Data Wrangling with Apache Spark in Azure Machine Learning (preview) for the updated list of supported types. | |
materialization_settings.spark_configuration | dictionary | Dictionary of Spark configuration. | ||
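
The required keys and the allowed stage values listed in the table can be checked before a file is submitted. The following is a minimal sketch in Python, assuming the YAML file has already been loaded into a dictionary. The helper name validate_feature_set is hypothetical and is not part of the Azure CLI or any Azure SDK; the rules are transcribed from the table and may not match everything the service enforces.

```python
from __future__ import annotations

# Illustrative only: rules transcribed from the table in this document.
# validate_feature_set is a hypothetical helper, not an Azure SDK function.
REQUIRED_KEYS = ("name", "version", "specification", "entities")
ALLOWED_STAGES = ("Development", "Production", "Archived")


def validate_feature_set(doc: dict) -> list[str]:
    """Return a list of problems; an empty list means none were found."""
    problems = []
    for key in REQUIRED_KEYS:
        if key not in doc:
            problems.append(f"missing required key: {key}")

    # specification.path is required whenever specification is present.
    spec = doc.get("specification")
    if isinstance(spec, dict) and "path" not in spec:
        problems.append("missing required key: specification.path")

    # entities is documented as a list of strings.
    entities = doc.get("entities")
    if entities is not None:
        if not isinstance(entities, list) or not all(isinstance(e, str) for e in entities):
            problems.append("entities must be a list of strings")

    # stage defaults to Development when omitted.
    stage = doc.get("stage", "Development")
    if stage not in ALLOWED_STAGES:
        problems.append(f"stage {stage!r} is not one of {', '.join(ALLOWED_STAGES)}")
    return problems


basic = {
    "name": "transactions",
    "version": "1",
    "specification": {"path": "./spec"},
    "entities": ["azureml:account:1"],
    "stage": "Development",
}
print(validate_feature_set(basic))  # []
print(validate_feature_set({"name": "transactions", "stage": "Draft"}))
```

The second call reports the missing version, specification, and entities keys along with the unrecognized stage value.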
Remarks
The az ml feature-set command can be used to manage feature sets.
Examples
Examples are available in the examples GitHub repository. Several are shown below.
YAML: basic
$schema: http://azureml/sdk-2-0/Featureset.json
name: transactions
version: "1"
description: 7-day and 3-day rolling aggregation of transactions featureset
specification:
  path: ./spec # path to feature set specification folder. Can be local (absolute path or relative path to current location) or cloud uri. Contains FeatureSetSpec.yaml + transformation code
entities: # entities associated with this feature-set
  - azureml:account:1
stage: Development
YAML: with materialization configuration
name: transactions
version: "1"
description: 7-day and 3-day rolling aggregation of transactions featureset
specification:
  path: ./spec # path to feature set specification folder. Can be local (absolute path or relative path to current location) or cloud uri. Contains FeatureSetSpec.yaml + transformation code
entities: # entities associated with this feature-set
  - azureml:account:1
stage: Development
materialization_settings:
  offline_enabled: True
  schedule: # we use existing definition of schedule under job with some constraints. Recurrence pattern will not be supported.
    type: recurrence # Only recurrence type would be supported
    frequency: Day # Only support Day and Hour
    interval: 1 #every day
    time_zone: "Pacific Standard Time"
  notification:
    email_on:
      - JobFailed
    emails:
      - alice@microsoft.com
  resource:
    instance_type: Standard_E8S_V3
  spark_configuration:
    spark.driver.cores: 4
    spark.driver.memory: 36g
    spark.executor.cores: 4
    spark.executor.memory: 36g
    spark.executor.instances: 2
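
The conditional rules for materialization_settings (frequency and interval are required once a schedule is configured; email_on and emails are required once a notification is configured) can be sketched the same way. The Python below is a rough illustration, not the service's actual validation. The helper name check_materialization_settings is hypothetical, and the allowed frequency values are taken from the table, even though a comment in the example above says only Day and Hour are supported.

```python
from __future__ import annotations

# Illustrative only: rules transcribed from the materialization_settings rows
# of the table in this document. check_materialization_settings is a
# hypothetical helper, not part of the Azure CLI or any Azure SDK.
ALLOWED_FREQUENCIES = {"Day", "Hour", "Minute", "Week", "Month"}
ALLOWED_EMAIL_ON = {"JobFailed", "JobCompleted", "JobCancelled"}


def check_materialization_settings(settings: dict) -> list[str]:
    """Return a list of problems; an empty list means none were found."""
    problems = []

    schedule = settings.get("schedule")
    if schedule is not None:
        # frequency and interval become required once a schedule exists.
        for key in ("frequency", "interval"):
            if key not in schedule:
                problems.append(f"schedule.{key} is required when schedule is configured")
        frequency = schedule.get("frequency")
        if frequency is not None and frequency not in ALLOWED_FREQUENCIES:
            problems.append(f"schedule.frequency {frequency!r} is not an allowed value")
        interval = schedule.get("interval")
        # bool is a subclass of int in Python, so reject it explicitly.
        if interval is not None and (isinstance(interval, bool) or not isinstance(interval, int)):
            problems.append("schedule.interval must be an integer")

    notification = settings.get("notification")
    if notification is not None:
        # email_on and emails become required once a notification exists.
        for key in ("email_on", "emails"):
            if key not in notification:
                problems.append(f"notification.{key} is required when notification is configured")
        for status in notification.get("email_on", []):
            if status not in ALLOWED_EMAIL_ON:
                problems.append(f"notification.email_on value {status!r} is not an allowed value")
    return problems


# Mirrors the materialization_settings block of the example above.
settings = {
    "offline_enabled": True,
    "schedule": {
        "type": "recurrence",
        "frequency": "Day",
        "interval": 1,
        "time_zone": "Pacific Standard Time",
    },
    "notification": {"email_on": ["JobFailed"], "emails": ["alice@microsoft.com"]},
    "resource": {"instance_type": "Standard_E8S_V3"},
}
print(check_materialization_settings(settings))  # []
```

Returning a list of messages rather than raising on the first problem lets every issue in a file be reported in one pass.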