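This article describes how to create and manage an Azure Machine Learning workspace using Terraform configuration files. Terraform's template-based configuration files let you define, create, and configure Azure resources in a repeatable and predictable manner. Terraform tracks resource state and can clean up and destroy resources.
A Terraform configuration is a document that defines the resources needed for a deployment. It can also specify deployment variables, which supply input values when the configuration is used.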
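Prerequisites
Limitations
When you create a new workspace, you can either have the services the workspace needs created automatically or use existing services. If you want to use existing services from a different Azure subscription than the workspace, you must register the Azure Machine Learning namespace in the subscription that contains those services. For example, if you create a workspace in subscription A that uses a storage account in subscription B, the Azure Machine Learning namespace must be registered in subscription B before the workspace can use that storage account.
The resource provider for Azure Machine Learning is Microsoft.MachineLearningServices. For information on how to check whether it is registered and how to register it, see the Azure resource providers and types article.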
Important
This applies only to resources provided during workspace creation: Azure Storage accounts, Azure Container Registry, Azure Key Vault, and Application Insights.
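The cross-subscription registration described above could also be expressed in Terraform rather than done by hand. The following is a minimal sketch, not part of the original templates: the provider alias subscription_b and the variable subscription_b_id are hypothetical names, and skip_provider_registration assumes a 3.x release of the azurerm provider (later major versions may use a different setting). It registers Microsoft.MachineLearningServices in the subscription that owns the existing storage account.

```hcl
# Hypothetical variable: ID of subscription B, which owns the existing storage account.
variable "subscription_b_id" {
  type        = string
  description = "Subscription that contains the existing dependent services"
}

# Aliased provider that targets subscription B.
# Automatic registration is switched off here (azurerm 3.x setting, an assumption)
# so that the explicit registration below does not conflict with it.
provider "azurerm" {
  alias                      = "subscription_b"
  subscription_id            = var.subscription_b_id
  skip_provider_registration = true
  features {}
}

# Register the Azure Machine Learning namespace in subscription B.
resource "azurerm_resource_provider_registration" "ml_subscription_b" {
  provider = azurerm.subscription_b
  name     = "Microsoft.MachineLearningServices"
}
```

If the namespace is already registered in that subscription, Terraform is likely to report a conflict; in that case the resource can be imported into state or simply left out.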
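Tip
An Azure Application Insights instance is created when you create the workspace. If you want, you can delete the Application Insights instance after cluster creation. Deleting it limits the information gathered from the workspace and may make problems harder to troubleshoot. If you delete the Application Insights instance created by the workspace, you can't re-create it without deleting and re-creating the workspace.
For more information on using this Application Insights instance, see Monitor and collect data from Machine Learning web service endpoints.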
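Declare the Azure provider
Create the Terraform configuration file that declares the Azure provider:
Create a new file named main.tf. If you're working in Azure Cloud Shell, use bash:
code main.tf
Paste the following code into the editor:
main.tf: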
data "azurerm_client_config" "current" {}

resource "azurerm_resource_group" "default" {
  name     = "${random_pet.prefix.id}-rg"
  location = var.location
}

resource "random_pet" "prefix" {
  prefix = var.prefix
  length = 2
}

resource "random_integer" "suffix" {
  min = 10000000
  max = 99999999
}
Save the file (<Ctrl>S) and exit the editor (<Ctrl>Q).
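The main.tf above relies on the azurerm and random providers but does not itself declare them. A provider declaration might look like the following sketch. The file name providers.tf and the version constraints are assumptions chosen for illustration, not values from this article. The 3.x constraint is assumed because the templates in this article use arguments such as allow_nested_items_to_be_public and enforce_private_link_endpoint_network_policies, and it should be checked against the provider version you actually use.

```hcl
# providers.tf (hypothetical file name)
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.0" # assumed constraint
    }
    random = {
      source  = "hashicorp/random"
      version = "~> 3.0" # assumed constraint
    }
  }
}

# The azurerm provider requires a features block, even if it is empty.
provider "azurerm" {
  features {}
}
```

With the providers declared, running terraform init in the same directory downloads them.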
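Deploy a workspace
The following Terraform configurations can be used to create an Azure Machine Learning workspace. An Azure Machine Learning workspace requires several other services as dependencies, and the template also specifies these associated resources for the workspace. Depending on your needs, you can choose the template that creates resources with either public or private network connectivity.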
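Some resources in Azure require globally unique names. Before deploying your resources with the following template, set the prefix variable to a unique value.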
variables.tf:
variable "environment" {
  type        = string
  description = "Name of the environment"
  default     = "dev"
}

variable "location" {
  type        = string
  description = "Location of the resources"
  default     = "eastus"
}

variable "prefix" {
  type        = string
  description = "Prefix of the resource name"
  default     = "ml"
}
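Rather than editing the defaults above, the values can be overridden in a variable definitions file. The sketch below assumes a file named terraform.tfvars, which Terraform loads automatically; the values are placeholders. Because prefix and environment are concatenated into the Key Vault, storage account, and container registry names in workspace.tf, keeping them short, lowercase, and alphanumeric helps those names stay within Azure naming rules (storage account names, for example, allow only 3 to 24 lowercase letters and digits).

```hcl
# terraform.tfvars (placeholder values, adjust to your deployment)
environment = "test"
location    = "westeurope"
prefix      = "contoso"
```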
workspace.tf:
# Dependent resources for Azure Machine Learning
resource "azurerm_application_insights" "default" {
  name                = "${random_pet.prefix.id}-appi"
  location            = azurerm_resource_group.default.location
  resource_group_name = azurerm_resource_group.default.name
  application_type    = "web"
}

resource "azurerm_key_vault" "default" {
  name                     = "${var.prefix}${var.environment}${random_integer.suffix.result}kv"
  location                 = azurerm_resource_group.default.location
  resource_group_name      = azurerm_resource_group.default.name
  tenant_id                = data.azurerm_client_config.current.tenant_id
  sku_name                 = "premium"
  purge_protection_enabled = false
}

resource "azurerm_storage_account" "default" {
  name                            = "${var.prefix}${var.environment}${random_integer.suffix.result}st"
  location                        = azurerm_resource_group.default.location
  resource_group_name             = azurerm_resource_group.default.name
  account_tier                    = "Standard"
  account_replication_type        = "GRS"
  allow_nested_items_to_be_public = false
}

resource "azurerm_container_registry" "default" {
  name                = "${var.prefix}${var.environment}${random_integer.suffix.result}cr"
  location            = azurerm_resource_group.default.location
  resource_group_name = azurerm_resource_group.default.name
  sku                 = "Premium"
  admin_enabled       = true
}

# Machine Learning workspace
resource "azurerm_machine_learning_workspace" "default" {
  name                          = "${random_pet.prefix.id}-mlw"
  location                      = azurerm_resource_group.default.location
  resource_group_name           = azurerm_resource_group.default.name
  application_insights_id       = azurerm_application_insights.default.id
  key_vault_id                  = azurerm_key_vault.default.id
  storage_account_id            = azurerm_storage_account.default.id
  container_registry_id         = azurerm_container_registry.default.id
  public_network_access_enabled = true

  identity {
    type = "SystemAssigned"
  }
}
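After deployment it is often handy to surface the generated names, since random_pet makes them unpredictable. A possible outputs.tf is sketched below; the file name and output names are illustrative assumptions, while the referenced attributes come from the resources defined above.

```hcl
# outputs.tf (hypothetical file and output names)
output "resource_group_name" {
  description = "Name of the resource group that holds the workspace"
  value       = azurerm_resource_group.default.name
}

output "workspace_name" {
  description = "Name of the Azure Machine Learning workspace"
  value       = azurerm_machine_learning_workspace.default.name
}

output "workspace_id" {
  description = "Azure resource ID of the workspace"
  value       = azurerm_machine_learning_workspace.default.id
}
```

Once terraform apply has finished, terraform output workspace_name prints the generated workspace name.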
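The following configuration creates a workspace in an isolated network environment using Azure Private Link endpoints. Private DNS zones are included so that domain names can be resolved within the virtual network.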
Some resources in Azure require globally unique names. Before deploying your resources with the following template, set the name variable to a unique value.
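When private link endpoints are used for both Azure Container Registry and Azure Machine Learning, Azure Container Registry tasks can't be used to build environment images. Instead, you can build images on an Azure Machine Learning compute cluster. To configure which cluster is used, set the image_build_compute_name argument. You can use the public_network_access_enabled argument to allow public access to a workspace that has a private link endpoint.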
variables.tf:
variable "name" {
  type        = string
  description = "Name of the deployment"
  default     = "examplehost"
}

variable "environment" {
  type        = string
  description = "Name of the environment"
  default     = "dev"
}

variable "location" {
  type        = string
  description = "Location of the resources"
  default     = "East US"
}

variable "vnet_address_space" {
  type        = list(string)
  description = "Address space of the virtual network"
  default     = ["10.0.0.0/16"]
}

variable "training_subnet_address_space" {
  type        = list(string)
  description = "Address space of the training subnet"
  default     = ["10.0.1.0/24"]
}

variable "aks_subnet_address_space" {
  type        = list(string)
  description = "Address space of the aks subnet"
  default     = ["10.0.2.0/23"]
}

variable "ml_subnet_address_space" {
  type        = list(string)
  description = "Address space of the ML workspace subnet"
  default     = ["10.0.0.0/24"]
}

variable "dsvm_subnet_address_space" {
  type        = list(string)
  description = "Address space of the DSVM subnet"
  default     = ["10.0.4.0/24"]
}

variable "bastion_subnet_address_space" {
  type        = list(string)
  description = "Address space of the bastion subnet"
  default     = ["10.0.5.0/24"]
}

variable "image_build_compute_name" {
  type        = string
  description = "Name of the compute cluster to be created and set to build docker images"
  default     = "image-builder"
}

# DSVM Variables
variable "dsvm_name" {
  type        = string
  description = "Name of the Data Science VM"
  default     = "vmdsvm01"
}

variable "dsvm_admin_username" {
  type        = string
  description = "Admin username of the Data Science VM"
  default     = "azureadmin"
}

variable "dsvm_host_password" {
  type        = string
  description = "Password for the admin username of the Data Science VM"
  default     = "ChangeMe123!"
  sensitive   = true
}
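The subnet defaults above are hard-coded slices of the 10.0.0.0/16 virtual network range. As an alternative sketch, the same ranges could be derived from vnet_address_space with Terraform's built-in cidrsubnet function, so that changing the virtual network range moves every subnet with it. The local names vnet_cidr and subnet_prefixes and the map keys are hypothetical and are not used by the templates in this article.

```hcl
locals {
  vnet_cidr = var.vnet_address_space[0] # "10.0.0.0/16" with the default above

  # cidrsubnet(prefix, newbits, netnum): extend the prefix length by newbits
  # and select the netnum-th network of that size.
  subnet_prefixes = {
    workspace = [cidrsubnet(local.vnet_cidr, 8, 0)] # 10.0.0.0/24
    training  = [cidrsubnet(local.vnet_cidr, 8, 1)] # 10.0.1.0/24
    aks       = [cidrsubnet(local.vnet_cidr, 7, 1)] # 10.0.2.0/23
    dsvm      = [cidrsubnet(local.vnet_cidr, 8, 4)] # 10.0.4.0/24
    bastion   = [cidrsubnet(local.vnet_cidr, 8, 5)] # 10.0.5.0/24
  }
}
```

A subnet could then set address_prefixes = local.subnet_prefixes.training in place of var.training_subnet_address_space.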
workspace.tf:
# Dependent resources for Azure Machine Learning
resource "azurerm_application_insights" "default" {
  name                = "appi-${var.name}-${var.environment}"
  location            = azurerm_resource_group.default.location
  resource_group_name = azurerm_resource_group.default.name
  application_type    = "web"
}

resource "random_string" "kv_prefix" {
  length  = 4
  upper   = false
  special = false
  numeric = false
}

resource "azurerm_key_vault" "default" {
  name                     = "kv-${random_string.kv_prefix.result}-${var.environment}"
  location                 = azurerm_resource_group.default.location
  resource_group_name      = azurerm_resource_group.default.name
  tenant_id                = data.azurerm_client_config.current.tenant_id
  sku_name                 = "premium"
  purge_protection_enabled = true

  network_acls {
    default_action = "Deny"
    bypass         = "AzureServices"
  }
}

resource "random_string" "sa_prefix" {
  length  = 4
  upper   = false
  special = false
  numeric = false
}

resource "azurerm_storage_account" "default" {
  name                     = "st${random_string.sa_prefix.result}${var.environment}"
  location                 = azurerm_resource_group.default.location
  resource_group_name      = azurerm_resource_group.default.name
  account_tier             = "Standard"
  account_replication_type = "GRS"

  network_rules {
    default_action = "Deny"
    bypass         = ["AzureServices"]
  }
}

resource "azurerm_container_registry" "default" {
  name                = "cr${var.name}${var.environment}"
  location            = azurerm_resource_group.default.location
  resource_group_name = azurerm_resource_group.default.name
  sku                 = "Premium"
  admin_enabled       = true

  network_rule_set {
    default_action = "Deny"
  }
  public_network_access_enabled = false
}

# Machine Learning workspace
resource "azurerm_machine_learning_workspace" "default" {
  name                    = "mlw-${var.name}-${var.environment}"
  location                = azurerm_resource_group.default.location
  resource_group_name     = azurerm_resource_group.default.name
  application_insights_id = azurerm_application_insights.default.id
  key_vault_id            = azurerm_key_vault.default.id
  storage_account_id      = azurerm_storage_account.default.id
  container_registry_id   = azurerm_container_registry.default.id

  identity {
    type = "SystemAssigned"
  }

  # Args of use when using an Azure Private Link configuration
  public_network_access_enabled = false
  image_build_compute_name      = var.image_build_compute_name

  depends_on = [
    azurerm_private_endpoint.kv_ple,
    azurerm_private_endpoint.st_ple_blob,
    azurerm_private_endpoint.storage_ple_file,
    azurerm_private_endpoint.cr_ple,
    azurerm_subnet.snet-training
  ]
}
# Private endpoints
resource "azurerm_private_endpoint" "kv_ple" {
  name                = "ple-${var.name}-${var.environment}-kv"
  location            = azurerm_resource_group.default.location
  resource_group_name = azurerm_resource_group.default.name
  subnet_id           = azurerm_subnet.snet-workspace.id

  private_dns_zone_group {
    name                 = "private-dns-zone-group"
    private_dns_zone_ids = [azurerm_private_dns_zone.dnsvault.id]
  }

  private_service_connection {
    name                           = "psc-${var.name}-kv"
    private_connection_resource_id = azurerm_key_vault.default.id
    subresource_names              = ["vault"]
    is_manual_connection           = false
  }
}

resource "azurerm_private_endpoint" "st_ple_blob" {
  name                = "ple-${var.name}-${var.environment}-st-blob"
  location            = azurerm_resource_group.default.location
  resource_group_name = azurerm_resource_group.default.name
  subnet_id           = azurerm_subnet.snet-workspace.id

  private_dns_zone_group {
    name                 = "private-dns-zone-group"
    private_dns_zone_ids = [azurerm_private_dns_zone.dnsstorageblob.id]
  }

  private_service_connection {
    name                           = "psc-${var.name}-st"
    private_connection_resource_id = azurerm_storage_account.default.id
    subresource_names              = ["blob"]
    is_manual_connection           = false
  }
}

resource "azurerm_private_endpoint" "storage_ple_file" {
  name                = "ple-${var.name}-${var.environment}-st-file"
  location            = azurerm_resource_group.default.location
  resource_group_name = azurerm_resource_group.default.name
  subnet_id           = azurerm_subnet.snet-workspace.id

  private_dns_zone_group {
    name                 = "private-dns-zone-group"
    private_dns_zone_ids = [azurerm_private_dns_zone.dnsstoragefile.id]
  }

  private_service_connection {
    name                           = "psc-${var.name}-st"
    private_connection_resource_id = azurerm_storage_account.default.id
    subresource_names              = ["file"]
    is_manual_connection           = false
  }
}

resource "azurerm_private_endpoint" "cr_ple" {
  name                = "ple-${var.name}-${var.environment}-cr"
  location            = azurerm_resource_group.default.location
  resource_group_name = azurerm_resource_group.default.name
  subnet_id           = azurerm_subnet.snet-workspace.id

  private_dns_zone_group {
    name                 = "private-dns-zone-group"
    private_dns_zone_ids = [azurerm_private_dns_zone.dnscontainerregistry.id]
  }

  private_service_connection {
    name                           = "psc-${var.name}-cr"
    private_connection_resource_id = azurerm_container_registry.default.id
    subresource_names              = ["registry"]
    is_manual_connection           = false
  }
}

resource "azurerm_private_endpoint" "mlw_ple" {
  name                = "ple-${var.name}-${var.environment}-mlw"
  location            = azurerm_resource_group.default.location
  resource_group_name = azurerm_resource_group.default.name
  subnet_id           = azurerm_subnet.snet-workspace.id

  private_dns_zone_group {
    name                 = "private-dns-zone-group"
    private_dns_zone_ids = [azurerm_private_dns_zone.dnsazureml.id, azurerm_private_dns_zone.dnsnotebooks.id]
  }

  private_service_connection {
    name                           = "psc-${var.name}-mlw"
    private_connection_resource_id = azurerm_machine_learning_workspace.default.id
    subresource_names              = ["amlworkspace"]
    is_manual_connection           = false
  }
}
# Compute cluster for image building required since the workspace is behind a vnet.
# For more details, see https://docs.microsoft.com/en-us/azure/machine-learning/tutorial-create-secure-workspace#configure-image-builds.
resource "azurerm_machine_learning_compute_cluster" "image-builder" {
  name                          = var.image_build_compute_name
  location                      = azurerm_resource_group.default.location
  vm_priority                   = "LowPriority"
  vm_size                       = "Standard_DS2_v2"
  machine_learning_workspace_id = azurerm_machine_learning_workspace.default.id
  subnet_resource_id            = azurerm_subnet.snet-training.id

  scale_settings {
    min_node_count                       = 0
    max_node_count                       = 3
    scale_down_nodes_after_idle_duration = "PT15M" # 15 minutes
  }

  identity {
    type = "SystemAssigned"
  }
}
network.tf:
# Virtual Network
resource "azurerm_virtual_network" "default" {
  name                = "vnet-${var.name}-${var.environment}"
  address_space       = var.vnet_address_space
  location            = azurerm_resource_group.default.location
  resource_group_name = azurerm_resource_group.default.name
}

resource "azurerm_subnet" "snet-training" {
  name                                           = "snet-training"
  resource_group_name                            = azurerm_resource_group.default.name
  virtual_network_name                           = azurerm_virtual_network.default.name
  address_prefixes                               = var.training_subnet_address_space
  enforce_private_link_endpoint_network_policies = true
}

resource "azurerm_subnet" "snet-aks" {
  name                                           = "snet-aks"
  resource_group_name                            = azurerm_resource_group.default.name
  virtual_network_name                           = azurerm_virtual_network.default.name
  address_prefixes                               = var.aks_subnet_address_space
  enforce_private_link_endpoint_network_policies = true
}

resource "azurerm_subnet" "snet-workspace" {
  name                                           = "snet-workspace"
  resource_group_name                            = azurerm_resource_group.default.name
  virtual_network_name                           = azurerm_virtual_network.default.name
  address_prefixes                               = var.ml_subnet_address_space
  enforce_private_link_endpoint_network_policies = true
}

# ...
# For full reference, see: https://github.com/Azure/terraform/blob/master/quickstart/201-machine-learning-moderately-secure/network.tf
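The private endpoints in workspace.tf reference private DNS zones such as azurerm_private_dns_zone.dnsvault, which belong to the part of network.tf omitted above. To illustrate the pattern only, the following sketch shows how one such zone and its virtual network link might be declared for Key Vault. It is not a copy of the referenced file: the link's resource label and name are hypothetical, and the full file at the URL above remains the authoritative definition.

```hcl
# Private DNS zone that resolves Key Vault private endpoint names.
resource "azurerm_private_dns_zone" "dnsvault" {
  name                = "privatelink.vaultcore.azure.net"
  resource_group_name = azurerm_resource_group.default.name
}

# Link the zone to the virtual network so resources inside it can resolve the records.
# The label "vnetlinkvault" and the name "dnsvaultlink" are hypothetical.
resource "azurerm_private_dns_zone_virtual_network_link" "vnetlinkvault" {
  name                  = "dnsvaultlink"
  resource_group_name   = azurerm_resource_group.default.name
  private_dns_zone_name = azurerm_private_dns_zone.dnsvault.name
  virtual_network_id    = azurerm_virtual_network.default.id
}
```

The other zones referenced by the private endpoints (blob, file, container registry, Azure Machine Learning, and notebooks) would presumably follow the same zone-plus-link shape, each with its own zone name.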
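There are several options for connecting to a private link endpoint workspace. To learn more about these options, see Securely connect to your workspace.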
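Troubleshooting
Resource provider errors
When creating an Azure Machine Learning workspace, or a resource used by the workspace, you may receive an error similar to the following messages: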
No registered resource provider found for location {location}
The subscription is not registered to use namespace {resource-provider-namespace}
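Most resource providers are automatically registered, but not all. If you receive this message, you need to register the provider it mentions.
The following table contains a list of the resource providers required by Azure Machine Learning: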
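Resource provider | Why it's needed
Microsoft.MachineLearningServices | Creating the Azure Machine Learning workspace.
Microsoft.Storage | An Azure Storage account is used as the default storage for the workspace.
Microsoft.ContainerRegistry | Azure Container Registry is used by the workspace to build Docker images.
Microsoft.KeyVault | Azure Key Vault is used by the workspace to store secrets.
Microsoft.Notebooks | Integrated notebooks on Azure Machine Learning compute instances.
Microsoft.ContainerService | Needed if you plan to deploy trained models to Azure Kubernetes Service.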
If you plan to use a customer-managed key with Azure Machine Learning, the following resource providers must also be registered:
Resource provider | Why it's needed
Microsoft.DocumentDB | Azure CosmosDB instance that logs metadata for the workspace.
Microsoft.Search | Azure Search provides indexing capabilities for the workspace.
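If you plan to use a managed virtual network with Azure Machine Learning, the Microsoft.Network resource provider must be registered. The workspace uses this resource provider when it creates private endpoints for the managed virtual network.
For information on registering resource providers, see Resolve errors for resource provider registration.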
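If you prefer to manage these registrations in Terraform instead of registering them manually, one possible approach is the azurerm_resource_provider_registration resource. The sketch below rests on several assumptions: the local name extra_resource_providers is hypothetical, the list is only a subset picked from the tables above, and it presumes that the azurerm provider's automatic registration does not already cover these namespaces (otherwise Terraform may report that a provider is already registered).

```hcl
locals {
  # Hypothetical selection; adjust to the namespaces your subscription lacks.
  extra_resource_providers = toset([
    "Microsoft.Notebooks",
    "Microsoft.DocumentDB",
    "Microsoft.Search",
  ])
}

# One registration per namespace in the set.
resource "azurerm_resource_provider_registration" "extra" {
  for_each = local.extra_resource_providers
  name     = each.value
}
```

Using for_each over a set keeps each registration independently addressable, so removing a namespace from the list affects only that registration.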
Next steps