APLICA-SE A:
Extensão de ML da CLI do Azure v2 (atual)
SDK do Python azure-ai-ml v2 (atual)
Neste artigo, você aprenderá a se conectar aos serviços de armazenamento de dados do Azure com os armazenamentos de dados do Azure Machine Learning.
Pré-requisitos
Observação
Os armazenamentos de dados do Azure Machine Learning não criam os recursos de conta de armazenamento subjacentes. Em vez disso, eles vinculam uma conta de armazenamento existente para uso do Azure Machine Learning. Os armazenamentos de dados do Machine Learning não são necessários. Se você tiver acesso aos dados subjacentes, poderá usar URIs de armazenamento diretamente.
Criar um armazenamento de dados de Blob do Azure
from azure.ai.ml.entities import AzureBlobDatastore
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential
ml_client = MLClient.from_config(credential=DefaultAzureCredential())
store = AzureBlobDatastore(
name="",
description="",
account_name="",
container_name=""
)
ml_client.create_or_update(store)
from azure.ai.ml.entities import AzureBlobDatastore
from azure.ai.ml.entities import AccountKeyConfiguration
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential
ml_client = MLClient.from_config(credential=DefaultAzureCredential())
store = AzureBlobDatastore(
name="blob_protocol_example",
description="Datastore pointing to a blob container using https protocol.",
account_name="mytestblobstore",
container_name="data-container",
protocol="https",
credentials=AccountKeyConfiguration(
account_key="aaaaaaaa-0b0b-1c1c-2d2d-333333333333"
),
)
ml_client.create_or_update(store)
from azure.ai.ml.entities import AzureBlobDatastore
from azure.ai.ml.entities import SasTokenConfiguration
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential
ml_client = MLClient.from_config(credential=DefaultAzureCredential())
store = AzureBlobDatastore(
name="blob_sas_example",
description="Datastore pointing to a blob container using SAS token.",
account_name="mytestblobstore",
container_name="data-container",
credentials=SasTokenConfiguration(
sas_token= "?xx=A1bC2dE3fH4iJ5kL6mN7oP8qR9sT0u&xx=C2dE3fH4iJ5kL6mN7oP8qR9sT0uV1wx&xx=Ff6Gg~7Hh8.-Ii9Jj0Kk1Ll2Mm3Nn4_Oo5Pp6Qq7&xx=N7oP8qR9sT0uV1wX2yZ3aB4cD5eF6g&xxx=Ee5Ff~6Gg7.-Hh8Ii9Jj0Kk1Ll2Mm3_Nn4Oo5Pp6&xxx=C2dE3fH4iJ5kL6mN7oP8qR9sT0uV1w"
),
)
ml_client.create_or_update(store)
Criar o seguinte arquivo YAML (atualize os valores apropriados):
# my_blob_datastore.yml
$schema: https://azuremlschemas.azureedge.net/latest/azureBlob.schema.json
name: my_blob_ds # add your datastore name here
type: azure_blob
description: here is a description # add a datastore description here
account_name: my_account_name # add the storage account name here
container_name: my_container_name # add the storage container name here
Criar o armazenamento de dados do Azure Machine Learning na CLI do Azure:
az ml datastore create --file my_blob_datastore.yml
Criar este arquivo YAML (atualize os valores apropriados):
# my_blob_datastore.yml
$schema: https://azuremlschemas.azureedge.net/latest/azureBlob.schema.json
name: blob_example
type: azure_blob
description: Datastore pointing to a blob container.
account_name: mytestblobstore
container_name: data-container
credentials:
account_key: aaaaaaaa-0b0b-1c1c-2d2d-333333333333
Criar o armazenamento de dados do Machine Learning na CLI:
az ml datastore create --file my_blob_datastore.yml
Criar este arquivo YAML (atualize os valores apropriados):
# my_blob_datastore.yml
$schema: https://azuremlschemas.azureedge.net/latest/azureBlob.schema.json
name: blob_sas_example
type: azure_blob
description: Datastore pointing to a blob container using SAS token.
account_name: mytestblobstore
container_name: data-container
credentials:
sas_token: "?xx=A1bC2dE3fH4iJ5kL6mN7oP8qR9sT0u&xx=C2dE3fH4iJ5kL6mN7oP8qR9sT0uV1wx&xx=Ff6Gg~7Hh8.-Ii9Jj0Kk1Ll2Mm3Nn4_Oo5Pp6Qq7&xx=N7oP8qR9sT0uV1wX2yZ3aB4cD5eF6g&xxx=Ee5Ff~6Gg7.-Hh8Ii9Jj0Kk1Ll2Mm3_Nn4Oo5Pp6&xxx=C2dE3fH4iJ5kL6mN7oP8qR9sT0uV1w"
Criar o armazenamento de dados do Machine Learning na CLI:
az ml datastore create --file my_blob_datastore.yml
Criar uma conta do Azure Data Lake Storage Gen2
from azure.ai.ml.entities import AzureDataLakeGen2Datastore
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential
ml_client = MLClient.from_config(credential=DefaultAzureCredential())
store = AzureDataLakeGen2Datastore(
name="",
description="",
account_name="",
filesystem=""
)
ml_client.create_or_update(store)
from azure.ai.ml.entities import AzureDataLakeGen2Datastore
from azure.ai.ml.entities._datastore.credentials import ServicePrincipalCredentials
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential
ml_client = MLClient.from_config(credential=DefaultAzureCredential())
store = AzureDataLakeGen2Datastore(
name="adls_gen2_example",
description="Datastore pointing to an Azure Data Lake Storage Gen2.",
account_name="mytestdatalakegen2",
filesystem="my-gen2-container",
credentials=ServicePrincipalCredentials(
tenant_id= "bbbbcccc-1111-dddd-2222-eeee3333ffff",
client_id= "44445555-eeee-6666-ffff-7777aaaa8888",
client_secret= "Cc3Dd~4Ee5.-Ff6Gg7Hh8Ii9Jj0Kk1_Ll2Mm3Nn4",
),
)
ml_client.create_or_update(store)
Criar este arquivo YAML (atualize os valores):
# my_adls_datastore.yml
$schema: https://azuremlschemas.azureedge.net/latest/azureDataLakeGen2.schema.json
name: adls_gen2_credless_example
type: azure_data_lake_gen2
description: Credential-less datastore pointing to an Azure Data Lake Storage Gen2 instance.
account_name: mytestdatalakegen2
filesystem: my-gen2-container
Criar o armazenamento de dados do Machine Learning na CLI:
az ml datastore create --file my_adls_datastore.yml
Criar este arquivo YAML (atualize os valores):
# my_adls_datastore.yml
$schema: https://azuremlschemas.azureedge.net/latest/azureDataLakeGen2.schema.json
name: adls_gen2_example
type: azure_data_lake_gen2
description: Datastore pointing to an Azure Data Lake Storage Gen2 instance.
account_name: mytestdatalakegen2
filesystem: my-gen2-container
credentials:
tenant_id: bbbbcccc-1111-dddd-2222-eeee3333ffff
client_id: 44445555-eeee-6666-ffff-7777aaaa8888
client_secret: Cc3Dd~4Ee5.-Ff6Gg7Hh8Ii9Jj0Kk1_Ll2Mm3Nn4
Criar o armazenamento de dados do Machine Learning na CLI:
az ml datastore create --file my_adls_datastore.yml
Criar um armazenamento de dados dos Arquivos do Azure
from azure.ai.ml.entities import AzureFileDatastore
from azure.ai.ml.entities import AccountKeyConfiguration
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential
ml_client = MLClient.from_config(credential=DefaultAzureCredential())
store = AzureFileDatastore(
name="file_example",
description="Datastore pointing to an Azure File Share.",
account_name="mytestfilestore",
file_share_name="my-share",
credentials=AccountKeyConfiguration(
account_key= "aaaaaaaa-0b0b-1c1c-2d2d-333333333333"
),
)
ml_client.create_or_update(store)
from azure.ai.ml.entities import AzureFileDatastore
from azure.ai.ml.entities import SasTokenConfiguration
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential
ml_client = MLClient.from_config(credential=DefaultAzureCredential())
store = AzureFileDatastore(
name="file_sas_example",
description="Datastore pointing to an Azure File Share using SAS token.",
account_name="mytestfilestore",
file_share_name="my-share",
credentials=SasTokenConfiguration(
sas_token="?xx=A1bC2dE3fH4iJ5kL6mN7oP8qR9sT0u&xx=C2dE3fH4iJ5kL6mN7oP8qR9sT0uV1wx&xx=Ff6Gg~7Hh8.-Ii9Jj0Kk1Ll2Mm3Nn4_Oo5Pp6Qq7&xx=N7oP8qR9sT0uV1wX2yZ3aB4cD5eF6g&xxx=Ee5Ff~6Gg7.-Hh8Ii9Jj0Kk1Ll2Mm3_Nn4Oo5Pp6&xxx=C2dE3fH4iJ5kL6mN7oP8qR9sT0uV1w"
),
)
ml_client.create_or_update(store)
Criar este arquivo YAML (atualize os valores):
# my_files_datastore.yml
$schema: https://azuremlschemas.azureedge.net/latest/azureFile.schema.json
name: file_example
type: azure_file
description: Datastore pointing to an Azure File Share.
account_name: mytestfilestore
file_share_name: my-share
credentials:
account_key: aaaaaaaa-0b0b-1c1c-2d2d-333333333333
Criar o armazenamento de dados do Machine Learning na CLI:
az ml datastore create --file my_files_datastore.yml
Criar este arquivo YAML (atualize os valores):
# my_files_datastore.yml
$schema: https://azuremlschemas.azureedge.net/latest/azureFile.schema.json
name: file_sas_example
type: azure_file
description: Datastore pointing to an Azure File Share using an SAS token.
account_name: mytestfilestore
file_share_name: my-share
credentials:
sas_token: "?xx=A1bC2dE3fH4iJ5kL6mN7oP8qR9sT0u&xx=C2dE3fH4iJ5kL6mN7oP8qR9sT0uV1wx&xx=Ff6Gg~7Hh8.-Ii9Jj0Kk1Ll2Mm3Nn4_Oo5Pp6Qq7&xx=N7oP8qR9sT0uV1wX2yZ3aB4cD5eF6g&xxx=Ee5Ff~6Gg7.-Hh8Ii9Jj0Kk1Ll2Mm3_Nn4Oo5Pp6&xxx=C2dE3fH4iJ5kL6mN7oP8qR9sT0uV1w"
Criar o armazenamento de dados do Machine Learning na CLI:
az ml datastore create --file my_files_datastore.yml
Criar uma conta do Azure Data Lake Storage Gen1
from azure.ai.ml.entities import AzureDataLakeGen1Datastore
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential
ml_client = MLClient.from_config(credential=DefaultAzureCredential())
store = AzureDataLakeGen1Datastore(
name="",
store_name="",
description="",
)
ml_client.create_or_update(store)
from azure.ai.ml.entities import AzureDataLakeGen1Datastore
from azure.ai.ml.entities._datastore.credentials import ServicePrincipalCredentials
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential
ml_client = MLClient.from_config(credential=DefaultAzureCredential())
store = AzureDataLakeGen1Datastore(
name="adls_gen1_example",
description="Datastore pointing to an Azure Data Lake Storage Gen1.",
store_name="mytestdatalakegen1",
credentials=ServicePrincipalCredentials(
tenant_id= "bbbbcccc-1111-dddd-2222-eeee3333ffff",
client_id= "44445555-eeee-6666-ffff-7777aaaa8888",
client_secret= "Cc3Dd~4Ee5.-Ff6Gg7Hh8Ii9Jj0Kk1_Ll2Mm3Nn4",
),
)
ml_client.create_or_update(store)
Criar este arquivo YAML (atualize os valores):
# my_adls_datastore.yml
$schema: https://azuremlschemas.azureedge.net/latest/azureDataLakeGen1.schema.json
name: alds_gen1_credless_example
type: azure_data_lake_gen1
description: Credential-less datastore pointing to an Azure Data Lake Storage Gen1 instance.
store_name: mytestdatalakegen1
Criar o armazenamento de dados do Machine Learning na CLI:
az ml datastore create --file my_adls_datastore.yml
Criar este arquivo YAML (atualize os valores):
# my_adls_datastore.yml
$schema: https://azuremlschemas.azureedge.net/latest/azureDataLakeGen1.schema.json
name: adls_gen1_example
type: azure_data_lake_gen1
description: Datastore pointing to an Azure Data Lake Storage Gen1 instance.
store_name: mytestdatalakegen1
credentials:
tenant_id: bbbbcccc-1111-dddd-2222-eeee3333ffff
client_id: 44445555-eeee-6666-ffff-7777aaaa8888
client_secret: Cc3Dd~4Ee5.-Ff6Gg7Hh8Ii9Jj0Kk1_Ll2Mm3Nn4
Criar o armazenamento de dados do Machine Learning na CLI:
az ml datastore create --file my_adls_datastore.yml
Criar um armazenamento de dados do OneLake (Microsoft Fabric) (versão prévia)
Esta seção descreve várias opções para criar um armazenamento de dados do OneLake. O armazenamento de dados do OneLake faz parte do Microsoft Fabric. No momento, o Azure Machine Learning dá suporte à conexão com artefatos do lakehouse do Microsoft Fabric que incluem pastas ou arquivos e atalhos do Amazon S3. Para obter mais informações sobre lakehouses, consulte O que é um lakehouse no Microsoft Fabric?.
A criação do armazenamento de dados do OneLake requer as seguintes informações da instância do Microsoft Fabric:
- Ponto de extremidade
- GUID do espaço de trabalho
- GUID do artefato
As capturas de tela a seguir descrevem a recuperação desses recursos de informação necessários de sua instância do Microsoft Fabric.
Em seguida, você encontrará "Ponto de Extremidade", "GUID do Espaço de Trabalho" e "GUID de Artefato" em "URL" e "Caminho do ABFS" na página "Propriedades":
-
Formato de URL: https://{seu_ponto_de_extremidade_do_one_lake}///Files
-
Formato de caminho do ABFS: abfss://{seu_guid_de_espaço_de_trabalho_do_one_lake}@//Files
Criar um armazenamento de dados do OneLake
from azure.ai.ml.entities import OneLakeDatastore, OneLakeArtifact
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential
ml_client = MLClient.from_config(credential=DefaultAzureCredential())
store = OneLakeDatastore(
name="onelake_example_id",
description="Datastore pointing to a Microsoft fabric artifact.",
one_lake_workspace_name="bbbbbbbb-7777-8888-9999-cccccccccccc", #{your_one_lake_workspace_guid}
endpoint="msit-onelake.dfs.fabric.microsoft.com" #{your_one_lake_endpoint}
artifact = OneLakeArtifact(
name="cccccccc-8888-9999-0000-dddddddddddd/Files", #{your_one_lake_artifact_guid}/Files
type="lake_house"
)
)
ml_client.create_or_update(store)
from azure.ai.ml.entities import OneLakeDatastore, OneLakeArtifact
from azure.ai.ml.entities._datastore.credentials import ServicePrincipalCredentials
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential
ml_client = MLClient.from_config(credential=DefaultAzureCredential())
store = OneLakeDatastore(
name="onelake_example_sp",
description="Datastore pointing to a Microsoft fabric artifact.",
one_lake_workspace_name="bbbbbbbb-7777-8888-9999-cccccccccccc", #{your_one_lake_workspace_guid}
endpoint="msit-onelake.dfs.fabric.microsoft.com" #{your_one_lake_endpoint}
artifact = OneLakeArtifact(
name="cccccccc-8888-9999-0000-dddddddddddd/Files", #{your_one_lake_artifact_guid}/Files
type="lake_house"
)
credentials=ServicePrincipalCredentials(
tenant_id= "bbbbcccc-1111-dddd-2222-eeee3333ffff",
client_id= "44445555-eeee-6666-ffff-7777aaaa8888",
client_secret= "Cc3Dd~4Ee5.-Ff6Gg7Hh8Ii9Jj0Kk1_Ll2Mm3Nn4",
),
)
ml_client.create_or_update(store)
Criar o arquivo YAML a seguir (atualize os valores):
# my_onelake_datastore.yml
$schema: http://azureml/sdk-2-0/OneLakeDatastore.json
name: onelake_example_id
type: one_lake
description: Credential-less datastore pointing to a OneLake lakehouse.
one_lake_workspace_name: "eeeeffff-4444-aaaa-5555-bbbb6666cccc"
endpoint: "msit-onelake.dfs.fabric.microsoft.com"
artifact:
type: lake_house
name: "1111bbbb-22cc-dddd-ee33-ffffff444444/Files"
Criar o armazenamento de dados do Machine Learning na CLI:
az ml datastore create --file my_onelake_datastore.yml
Criar o arquivo YAML a seguir (atualize os valores):
# my_onelakesp_datastore.yml
$schema: http://azureml/sdk-2-0/OneLakeDatastore.json
name: onelake_example_id
type: one_lake
description: Credential-less datastore pointing to a OneLake lakehouse.
one_lake_workspace_name: "eeeeffff-4444-aaaa-5555-bbbb6666cccc"
endpoint: "msit-onelake.dfs.fabric.microsoft.com"
artifact:
type: lake_house
name: "1111bbbb-22cc-dddd-ee33-ffffff444444/Files"
credentials:
tenant_id: bbbbcccc-1111-dddd-2222-eeee3333ffff
client_id: 44445555-eeee-6666-ffff-7777aaaa8888
client_secret: Cc3Dd~4Ee5.-Ff6Gg7Hh8Ii9Jj0Kk1_Ll2Mm3Nn4
Criar o armazenamento de dados do Machine Learning na CLI:
az ml datastore create --file my_onelakesp_datastore.yml
Próximas etapas