Create datastores

05/08/2024

APPLIES TO: Azure CLI ml extension v2 (current), Python SDK azure-ai-ml v2 (current)

In this article, you learn how to connect to Azure data storage services with Azure Machine Learning datastores.

Prerequisites

An Azure subscription. If you don't have an Azure subscription, create a free account before you begin. Try the free or paid version of Azure Machine Learning.
The Azure Machine Learning SDK for Python.
A Machine Learning workspace.
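
The Python samples in this article call MLClient.from_config(), which reads the workspace details from a config.json file. The following is a minimal sketch of writing such a file, assuming the usual three keys (subscription_id, resource_group, workspace_name). The helper name write_workspace_config and all values are placeholders, not part of the SDK.

```python
import json
from pathlib import Path


def write_workspace_config(directory, subscription_id, resource_group, workspace_name):
    """Write a config.json with workspace details (hypothetical helper, not an SDK function)."""
    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    path = Path(directory) / "config.json"
    # Store the file as UTF-8 JSON so it can be read back unchanged.
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return path
```

Calling write_workspace_config(".", "<subscription-id>", "<resource-group>", "<workspace-name>") in the working directory should be enough for the samples that follow, assuming your installed SDK version looks for config.json there.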

Note

Machine Learning datastores don't create the underlying storage account resources. Instead, they link an existing storage account for use by Machine Learning. Machine Learning datastores aren't required. If you have access to the underlying data, you can use storage URIs directly.
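
To illustrate the difference between a datastore reference and a direct storage URI, here is a small sketch that builds both forms as plain strings. The helper names are hypothetical, and the URI layouts (azureml://, wasbs://, abfss://) are assumed from the common Azure conventions, so verify them against your environment before relying on them.

```python
def datastore_uri(datastore_name, path):
    """URI that refers to data through a registered Machine Learning datastore."""
    return f"azureml://datastores/{datastore_name}/paths/{path.lstrip('/')}"


def blob_uri(account_name, container_name, path):
    """Direct URI to a blob container, with no datastore involved."""
    return f"wasbs://{container_name}@{account_name}.blob.core.windows.net/{path.lstrip('/')}"


def adls_gen2_uri(account_name, filesystem, path):
    """Direct URI to an Azure Data Lake Storage Gen2 file system."""
    return f"abfss://{filesystem}@{account_name}.dfs.core.windows.net/{path.lstrip('/')}"
```

With a datastore, consumers only need the datastore name. With a direct URI, they need the account and container names and their own access to the storage account.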

Create an Azure Blob datastore

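# Credential-less datastore: no credentials argument is passed.
from azure.ai.ml.entities import AzureBlobDatastore
from azure.ai.ml import MLClient

ml_client = MLClient.from_config()

store = AzureBlobDatastore(
    name="",            # add your datastore name here
    description="",     # add a datastore description here
    account_name="",    # add the storage account name here
    container_name=""   # add the storage container name here
)

ml_client.create_or_update(store)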

from azure.ai.ml.entities import AzureBlobDatastore
from azure.ai.ml.entities import AccountKeyConfiguration
from azure.ai.ml import MLClient

ml_client = MLClient.from_config()

store = AzureBlobDatastore(
    name="blob_protocol_example",
    description="Datastore pointing to a blob container using https protocol.",
    account_name="mytestblobstore",
    container_name="data-container",
    protocol="https",
    credentials=AccountKeyConfiguration(
        account_key="XXXxxxXXXxXXXXxxXXXXXxXXXXXxXxxXxXXXxXXXxXXxxxXXxxXXXxXxXXXxxXxxXXXXxxxxxXXxxxxxxXXXxXXX"
    ),
)

ml_client.create_or_update(store)
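
The sample above places the account key directly in the source code. One common alternative, sketched here as an assumption rather than as the article's method, is to read the secret from an environment variable. The helper read_secret and the variable name STORAGE_ACCOUNT_KEY are hypothetical.

```python
import os


def read_secret(variable_name):
    """Return the value of an environment variable, failing loudly if it is unset or empty."""
    value = os.environ.get(variable_name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {variable_name!r} is not set")
    return value


# Possible usage with the sample above (needs a real workspace, so it stays commented out):
# credentials=AccountKeyConfiguration(account_key=read_secret("STORAGE_ACCOUNT_KEY"))
```

This keeps the key out of version control. A dedicated secret store may be a better fit for production use.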

from azure.ai.ml.entities import AzureBlobDatastore
from azure.ai.ml.entities import SasTokenConfiguration
from azure.ai.ml import MLClient

ml_client = MLClient.from_config()

store = AzureBlobDatastore(
    name="blob_sas_example",
    description="Datastore pointing to a blob container using SAS token.",
    account_name="mytestblobstore",
    container_name="data-container",
    credentials=SasTokenConfiguration(
        sas_token="?xx=XXXX-XX-XX&xx=xxxx&xxx=xxx&xx=xxxxxxxxxxx&xx=XXXX-XX-XXXXX:XX:XXX&xx=XXXX-XX-XXXXX:XX:XXX&xxx=xxxxx&xxx=XXxXXXxxxxxXXXXXXXxXxxxXXXXXxxXXXXXxXXXXxXXXxXXxXX"
    ),
)

ml_client.create_or_update(store)
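
A SAS token normally carries its own expiry time, so it can be worth checking the token before you register a datastore with it. The sketch below assumes the expiry is in the se query parameter as a UTC timestamp. The function names are hypothetical, and only a few common timestamp layouts are handled.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs


def sas_expiry(sas_token):
    """Return the expiry (`se` parameter) as a timezone-aware datetime, or None if absent."""
    params = parse_qs(sas_token.lstrip("?"))
    values = params.get("se")
    if not values:
        return None
    raw = values[0]
    # Try the timestamp layouts this sketch assumes; others raise ValueError below.
    for fmt in ("%Y-%m-%dT%H:%M:%SZ", "%Y-%m-%dT%H:%MZ", "%Y-%m-%d"):
        try:
            return datetime.strptime(raw, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    raise ValueError(f"Unrecognized expiry format: {raw!r}")


def sas_is_expired(sas_token, now=None):
    """Return True if the token's expiry has passed; False if it is valid or has no `se` value."""
    expiry = sas_expiry(sas_token)
    if expiry is None:
        # Expiry unknown, for example when it is defined outside the token itself.
        return False
    now = now or datetime.now(timezone.utc)
    return now >= expiry
```

The check is purely local. It doesn't validate the signature or the permissions granted by the token.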

Create the following YAML file (update the values as appropriate):

# my_blob_datastore.yml
$schema: https://azuremlschemas.azureedge.net/latest/azureBlob.schema.json
name: my_blob_ds # add your datastore name here
type: azure_blob
description: here is a description # add a datastore description here
account_name: my_account_name # add the storage account name here
container_name: my_container_name # add the storage container name here

Create the Machine Learning datastore in the Azure CLI:

az ml datastore create --file my_blob_datastore.yml

Create this YAML file (update the values as appropriate):

# my_blob_datastore.yml
$schema: https://azuremlschemas.azureedge.net/latest/azureBlob.schema.json
name: blob_example
type: azure_blob
description: Datastore pointing to a blob container.
account_name: mytestblobstore
container_name: data-container
credentials:
  account_key: XXXxxxXXXxXXXXxxXXXXXxXXXXXxXxxXxXXXxXXXxXXxxxXXxxXXXxXxXXXxxXxxXXXXxxxxxXXxxxxxxXXXxXXX

Create the Machine Learning datastore in the CLI:

az ml datastore create --file my_blob_datastore.yml

Create this YAML file (update the values as appropriate):

# my_blob_datastore.yml
$schema: https://azuremlschemas.azureedge.net/latest/azureBlob.schema.json
name: blob_sas_example
type: azure_blob
description: Datastore pointing to a blob container using SAS token.
account_name: mytestblobstore
container_name: data-container
credentials:
  sas_token: ?xx=XXXX-XX-XX&xx=xxxx&xxx=xxx&xx=xxxxxxxxxxx&xx=XXXX-XX-XXXXX:XX:XXX&xx=XXXX-XX-XXXXX:XX:XXX&xxx=xxxxx&xxx=XXxXXXxxxxxXXXXXXXxXxxxXXXXXxxXXXXXxXXXXxXXXxXXxXX

Create the Machine Learning datastore in the CLI:

az ml datastore create --file my_blob_datastore.yml

Create an Azure Data Lake Storage Gen2 datastore

from azure.ai.ml.entities import AzureDataLakeGen2Datastore
from azure.ai.ml import MLClient

ml_client = MLClient.from_config()

store = AzureDataLakeGen2Datastore(
    name="",
    description="",
    account_name="",
    filesystem=""
)

ml_client.create_or_update(store)

from azure.ai.ml.entities import AzureDataLakeGen2Datastore
from azure.ai.ml.entities._datastore.credentials import ServicePrincipalCredentials

from azure.ai.ml import MLClient

ml_client = MLClient.from_config()

store = AzureDataLakeGen2Datastore(
    name="adls_gen2_example",
    description="Datastore pointing to an Azure Data Lake Storage Gen2.",
    account_name="mytestdatalakegen2",
    filesystem="my-gen2-container",
    credentials=ServicePrincipalCredentials(
        tenant_id="XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
        client_id="XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
        client_secret="XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
    ),
)

ml_client.create_or_update(store)

Create this YAML file (update the values):

# my_adls_datastore.yml
$schema: https://azuremlschemas.azureedge.net/latest/azureDataLakeGen2.schema.json
name: adls_gen2_credless_example
type: azure_data_lake_gen2
description: Credential-less datastore pointing to an Azure Data Lake Storage Gen2 instance.
account_name: mytestdatalakegen2
filesystem: my-gen2-container

Create the Machine Learning datastore in the CLI:

az ml datastore create --file my_adls_datastore.yml

Create this YAML file (update the values):

# my_adls_datastore.yml
$schema: https://azuremlschemas.azureedge.net/latest/azureDataLakeGen2.schema.json
name: adls_gen2_example
type: azure_data_lake_gen2
description: Datastore pointing to an Azure Data Lake Storage Gen2 instance.
account_name: mytestdatalakegen2
filesystem: my-gen2-container
credentials:
  tenant_id: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
  client_id: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
  client_secret: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Create the Machine Learning datastore in the CLI:

az ml datastore create --file my_adls_datastore.yml

Create an Azure Files datastore

from azure.ai.ml.entities import AzureFileDatastore
from azure.ai.ml.entities import AccountKeyConfiguration
from azure.ai.ml import MLClient

ml_client = MLClient.from_config()

store = AzureFileDatastore(
    name="file_example",
    description="Datastore pointing to an Azure File Share.",
    account_name="mytestfilestore",
    file_share_name="my-share",
    credentials=AccountKeyConfiguration(
        account_key="XXXxxxXXXxXXXXxxXXXXXxXXXXXxXxxXxXXXxXXXxXXxxxXXxxXXXxXxXXXxxXxxXXXXxxxxxXXxxxxxxXXXxXXX"
    ),
)

ml_client.create_or_update(store)

from azure.ai.ml.entities import AzureFileDatastore
from azure.ai.ml.entities import SasTokenConfiguration
from azure.ai.ml import MLClient

ml_client = MLClient.from_config()

store = AzureFileDatastore(
    name="file_sas_example",
    description="Datastore pointing to an Azure File Share using SAS token.",
    account_name="mytestfilestore",
    file_share_name="my-share",
    credentials=SasTokenConfiguration(
        sas_token="?xx=XXXX-XX-XX&xx=xxxx&xxx=xxx&xx=xxxxxxxxxxx&xx=XXXX-XX-XXXXX:XX:XXX&xx=XXXX-XX-XXXXX:XX:XXX&xxx=xxxxx&xxx=XXxXXXxxxxxXXXXXXXxXxxxXXXXXxxXXXXXxXXXXxXXXxXXxXX"
    ),
)

ml_client.create_or_update(store)

Create this YAML file (update the values):

# my_files_datastore.yml
$schema: https://azuremlschemas.azureedge.net/latest/azureFile.schema.json
name: file_example
type: azure_file
description: Datastore pointing to an Azure File Share.
account_name: mytestfilestore
file_share_name: my-share
credentials:
  account_key: XxXxXxXXXXXXXxXxXxxXxxXXXXXXXXxXxxXXxXXXXXXXxxxXxXXxXXXXXxXXxXXXxXxXxxxXXxXXxXXXXXxXxxXX

Create the Machine Learning datastore in the CLI:

az ml datastore create --file my_files_datastore.yml

Create this YAML file (update the values):

# my_files_datastore.yml
$schema: https://azuremlschemas.azureedge.net/latest/azureFile.schema.json
name: file_sas_example
type: azure_file
description: Datastore pointing to an Azure File Share using an SAS token.
account_name: mytestfilestore
file_share_name: my-share
credentials:
  sas_token: ?xx=XXXX-XX-XX&xx=xxxx&xxx=xxx&xx=xxxxxxxxxxx&xx=XXXX-XX-XXXXX:XX:XXX&xx=XXXX-XX-XXXXX:XX:XXX&xxx=xxxxx&xxx=XXxXXXxxxxxXXXXXXXxXxxxXXXXXxxXXXXXxXXXXxXXXxXXxXX

Create the Machine Learning datastore in the CLI:

az ml datastore create --file my_files_datastore.yml

Create an Azure Data Lake Storage Gen1 datastore

from azure.ai.ml.entities import AzureDataLakeGen1Datastore
from azure.ai.ml import MLClient

ml_client = MLClient.from_config()

store = AzureDataLakeGen1Datastore(
    name="",
    store_name="",
    description="",
)

ml_client.create_or_update(store)

from azure.ai.ml.entities import AzureDataLakeGen1Datastore
from azure.ai.ml.entities._datastore.credentials import ServicePrincipalCredentials
from azure.ai.ml import MLClient

ml_client = MLClient.from_config()

store = AzureDataLakeGen1Datastore(
    name="adls_gen1_example",
    description="Datastore pointing to an Azure Data Lake Storage Gen1.",
    store_name="mytestdatalakegen1",
    credentials=ServicePrincipalCredentials(
        tenant_id="XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
        client_id="XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
        client_secret="XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
    ),
)

ml_client.create_or_update(store)

Create this YAML file (update the values):

# my_adls_datastore.yml
$schema: https://azuremlschemas.azureedge.net/latest/azureDataLakeGen1.schema.json
name: adls_gen1_credless_example
type: azure_data_lake_gen1
description: Credential-less datastore pointing to an Azure Data Lake Storage Gen1 instance.
store_name: mytestdatalakegen1

Create the Machine Learning datastore in the CLI:

az ml datastore create --file my_adls_datastore.yml

Create this YAML file (update the values):

# my_adls_datastore.yml
$schema: https://azuremlschemas.azureedge.net/latest/azureDataLakeGen1.schema.json
name: adls_gen1_example
type: azure_data_lake_gen1
description: Datastore pointing to an Azure Data Lake Storage Gen1 instance.
store_name: mytestdatalakegen1
credentials:
  tenant_id: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
  client_id: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
  client_secret: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Create the Machine Learning datastore in the CLI:

az ml datastore create --file my_adls_datastore.yml

Create a OneLake (Microsoft Fabric) datastore (preview)

This section describes the options for creating a OneLake datastore. The OneLake datastore is part of Microsoft Fabric. At this time, Machine Learning supports connections to Microsoft Fabric lakehouse artifacts that include folders or files and Amazon S3 shortcuts. For more information about lakehouses, see What is a lakehouse in Microsoft Fabric?.

Creating a OneLake datastore requires the following information from your Microsoft Fabric instance:

Endpoint
Fabric workspace name or GUID
Artifact name or GUID

The following three screenshots show how to retrieve this required information from your Microsoft Fabric instance.

OneLake workspace name

In your Microsoft Fabric instance, you can find the workspace information as shown in this screenshot. You can use either a GUID value or a "friendly name" to create a Machine Learning OneLake datastore.

OneLake endpoint

This screenshot shows how to find the endpoint information in your Microsoft Fabric instance.

OneLake artifact name

This screenshot shows how to find the artifact information in your Microsoft Fabric instance. The screenshot also shows how you can use either a GUID value or a friendly name to create a Machine Learning OneLake datastore.
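
Because both the workspace and the artifact accept either a GUID or a friendly name, it can help to check which form a value is in before you register the datastore. This sketch uses only the Python standard library, and the helper names are hypothetical. A friendly name that happens to consist of exactly 32 hexadecimal characters would be classified as a GUID.

```python
import uuid


def is_guid(value):
    """Return True if `value` parses as a GUID/UUID, False otherwise."""
    try:
        uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        return False
    return True


def describe_identifier(value):
    """Label a workspace or artifact identifier as 'guid' or 'friendly name'."""
    return "guid" if is_guid(value) else "friendly name"
```

For example, describe_identifier("AzureML_Sample_OneLakeWS") classifies the workspace value used in the samples in this section as a friendly name.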

Create a OneLake datastore

from azure.ai.ml.entities import OneLakeDatastore, OneLakeArtifact
from azure.ai.ml import MLClient

ml_client = MLClient.from_config()

store = OneLakeDatastore(
    name="onelake_example_id",
    description="Datastore pointing to a Microsoft Fabric artifact.",
    one_lake_workspace_name="AzureML_Sample_OneLakeWS",
    endpoint="msit-onelake.dfs.fabric.microsoft.com",
    artifact=OneLakeArtifact(
        name="AzML_Sample_LH",
        type="lake_house"
    )
)

ml_client.create_or_update(store)

from azure.ai.ml.entities import OneLakeDatastore, OneLakeArtifact
from azure.ai.ml.entities._datastore.credentials import ServicePrincipalCredentials
from azure.ai.ml import MLClient

ml_client = MLClient.from_config()

store = OneLakeDatastore(
    name="onelake_example_sp",
    description="Datastore pointing to a Microsoft Fabric artifact.",
    one_lake_workspace_name="AzureML_Sample_OneLakeWS",
    endpoint="msit-onelake.dfs.fabric.microsoft.com",
    artifact=OneLakeArtifact(
        name="AzML_Sample_LH",
        type="lake_house"
    ),
    credentials=ServicePrincipalCredentials(
        tenant_id="XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
        client_id="XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
        client_secret="XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
    ),
)

ml_client.create_or_update(store)

Create the following YAML file (update the values):

# my_onelake_datastore.yml
$schema: http://azureml/sdk-2-0/OneLakeDatastore.json
name: onelake_example_id
type: one_lake
description: Credential-less datastore pointing to a OneLake lakehouse.
one_lake_workspace_name: "AzureML_Sample_OneLakeWS"
endpoint: "msit-onelake.dfs.fabric.microsoft.com"
artifact:
  type: lake_house
  name: "AzML_Sample_LH"

Create the Machine Learning datastore in the CLI:

az ml datastore create --file my_onelake_datastore.yml

Create the following YAML file (update the values):

# my_onelakesp_datastore.yml
$schema: http://azureml/sdk-2-0/OneLakeDatastore.json
name: onelake_example_sp
type: one_lake
description: Datastore pointing to a OneLake lakehouse using a service principal.
one_lake_workspace_name: "AzureML_Sample_OneLakeWS"
endpoint: "msit-onelake.dfs.fabric.microsoft.com"
artifact:
  type: lake_house
  name: "AzML_Sample_LH"
credentials:
  tenant_id: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
  client_id: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
  client_secret: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Create the Machine Learning datastore in the CLI:

az ml datastore create --file my_onelakesp_datastore.yml
