Create datastores

APPLIES TO: Azure CLI ml extension v2 (current) Python SDK azure-ai-ml v2 (current)

In this article, learn how to connect to Azure data storage services with Azure Machine Learning datastores.

Prerequisites

Note

Azure Machine Learning datastores do not create the underlying storage account resources. Instead, they link an existing storage account for Azure Machine Learning use. Datastores are not required to access data: if you have access to the underlying data, you can use storage URIs directly.
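As a rough illustration of using storage URIs directly, the sketch below assembles the common URI forms for Azure Blob storage (`https://`, `wasbs://`) and Azure Data Lake Gen2 (`abfss://`). The helper names and the sample account, container, and path values are hypothetical and only serve to show the URI layout.

```python
def blob_https_uri(account_name: str, container_name: str, path: str) -> str:
    """Build an HTTPS URI for a blob (hypothetical helper)."""
    return f"https://{account_name}.blob.core.windows.net/{container_name}/{path.lstrip('/')}"


def blob_wasbs_uri(account_name: str, container_name: str, path: str) -> str:
    """Build a wasbs:// URI for a blob (hypothetical helper)."""
    return f"wasbs://{container_name}@{account_name}.blob.core.windows.net/{path.lstrip('/')}"


def adls_gen2_uri(account_name: str, filesystem: str, path: str) -> str:
    """Build an abfss:// URI for Azure Data Lake Gen2 (hypothetical helper)."""
    return f"abfss://{filesystem}@{account_name}.dfs.core.windows.net/{path.lstrip('/')}"


# Sample values are placeholders, not real resources.
example_uri = adls_gen2_uri("mystorageaccount", "myfilesystem", "raw/data.csv")
```

A leading slash in the path is stripped so that the resulting URI never contains a doubled separator.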

Create an Azure Blob datastore

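from azure.ai.ml.entities import AzureBlobDatastore
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Load workspace details from a local config.json; from_config requires a credential.
ml_client = MLClient.from_config(credential=DefaultAzureCredential())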

store = AzureBlobDatastore(
    name="",
    description="",
    account_name="",
    container_name=""
)

ml_client.create_or_update(store)

Create an Azure Data Lake Gen2 datastore

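from azure.ai.ml.entities import AzureDataLakeGen2Datastore
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Load workspace details from a local config.json; from_config requires a credential.
ml_client = MLClient.from_config(credential=DefaultAzureCredential())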

store = AzureDataLakeGen2Datastore(
    name="",
    description="",
    account_name="",
    filesystem=""
)

ml_client.create_or_update(store)

Create an Azure Files datastore

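from azure.ai.ml.entities import AzureFileDatastore
from azure.ai.ml.entities import AccountKeyConfiguration
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Load workspace details from a local config.json; from_config requires a credential.
ml_client = MLClient.from_config(credential=DefaultAzureCredential())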

store = AzureFileDatastore(
    name="file_example",
    description="Datastore pointing to an Azure File Share.",
    account_name="mytestfilestore",
    file_share_name="my-share",
    credentials=AccountKeyConfiguration(
        account_key="XXXxxxXXXxXXXXxxXXXXXxXXXXXxXxxXxXXXxXXXxXXxxxXXxxXXXxXxXXXxxXxxXXXXxxxxxXXxxxxxxXXXxXXX"
    ),
)

ml_client.create_or_update(store)

Create an Azure Data Lake Gen1 datastore

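from azure.ai.ml.entities import AzureDataLakeGen1Datastore
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Load workspace details from a local config.json; from_config requires a credential.
ml_client = MLClient.from_config(credential=DefaultAzureCredential())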

store = AzureDataLakeGen1Datastore(
    name="",
    store_name="",
    description="",
)

ml_client.create_or_update(store)

Create a OneLake (Microsoft Fabric) datastore (preview)

This section describes various options to create a OneLake datastore. The OneLake datastore is part of Microsoft Fabric. At this time, Azure Machine Learning supports connection to Microsoft Fabric Lakehouse artifacts that include folders / files and Amazon S3 shortcuts. For more information about Lakehouse, visit What is a lakehouse in Microsoft Fabric.

OneLake datastore creation requires the following information from your Microsoft Fabric instance:

  • Endpoint
  • Fabric workspace name or GUID
  • Artifact name or GUID

The following three screenshots show where to find each value:

OneLake workspace name

In your Microsoft Fabric instance, you can find the workspace information as shown in this screenshot. You can use either a GUID value, or a "friendly name" to create an Azure Machine Learning OneLake datastore.

Screenshot that shows Fabric Workspace details in Microsoft Fabric UI.

OneLake endpoint

This screenshot shows how you can find endpoint information in your Microsoft Fabric instance:

Screenshot that shows Fabric endpoint details in Microsoft Fabric UI.

OneLake artifact name

This screenshot shows how you can find the artifact information in your Microsoft Fabric instance. The screenshot also shows how you can either use a GUID value or a "friendly name" to create an Azure Machine Learning OneLake datastore:

Screenshot showing how to get Fabric LH artifact details in Microsoft Fabric UI.
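Because both the workspace and the artifact can be identified by either a GUID or a friendly name, it can help to check which form a value takes before passing it along. The sketch below is one way to do that with the Python standard library; the function names are hypothetical, and the sample GUID is a made-up placeholder.

```python
import uuid


def is_guid(value: str) -> bool:
    """Return True if value is a canonical hyphenated GUID string."""
    try:
        parsed = uuid.UUID(value)
    except ValueError:
        return False
    # Require the canonical 8-4-4-4-12 form so that other spellings
    # accepted by uuid.UUID (braces, no hyphens) are not treated as GUIDs.
    return str(parsed) == value.lower()


def describe_identifier(value: str) -> str:
    """Label an identifier as a GUID or a friendly name (hypothetical helper)."""
    return "GUID" if is_guid(value) else "friendly name"


workspace_kind = describe_identifier("AzureML_Sample_OneLakeWS")
artifact_kind = describe_identifier("1a2b3c4d-0000-1111-2222-333344445555")
```

The check only classifies the string; it does not confirm that the workspace or artifact exists in Microsoft Fabric.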

Create a OneLake datastore

from azure.ai.ml.entities import OneLakeDatastore, OneLakeArtifact
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Load workspace details from a local config.json; from_config requires a credential.
ml_client = MLClient.from_config(credential=DefaultAzureCredential())

store = OneLakeDatastore(
    name="onelake_example_id",
    description="Datastore pointing to a Microsoft Fabric artifact.",
    one_lake_workspace_name="AzureML_Sample_OneLakeWS",
    endpoint="msit-onelake.dfs.fabric.microsoft.com",
    artifact=OneLakeArtifact(
        name="AzML_Sample_LH",
        type="lake_house"
    )
)

ml_client.create_or_update(store)
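Once a datastore exists, data inside it is typically addressed by a datastore URI rather than a full storage URL. The following is a minimal sketch, assuming the `azureml://datastores/<name>/paths/<path>` layout used by Azure Machine Learning v2; the helper name and the sample path are hypothetical.

```python
def datastore_uri(datastore_name: str, path: str) -> str:
    """Build an azureml:// URI for a path inside a datastore (hypothetical helper)."""
    return f"azureml://datastores/{datastore_name}/paths/{path.lstrip('/')}"


# "file_example" matches the Azure Files datastore name used earlier;
# the folder and file names are placeholders.
sample_uri = datastore_uri("file_example", "folder/data.csv")
```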

Next steps