Datastore Class
Represents a storage abstraction over an Azure Machine Learning storage account.
Datastores are attached to workspaces and are used to store connection information to Azure storage services, so you can refer to them by name rather than remembering the connection information and secrets used to connect to the storage services.
Examples of supported Azure storage services that can be registered as datastores are:
Azure Blob Container
Azure File Share
Azure Data Lake
Azure Data Lake Gen2
Azure SQL Database
Azure Database for PostgreSQL
Databricks File System
Azure Database for MySQL
Use this class to perform management operations, including registering, listing, getting, and removing datastores. Datastores for each service are created with the register* methods of this class. When using a datastore to access data, you must have permission to access that data, which depends on the credentials registered with the datastore.
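For example, a minimal sketch of typical management calls; the datastore name 'my_datastore' is a placeholder and the workspace is loaded from a local config.json:

import os
from azureml.core import Datastore, Workspace

ws = Workspace.from_config()  # assumes a config.json for the workspace is available locally

# List all datastores registered in the workspace
for name, datastore in ws.datastores.items():
    print(name, datastore.datastore_type)

# Get a specific datastore by name, then remove its registration from the workspace
datastore = Datastore.get(ws, 'my_datastore')  # 'my_datastore' is a hypothetical name
datastore.unregister()                         # removes the registration only, not the underlying storage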
For more information on datastores and how they can be used in machine learning, see the Azure Machine Learning documentation on accessing data.
Inheritance
builtins.object → Datastore
Constructor
Get a datastore by name. This call will make a request to the datastore service.
Datastore(workspace, name=None)
Parameters
Name | Description
---|---
workspace (Required) | The workspace.
name | str, optional. The name of the datastore; defaults to None, which gets the default datastore. Default value: None
Remarks
To interact with data in your datastores for machine learning tasks, like training, create an Azure Machine Learning dataset. Datasets provide functions that load tabular data into a pandas or Spark DataFrame. Datasets also provide the ability to download or mount files of any format from Azure Blob storage, Azure Files, Azure Data Lake Storage Gen1, Azure Data Lake Storage Gen2, Azure SQL Database, and Azure Database for PostgreSQL. Learn more about how to train with datasets.
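For instance, a brief sketch of the dataset workflow described above, assuming `datastore` is an already registered blob datastore and the paths 'weather/2018/11.csv' and 'images/' are hypothetical locations within it:

from azureml.core import Dataset

# Create a TabularDataset that points at a CSV file on the datastore and load it into pandas
tabular_ds = Dataset.Tabular.from_delimited_files(path=[(datastore, 'weather/2018/11.csv')])
df = tabular_ds.to_pandas_dataframe()

# Create a FileDataset for files of any format and download them locally
file_ds = Dataset.File.from_files(path=[(datastore, 'images/')])
file_ds.download(target_path='./data', overwrite=True)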
The following example shows how to create a Datastore connected to an Azure Blob container.
import os

from azureml.core import Datastore
from azureml.data.data_reference import DataReference
from azureml.exceptions import UserErrorException

blob_datastore_name = 'MyBlobDatastore'
account_name = os.getenv("BLOB_ACCOUNTNAME_62", "<my-account-name>")  # Storage account name
container_name = os.getenv("BLOB_CONTAINER_62", "<my-container-name>")  # Name of Azure blob container
account_key = os.getenv("BLOB_ACCOUNT_KEY_62", "<my-account-key>")  # Storage account key

try:
    blob_datastore = Datastore.get(ws, blob_datastore_name)
    print("Found Blob Datastore with name: %s" % blob_datastore_name)
except UserErrorException:
    blob_datastore = Datastore.register_azure_blob_container(
        workspace=ws,
        datastore_name=blob_datastore_name,
        account_name=account_name,  # Storage account name
        container_name=container_name,  # Name of Azure blob container
        account_key=account_key)  # Storage account key
    print("Registered blob datastore with name: %s" % blob_datastore_name)

blob_data_ref = DataReference(
    datastore=blob_datastore,
    data_reference_name="blob_test_data",
    path_on_datastore="testdata")
Full sample is available from https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-data-transfer.ipynb
Methods
Method | Description
---|---
get | Get a datastore by name. This is the same as calling the constructor.
get_default | Get the default datastore for the workspace.
register_azure_blob_container | Register an Azure Blob container to the datastore. Credential based (GA) and identity based (Preview) data access are supported; you can choose to use a SAS token or a storage account key. If no credential is saved with the datastore, the user's AAD token is used in a notebook or local Python program that directly calls one of these functions: FileDataset.mount, FileDataset.download, FileDataset.to_path, TabularDataset.to_pandas_dataframe, TabularDataset.to_dask_dataframe, TabularDataset.to_spark_dataframe, TabularDataset.to_parquet_files, TabularDataset.to_csv_files. The identity of the compute target is used in jobs submitted by Experiment.submit for data access authentication. Learn more here.
register_azure_data_lake | Initialize a new Azure Data Lake Datastore. Credential based (GA) and identity based (Preview) data access are supported; you can register a datastore with a service principal for credential based data access. If no credential is saved with the datastore, the user's AAD token is used in a notebook or local Python program that directly calls one of these functions: FileDataset.mount, FileDataset.download, FileDataset.to_path, TabularDataset.to_pandas_dataframe, TabularDataset.to_dask_dataframe, TabularDataset.to_spark_dataframe, TabularDataset.to_parquet_files, TabularDataset.to_csv_files. The identity of the compute target is used in jobs submitted by Experiment.submit for data access authentication. Learn more here. See below for an example of how to register an Azure Data Lake Gen1 store as a Datastore.
register_azure_data_lake_gen2 | Initialize a new Azure Data Lake Gen2 Datastore. Credential based (GA) and identity based (Preview) data access are supported; you can register a datastore with a service principal for credential based data access. If no credential is saved with the datastore, the user's AAD token is used in a notebook or local Python program that directly calls one of these functions: FileDataset.mount, FileDataset.download, FileDataset.to_path, TabularDataset.to_pandas_dataframe, TabularDataset.to_dask_dataframe, TabularDataset.to_spark_dataframe, TabularDataset.to_parquet_files, TabularDataset.to_csv_files. The identity of the compute target is used in jobs submitted by Experiment.submit for data access authentication. Learn more here.
register_azure_file_share | Register an Azure File Share to the datastore. You can choose to use a SAS token or a storage account key.
register_azure_my_sql | Initialize a new Azure MySQL Datastore. A MySQL datastore can only be used to create a DataReference as input and output to a DataTransferStep in Azure Machine Learning pipelines. More details can be found here. See below for an example of how to register an Azure MySQL database as a Datastore.
register_azure_postgre_sql | Initialize a new Azure PostgreSQL Datastore. See below for an example of how to register an Azure PostgreSQL database as a Datastore.
register_azure_sql_database | Initialize a new Azure SQL database Datastore. Credential based (GA) and identity based (Preview) data access are supported; you can choose to use a service principal or username + password. If no credential is saved with the datastore, the user's AAD token is used in a notebook or local Python program that directly calls one of these functions: FileDataset.mount, FileDataset.download, FileDataset.to_path, TabularDataset.to_pandas_dataframe, TabularDataset.to_dask_dataframe, TabularDataset.to_spark_dataframe, TabularDataset.to_parquet_files, TabularDataset.to_csv_files. The identity of the compute target is used in jobs submitted by Experiment.submit for data access authentication. Learn more here. See below for an example of how to register an Azure SQL database as a Datastore.
register_dbfs | Initialize a new Databricks File System (DBFS) datastore. The DBFS datastore can only be used to create a DataReference as input and PipelineData as output to a DatabricksStep in Azure Machine Learning pipelines. More details can be found here.
register_hdfs | Note: this is an experimental method and may change at any time. See https://aka.ms/azuremlexperimental for more information. Initialize a new HDFS datastore.
set_as_default | Set the default datastore.
unregister | Unregisters the datastore. The underlying storage service will not be deleted.
get
Get a datastore by name. This is the same as calling the constructor.
static get(workspace, datastore_name)
Parameters
Name | Description
---|---
workspace (Required) | The workspace.
datastore_name (Required) | str, optional. The name of the datastore; defaults to None, which gets the default datastore.
Returns
Type | Description
---|---
Datastore | The corresponding datastore for that name.
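For example, a minimal usage sketch, assuming `ws` is an existing Workspace and 'my_blob_datastore' is a hypothetical datastore name:

from azureml.core import Datastore

datastore = Datastore.get(ws, datastore_name='my_blob_datastore')
print(datastore.name, datastore.datastore_type)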
get_default
Get the default datastore for the workspace.
static get_default(workspace)
Parameters
Name | Description
---|---
workspace (Required) | The workspace.
Returns
Type | Description
---|---
Datastore | The default datastore for the workspace.
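A minimal sketch, assuming `ws` is an existing Workspace:

from azureml.core import Datastore

default_datastore = Datastore.get_default(ws)
print(default_datastore.name)  # the workspace's current default datastore; can be changed with set_as_default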
register_azure_blob_container
Register an Azure Blob Container to the datastore.
Credential based (GA) and identity based (Preview) data access are supported; you can choose to use a SAS token or a storage account key. If no credential is saved with the datastore, the user's AAD token is used in a notebook or local Python program that directly calls one of these functions: FileDataset.mount, FileDataset.download, FileDataset.to_path, TabularDataset.to_pandas_dataframe, TabularDataset.to_dask_dataframe, TabularDataset.to_spark_dataframe, TabularDataset.to_parquet_files, TabularDataset.to_csv_files. The identity of the compute target is used in jobs submitted by Experiment.submit for data access authentication. Learn more here.
static register_azure_blob_container(workspace, datastore_name, container_name, account_name, sas_token=None, account_key=None, protocol=None, endpoint=None, overwrite=False, create_if_not_exists=False, skip_validation=False, blob_cache_timeout=None, grant_workspace_access=False, subscription_id=None, resource_group=None)
Parameters
Name | Description
---|---
workspace (Required) | The workspace.
datastore_name (Required) | The name of the datastore, case insensitive; can only contain alphanumeric characters and underscores.
container_name (Required) | The name of the Azure blob container.
account_name (Required) | The storage account name.
sas_token | str, optional. An account SAS token, defaults to None. For data read, we require a minimum of List & Read permissions for Containers & Objects, and for data write we additionally require Write & Add permissions. Default value: None
account_key | str, optional. Access keys of your storage account, defaults to None. Default value: None
protocol | str, optional. Protocol to use to connect to the blob container. If None, defaults to https. Default value: None
endpoint | str, optional. The endpoint of the storage account. If None, defaults to core.windows.net. Default value: None
overwrite | bool, optional. Whether to overwrite an existing datastore. If the datastore does not exist, it will create one. Defaults to False. Default value: False
create_if_not_exists | bool, optional. Whether to create the blob container if it does not exist. Defaults to False. Default value: False
skip_validation | bool, optional. Whether to skip validation of storage keys. Defaults to False. Default value: False
blob_cache_timeout | int, optional. When this blob is mounted, set the cache timeout to this many seconds. If None, defaults to no timeout (i.e., blobs will be cached for the duration of the job when read). Default value: None
grant_workspace_access | bool, optional. Defaults to False. Set it to True to access data behind a virtual network from Machine Learning Studio. This makes data access from Machine Learning Studio use the workspace managed identity for authentication, and adds the workspace managed identity as Reader of the storage. You have to be Owner or User Access Administrator of the storage to opt in. Ask your administrator to configure it for you if you do not have the required permission. Learn more: https://docs.microsoft.com/azure/machine-learning/how-to-enable-studio-virtual-network. Default value: False
subscription_id | str, optional. The subscription ID of the storage account, defaults to None. Default value: None
resource_group | str, optional. The resource group of the storage account, defaults to None. Default value: None
Returns
Type | Description
---|---
AzureBlobDatastore | The blob datastore.
Remarks
If you are attaching storage from a region other than the workspace region, it can result in higher latency and additional network usage costs.
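As a sketch of the identity-based (Preview) path mentioned above, omitting both sas_token and account_key registers the container without a saved credential, so the user's AAD token (or the compute identity for submitted jobs) is used at data access time. The names below are placeholders and `ws` is an existing Workspace:

credentialless_blob_datastore = Datastore.register_azure_blob_container(
    workspace=ws,
    datastore_name='credentialless_blob_datastore',
    container_name='<my-container-name>',
    account_name='<my-account-name>')  # no sas_token or account_key supplied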
register_azure_data_lake
Initialize a new Azure Data Lake Datastore.
Credential based (GA) and identity based (Preview) data access are supported; you can register a datastore with a service principal for credential based data access. If no credential is saved with the datastore, the user's AAD token is used in a notebook or local Python program that directly calls one of these functions: FileDataset.mount, FileDataset.download, FileDataset.to_path, TabularDataset.to_pandas_dataframe, TabularDataset.to_dask_dataframe, TabularDataset.to_spark_dataframe, TabularDataset.to_parquet_files, TabularDataset.to_csv_files. The identity of the compute target is used in jobs submitted by Experiment.submit for data access authentication. Learn more here.
Please see below for an example of how to register an Azure Data Lake Gen1 as a Datastore.
import os

from azureml.core import Datastore

adlsgen1_datastore_name='adlsgen1datastore'

store_name=os.getenv("ADL_STORENAME", "<my_datastore_name>") # the ADLS name
subscription_id=os.getenv("ADL_SUBSCRIPTION", "<my_subscription_id>") # subscription id of the ADLS
resource_group=os.getenv("ADL_RESOURCE_GROUP", "<my_resource_group>") # resource group of ADLS
tenant_id=os.getenv("ADL_TENANT", "<my_tenant_id>") # tenant id of service principal
client_id=os.getenv("ADL_CLIENTID", "<my_client_id>") # client id of service principal
client_secret=os.getenv("ADL_CLIENT_SECRET", "<my_client_secret>") # the secret of service principal

adls_datastore = Datastore.register_azure_data_lake(
    workspace=ws,
    datastore_name=adlsgen1_datastore_name,
    subscription_id=subscription_id, # subscription id of ADLS account
    resource_group=resource_group, # resource group of ADLS account
    store_name=store_name, # ADLS account name
    tenant_id=tenant_id, # tenant id of service principal
    client_id=client_id, # client id of service principal
    client_secret=client_secret) # the secret of service principal
static register_azure_data_lake(workspace, datastore_name, store_name, tenant_id=None, client_id=None, client_secret=None, resource_url=None, authority_url=None, subscription_id=None, resource_group=None, overwrite=False, grant_workspace_access=False)
Parameters
Name | Description
---|---
workspace (Required) | The workspace this datastore belongs to.
datastore_name (Required) | The datastore name.
store_name (Required) | The ADLS store name.
tenant_id | str, optional. The Directory ID/Tenant ID of the service principal used to access data. Default value: None
client_id | str, optional. The Client ID/Application ID of the service principal used to access data. Default value: None
client_secret | str, optional. The Client Secret of the service principal used to access data. Default value: None
resource_url | str, optional. The resource URL, which determines what operations will be performed on the Data Lake store. Default value: None
authority_url | str, optional. The authority URL used to authenticate the user. Default value: None
subscription_id | str, optional. The ID of the subscription the ADLS store belongs to. Default value: None
resource_group | str, optional. The resource group the ADLS store belongs to. Default value: None
overwrite | bool, optional. Whether to overwrite an existing datastore. If the datastore does not exist, it will create one. Defaults to False. Default value: False
grant_workspace_access | bool, optional. Defaults to False. Set it to True to access data behind a virtual network from Machine Learning Studio. This makes data access from Machine Learning Studio use the workspace managed identity for authentication, and adds the workspace managed identity as Reader of the storage. You have to be Owner or User Access Administrator of the storage to opt in. Ask your administrator to configure it for you if you do not have the required permission. Learn more: https://docs.microsoft.com/azure/machine-learning/how-to-enable-studio-virtual-network. Default value: False
Returns
Type | Description
---|---
AzureDataLakeDatastore | Returns the Azure Data Lake Datastore.
Remarks
If you are attaching storage from a region other than the workspace region, it can result in higher latency and additional network usage costs.
Note
The Azure Data Lake Datastore supports data transfer and running U-SQL jobs using Azure Machine Learning pipelines.
You can also use it as a data source for an Azure Machine Learning dataset, which can be downloaded or mounted on any supported compute.
register_azure_data_lake_gen2
Initialize a new Azure Data Lake Gen2 Datastore.
Credential based (GA) and identity based (Preview) data access are supported; you can register a datastore with a service principal for credential based data access. If no credential is saved with the datastore, the user's AAD token is used in a notebook or local Python program that directly calls one of these functions: FileDataset.mount, FileDataset.download, FileDataset.to_path, TabularDataset.to_pandas_dataframe, TabularDataset.to_dask_dataframe, TabularDataset.to_spark_dataframe, TabularDataset.to_parquet_files, TabularDataset.to_csv_files. The identity of the compute target is used in jobs submitted by Experiment.submit for data access authentication. Learn more here.
static register_azure_data_lake_gen2(workspace, datastore_name, filesystem, account_name, tenant_id=None, client_id=None, client_secret=None, resource_url=None, authority_url=None, protocol=None, endpoint=None, overwrite=False, subscription_id=None, resource_group=None, grant_workspace_access=False)
Parameters
Name | Description
---|---
workspace (Required) | The workspace this datastore belongs to.
datastore_name (Required) | The datastore name.
filesystem (Required) | The name of the Data Lake Gen2 filesystem.
account_name (Required) | The storage account name.
tenant_id | str, optional. The Directory ID/Tenant ID of the service principal. Default value: None
client_id | str, optional. The Client ID/Application ID of the service principal. Default value: None
client_secret | str, optional. The secret of the service principal. Default value: None
resource_url | str, optional. The resource URL, which determines what operations will be performed on the data lake store. Default value: None
authority_url | str, optional. The authority URL used to authenticate the user. Default value: None
protocol | str, optional. Protocol to use to connect to the blob container. If None, defaults to https. Default value: None
endpoint | str, optional. The endpoint of the storage account. If None, defaults to core.windows.net. Default value: None
overwrite | bool, optional. Whether to overwrite an existing datastore. If the datastore does not exist, it will create one. Defaults to False. Default value: False
subscription_id | str, optional. The ID of the subscription the ADLS store belongs to. Default value: None
resource_group | str, optional. The resource group the ADLS store belongs to. Default value: None
grant_workspace_access | bool, optional. Defaults to False. Set it to True to access data behind a virtual network from Machine Learning Studio. This makes data access from Machine Learning Studio use the workspace managed identity for authentication, and adds the workspace managed identity as Reader of the storage. You have to be Owner or User Access Administrator of the storage to opt in. Ask your administrator to configure it for you if you do not have the required permission. Learn more: https://docs.microsoft.com/azure/machine-learning/how-to-enable-studio-virtual-network. Default value: False
Returns
Type | Description
---|---
AzureDataLakeGen2Datastore | Returns the Azure Data Lake Gen2 Datastore.
Remarks
If you are attaching storage from a region other than the workspace region, it can result in higher latency and additional network usage costs.
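For example, a minimal sketch of registering an ADLS Gen2 filesystem with a service principal; all names and IDs below are placeholders and `ws` is an existing Workspace:

adlsgen2_datastore = Datastore.register_azure_data_lake_gen2(
    workspace=ws,
    datastore_name='adlsgen2_datastore',
    filesystem='<my-filesystem-name>',         # the Gen2 filesystem (container) name
    account_name='<my-storage-account-name>',
    tenant_id='<my-tenant-id>',                # Directory/Tenant ID of the service principal
    client_id='<my-client-id>',                # Application ID of the service principal
    client_secret='<my-client-secret>')        # secret of the service principal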
register_azure_file_share
Register an Azure File Share to the datastore.
You can choose to use a SAS token or a storage account key.
static register_azure_file_share(workspace, datastore_name, file_share_name, account_name, sas_token=None, account_key=None, protocol=None, endpoint=None, overwrite=False, create_if_not_exists=False, skip_validation=False)
Parameters
Name | Description
---|---
workspace (Required) | The workspace this datastore belongs to.
datastore_name (Required) | The name of the datastore, case insensitive; can only contain alphanumeric characters and underscores.
file_share_name (Required) | The name of the Azure file share.
account_name (Required) | The storage account name.
sas_token | str, optional. An account SAS token, defaults to None. For data read, we require a minimum of List & Read permissions for Containers & Objects, and for data write we additionally require Write & Add permissions. Default value: None
account_key | str, optional. Access keys of your storage account, defaults to None. Default value: None
protocol | str, optional. The protocol to use to connect to the file share. If None, defaults to https. Default value: None
endpoint | str, optional. The endpoint of the file share. If None, defaults to core.windows.net. Default value: None
overwrite | bool, optional. Whether to overwrite an existing datastore. If the datastore does not exist, it will create one. Defaults to False. Default value: False
create_if_not_exists | bool, optional. Whether to create the file share if it does not exist. Defaults to False. Default value: False
skip_validation | bool, optional. Whether to skip validation of storage keys. Defaults to False. Default value: False
Returns
Type | Description
---|---
AzureFileDatastore | The file datastore.
Remarks
If you are attaching storage from a region other than the workspace region, it can result in higher latency and additional network usage costs.
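For example, a minimal sketch using a storage account key; the names below are placeholders and `ws` is an existing Workspace:

file_datastore = Datastore.register_azure_file_share(
    workspace=ws,
    datastore_name='my_file_datastore',
    file_share_name='<my-file-share-name>',
    account_name='<my-account-name>',
    account_key='<my-account-key>')  # alternatively, pass sas_token instead of account_key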
register_azure_my_sql
Initialize a new Azure MySQL Datastore.
A MySQL datastore can only be used to create a DataReference as input and output to a DataTransferStep in Azure Machine Learning pipelines. More details can be found here.
Please see below for an example of how to register an Azure MySQL database as a Datastore.
static register_azure_my_sql(workspace, datastore_name, server_name, database_name, user_id, user_password, port_number=None, endpoint=None, overwrite=False, **kwargs)
Parameters
Name | Description
---|---
workspace (Required) | The workspace this datastore belongs to.
datastore_name (Required) | The datastore name.
server_name (Required) | The MySQL server name.
database_name (Required) | The MySQL database name.
user_id (Required) | The User ID of the MySQL server.
user_password (Required) | The user password of the MySQL server.
port_number | The port number of the MySQL server. Default value: None
endpoint | str, optional. The endpoint of the MySQL server. If None, defaults to mysql.database.azure.com. Default value: None
overwrite | bool, optional. Whether to overwrite an existing datastore. If the datastore does not exist, it will create one. Defaults to False. Default value: False
Returns
Type | Description
---|---
AzureMySqlDatastore | Returns the MySQL database Datastore.
Remarks
If you are attaching storage from a region other than the workspace region, it can result in higher latency and additional network usage costs.
mysql_datastore_name="mysqldatastore"
server_name=os.getenv("MYSQL_SERVERNAME", "<my_server_name>") # FQDN name of the MySQL server
database_name=os.getenv("MYSQL_DATBASENAME", "<my_database_name>") # Name of the MySQL database
user_id=os.getenv("MYSQL_USERID", "<my_user_id>") # The User ID of the MySQL server
user_password=os.getenv("MYSQL_USERPW", "<my_user_password>") # The user password of the MySQL server.
mysql_datastore = Datastore.register_azure_my_sql(
workspace=ws,
datastore_name=mysql_datastore_name,
server_name=server_name,
database_name=database_name,
user_id=user_id,
user_password=user_password)
register_azure_postgre_sql
Initialize a new Azure PostgreSQL Datastore.
Please see below for an example of how to register an Azure PostgreSQL database as a Datastore.
static register_azure_postgre_sql(workspace, datastore_name, server_name, database_name, user_id, user_password, port_number=None, endpoint=None, overwrite=False, enforce_ssl=True, **kwargs)
Parameters
Name | Description
---|---
workspace (Required) | The workspace this datastore belongs to.
datastore_name (Required) | The datastore name.
server_name (Required) | The PostgreSQL server name.
database_name (Required) | The PostgreSQL database name.
user_id (Required) | The User ID of the PostgreSQL server.
user_password (Required) | The user password of the PostgreSQL server.
port_number | The port number of the PostgreSQL server. Default value: None
endpoint | str, optional. The endpoint of the PostgreSQL server. If None, defaults to postgres.database.azure.com. Default value: None
overwrite | bool, optional. Whether to overwrite an existing datastore. If the datastore does not exist, it will create one. Defaults to False. Default value: False
enforce_ssl | Indicates the SSL requirement of the PostgreSQL server. Defaults to True. Default value: True
Returns
Type | Description
---|---
AzurePostgreSqlDatastore | Returns the PostgreSQL database Datastore.
Remarks
If you are attaching storage from a region other than the workspace region, it can result in higher latency and additional network usage costs.
psql_datastore_name="postgresqldatastore"
server_name=os.getenv("PSQL_SERVERNAME", "<my_server_name>") # FQDN name of the PostgreSQL server
database_name=os.getenv("PSQL_DATBASENAME", "<my_database_name>") # Name of the PostgreSQL database
user_id=os.getenv("PSQL_USERID", "<my_user_id>") # The database user id
user_password=os.getenv("PSQL_USERPW", "<my_user_password>") # The database user password
psql_datastore = Datastore.register_azure_postgre_sql(
workspace=ws,
datastore_name=psql_datastore_name,
server_name=server_name,
database_name=database_name,
user_id=user_id,
user_password=user_password)
register_azure_sql_database
Initialize a new Azure SQL database Datastore.
Credential based (GA) and identity based (Preview) data access are supported; you can choose to use a service principal or username + password. If no credential is saved with the datastore, the user's AAD token is used in a notebook or local Python program that directly calls one of these functions: FileDataset.mount, FileDataset.download, FileDataset.to_path, TabularDataset.to_pandas_dataframe, TabularDataset.to_dask_dataframe, TabularDataset.to_spark_dataframe, TabularDataset.to_parquet_files, TabularDataset.to_csv_files. The identity of the compute target is used in jobs submitted by Experiment.submit for data access authentication. Learn more here.
Please see below for an example of how to register an Azure SQL database as a Datastore.
static register_azure_sql_database(workspace, datastore_name, server_name, database_name, tenant_id=None, client_id=None, client_secret=None, resource_url=None, authority_url=None, endpoint=None, overwrite=False, username=None, password=None, subscription_id=None, resource_group=None, grant_workspace_access=False, **kwargs)
Parameters
Name | Description
---|---
workspace (Required) | The workspace this datastore belongs to.
datastore_name (Required) | The datastore name.
server_name (Required) | The SQL server name. For a fully qualified domain name like "sample.database.windows.net", the server_name value should be "sample", and the endpoint value should be "database.windows.net".
database_name (Required) | The SQL database name.
tenant_id | The Directory ID/Tenant ID of the service principal. Default value: None
client_id | The Client ID/Application ID of the service principal. Default value: None
client_secret | The secret of the service principal. Default value: None
resource_url | str, optional. The resource URL, which determines what operations will be performed on the SQL database store. If None, defaults to https://database.windows.net/. Default value: None
authority_url | str, optional. The authority URL used to authenticate the user. Defaults to https://login.microsoftonline.com. Default value: None
endpoint | str, optional. The endpoint of the SQL server. If None, defaults to database.windows.net. Default value: None
overwrite | bool, optional. Whether to overwrite an existing datastore. If the datastore does not exist, it will create one. Defaults to False. Default value: False
username | The username of the database user to access the database. Default value: None
password | The password of the database user to access the database. Default value: None
skip_validation (Required) | bool, optional. Whether to skip validation of connecting to the SQL database. Defaults to False.
subscription_id | str, optional. The ID of the subscription the SQL server belongs to. Default value: None
resource_group | str, optional. The resource group the SQL server belongs to. Default value: None
grant_workspace_access | bool, optional. Defaults to False. Set it to True to access data behind a virtual network from Machine Learning Studio. This makes data access from Machine Learning Studio use the workspace managed identity for authentication, and adds the workspace managed identity as Reader of the storage. You have to be Owner or User Access Administrator of the storage to opt in. Ask your administrator to configure it for you if you do not have the required permission. Learn more: https://docs.microsoft.com/azure/machine-learning/how-to-enable-studio-virtual-network. Default value: False
Returns
Type | Description
---|---
AzureSqlDatabaseDatastore | Returns the SQL database Datastore.
Remarks
If you are attaching storage from a region other than the workspace region, it can result in higher latency and additional network usage costs.
sql_datastore_name="azuresqldatastore"
server_name=os.getenv("SQL_SERVERNAME", "<my_server_name>") # Name of the Azure SQL server
database_name=os.getenv("SQL_DATABASENAME", "<my_database_name>") # Name of the Azure SQL database
username=os.getenv("SQL_USER_NAME", "<my_sql_user_name>") # The username of the database user.
password=os.getenv("SQL_USER_PASSWORD", "<my_sql_user_password>") # The password of the database user.
sql_datastore = Datastore.register_azure_sql_database(
workspace=ws,
datastore_name=sql_datastore_name,
server_name=server_name, # name should not contain fully qualified domain endpoint
database_name=database_name,
username=username,
password=password,
endpoint='database.windows.net')
register_dbfs
Initialize a new Databricks File System (DBFS) datastore.
The DBFS datastore can only be used to create a DataReference as input and PipelineData as output to a DatabricksStep in Azure Machine Learning pipelines. More details can be found here.
static register_dbfs(workspace, datastore_name)
Parameters
Name | Description
---|---
workspace (Required) | The workspace this datastore belongs to.
datastore_name (Required) | The datastore name.
Returns
Type | Description
---|---
DBFSDatastore | Returns the DBFS Datastore.
Remarks
If you are attaching storage from a region other than the workspace region, it can result in higher latency and additional network usage costs.
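For example, a minimal sketch; the datastore name below is a placeholder and `ws` is an existing Workspace:

dbfs_datastore = Datastore.register_dbfs(workspace=ws, datastore_name='my_dbfs_datastore')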
register_hdfs
Note
This is an experimental method, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Initialize a new HDFS datastore.
static register_hdfs(workspace, datastore_name, protocol, namenode_address, hdfs_server_certificate, kerberos_realm, kerberos_kdc_address, kerberos_principal, kerberos_keytab=None, kerberos_password=None, overwrite=False)
Parameters
Name | Description
---|---
workspace (Required) | The workspace this datastore belongs to.
datastore_name (Required) | The datastore name.
protocol (Required) | str or enum. The protocol to use when communicating with the HDFS cluster. Possible values include: 'http', 'https'.
namenode_address (Required) | The IP address or DNS hostname of the HDFS namenode. Optionally includes a port.
hdfs_server_certificate (Required) | str, optional. The path to the TLS signing certificate of the HDFS namenode, if using TLS with a self-signed certificate.
kerberos_realm (Required) | The Kerberos realm.
kerberos_kdc_address (Required) | The IP address or DNS hostname of the Kerberos KDC.
kerberos_principal (Required) | The Kerberos principal to use for authentication and authorization.
kerberos_keytab (Required) | str, optional. The path to the keytab file containing the key(s) corresponding to the Kerberos principal. Provide either this or a password.
kerberos_password (Required) | str, optional. The password corresponding to the Kerberos principal. Provide either this or the path to a keytab file.
overwrite (Required) | bool, optional. Whether to overwrite an existing datastore. If the datastore does not exist, it will create one. Defaults to False.
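A minimal sketch of this experimental method; all addresses and Kerberos details below are placeholders and `ws` is an existing Workspace:

hdfs_datastore = Datastore.register_hdfs(
    workspace=ws,
    datastore_name='my_hdfs_datastore',
    protocol='https',
    namenode_address='<namenode-host>:<port>',
    hdfs_server_certificate='<path-to-tls-cert>',  # only needed when using TLS with a self-signed cert
    kerberos_realm='<MY.REALM>',
    kerberos_kdc_address='<kdc-host>',
    kerberos_principal='<principal@MY.REALM>',
    kerberos_password='<password>')                # or provide kerberos_keytab instead of a password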
set_as_default
Set the default datastore.
set_as_default()
Parameters
Name | Description
---|---
datastore_name (Required) | The name of the datastore.
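For example, a minimal sketch that promotes an existing datastore to be the workspace default; 'my_blob_datastore' is a placeholder name and `ws` is an existing Workspace:

datastore = Datastore.get(ws, 'my_blob_datastore')
datastore.set_as_default()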
unregister
Unregisters the datastore. The underlying storage service will not be deleted.
unregister()
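For example, a minimal sketch; unregistering removes only the registration from the workspace, and the underlying storage and its data are left untouched. 'my_blob_datastore' is a placeholder name and `ws` is an existing Workspace:

datastore = Datastore.get(ws, 'my_blob_datastore')
datastore.unregister()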