Datastore Class
Represents a storage abstraction over an Azure Machine Learning storage account.
Datastores are attached to workspaces and are used to store connection information to Azure storage services, so you can refer to them by name rather than remembering the connection information and secrets used to connect to the storage services.
Examples of supported Azure storage services that can be registered as datastores are:
Azure Blob Container
Azure File Share
Azure Data Lake
Azure Data Lake Gen2
Azure SQL Database
Azure Database for PostgreSQL
Databricks File System
Azure Database for MySQL
Use this class to perform management operations, including registering, listing, getting, and removing datastores. Datastores for each service are created with the register* methods of this class. When using a datastore to access data, you must have permission to access that data, which depends on the credentials registered with the datastore.
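For example, a minimal sketch of typical management calls; the datastore name 'my_datastore' is a placeholder and the workspace is loaded from a local config.json:

import os
from azureml.core import Datastore, Workspace

ws = Workspace.from_config()  # assumes a config.json for the workspace is available locally

# List all datastores registered in the workspace
for name, datastore in ws.datastores.items():
    print(name, datastore.datastore_type)

# Get a specific datastore by name, then remove its registration from the workspace
datastore = Datastore.get(ws, 'my_datastore')  # 'my_datastore' is a hypothetical name
datastore.unregister()                         # removes the registration only, not the underlying storage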
For more information on datastores and how they can be used in machine learning, see the Azure Machine Learning documentation on accessing data.
Inheritance
builtins.object → Datastore
Constructor
Get a datastore by name. This call will make a request to the datastore service.
Datastore(workspace, name=None)
Parameters
Name | Description
---|---
workspace (Required) | The workspace.
name | str, optional. The name of the datastore; defaults to None, which gets the default datastore. Default value: None
Remarks
To interact with data in your datastores for machine learning tasks, like training, create an Azure Machine Learning dataset. Datasets provide functions that load tabular data into a pandas or Spark DataFrame. Datasets also provide the ability to download or mount files of any format from Azure Blob storage, Azure Files, Azure Data Lake Storage Gen1, Azure Data Lake Storage Gen2, Azure SQL Database, and Azure Database for PostgreSQL. Learn more about how to train with datasets.
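For instance, a brief sketch of the dataset workflow described above, assuming `datastore` is an already registered blob datastore and the paths 'weather/2018/11.csv' and 'images/' are hypothetical locations within it:

from azureml.core import Dataset

# Create a TabularDataset that points at a CSV file on the datastore and load it into pandas
tabular_ds = Dataset.Tabular.from_delimited_files(path=[(datastore, 'weather/2018/11.csv')])
df = tabular_ds.to_pandas_dataframe()

# Create a FileDataset for files of any format and download them locally
file_ds = Dataset.File.from_files(path=[(datastore, 'images/')])
file_ds.download(target_path='./data', overwrite=True)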
The following example shows how to create a Datastore connected to an Azure Blob container.
import os

from azureml.core import Datastore
from azureml.data.data_reference import DataReference
from azureml.exceptions import UserErrorException

blob_datastore_name = 'MyBlobDatastore'
account_name = os.getenv("BLOB_ACCOUNTNAME_62", "<my-account-name>")  # Storage account name
container_name = os.getenv("BLOB_CONTAINER_62", "<my-container-name>")  # Name of Azure blob container
account_key = os.getenv("BLOB_ACCOUNT_KEY_62", "<my-account-key>")  # Storage account key

try:
    blob_datastore = Datastore.get(ws, blob_datastore_name)
    print("Found Blob Datastore with name: %s" % blob_datastore_name)
except UserErrorException:
    blob_datastore = Datastore.register_azure_blob_container(
        workspace=ws,
        datastore_name=blob_datastore_name,
        account_name=account_name,  # Storage account name
        container_name=container_name,  # Name of Azure blob container
        account_key=account_key)  # Storage account key
    print("Registered blob datastore with name: %s" % blob_datastore_name)

blob_data_ref = DataReference(
    datastore=blob_datastore,
    data_reference_name="blob_test_data",
    path_on_datastore="testdata")
Full sample is available from https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-data-transfer.ipynb
Methods
Method | Description
---|---
get | Get a datastore by name. This is the same as calling the constructor.
get_default | Get the default datastore for the workspace.
register_azure_blob_container | Register an Azure Blob container to the datastore. Credential based (GA) and identity based (Preview) data access are supported; you can choose to use a SAS token or a storage account key. If no credential is saved with the datastore, the user's AAD token is used in a notebook or local Python program that directly calls one of these functions: FileDataset.mount, FileDataset.download, FileDataset.to_path, TabularDataset.to_pandas_dataframe, TabularDataset.to_dask_dataframe, TabularDataset.to_spark_dataframe, TabularDataset.to_parquet_files, TabularDataset.to_csv_files. The identity of the compute target is used in jobs submitted by Experiment.submit for data access authentication. Learn more here.
register_azure_data_lake | Initialize a new Azure Data Lake Datastore. Credential based (GA) and identity based (Preview) data access are supported; you can register a datastore with a service principal for credential based data access. If no credential is saved with the datastore, the user's AAD token is used in a notebook or local Python program that directly calls one of these functions: FileDataset.mount, FileDataset.download, FileDataset.to_path, TabularDataset.to_pandas_dataframe, TabularDataset.to_dask_dataframe, TabularDataset.to_spark_dataframe, TabularDataset.to_parquet_files, TabularDataset.to_csv_files. The identity of the compute target is used in jobs submitted by Experiment.submit for data access authentication. Learn more here. See below for an example of how to register an Azure Data Lake Gen1 store as a Datastore.
register_azure_data_lake_gen2 | Initialize a new Azure Data Lake Gen2 Datastore. Credential based (GA) and identity based (Preview) data access are supported; you can register a datastore with a service principal for credential based data access. If no credential is saved with the datastore, the user's AAD token is used in a notebook or local Python program that directly calls one of these functions: FileDataset.mount, FileDataset.download, FileDataset.to_path, TabularDataset.to_pandas_dataframe, TabularDataset.to_dask_dataframe, TabularDataset.to_spark_dataframe, TabularDataset.to_parquet_files, TabularDataset.to_csv_files. The identity of the compute target is used in jobs submitted by Experiment.submit for data access authentication. Learn more here.
register_azure_file_share | Register an Azure File Share to the datastore. You can choose to use a SAS token or a storage account key.
register_azure_my_sql | Initialize a new Azure MySQL Datastore. A MySQL datastore can only be used to create a DataReference as input and output to a DataTransferStep in Azure Machine Learning pipelines. More details can be found here. See below for an example of how to register an Azure MySQL database as a Datastore.
register_azure_postgre_sql | Initialize a new Azure PostgreSQL Datastore. See below for an example of how to register an Azure PostgreSQL database as a Datastore.
register_azure_sql_database | Initialize a new Azure SQL database Datastore. Credential based (GA) and identity based (Preview) data access are supported; you can choose to use a service principal or username + password. If no credential is saved with the datastore, the user's AAD token is used in a notebook or local Python program that directly calls one of these functions: FileDataset.mount, FileDataset.download, FileDataset.to_path, TabularDataset.to_pandas_dataframe, TabularDataset.to_dask_dataframe, TabularDataset.to_spark_dataframe, TabularDataset.to_parquet_files, TabularDataset.to_csv_files. The identity of the compute target is used in jobs submitted by Experiment.submit for data access authentication. Learn more here. See below for an example of how to register an Azure SQL database as a Datastore.
register_dbfs | Initialize a new Databricks File System (DBFS) datastore. The DBFS datastore can only be used to create a DataReference as input and PipelineData as output to a DatabricksStep in Azure Machine Learning pipelines. More details can be found here.
register_hdfs | Note: this is an experimental method and may change at any time. See https://aka.ms/azuremlexperimental for more information. Initialize a new HDFS datastore.
set_as_default | Set the default datastore.
unregister | Unregisters the datastore. The underlying storage service will not be deleted.
get
Get a datastore by name. This is the same as calling the constructor.
static get(workspace, datastore_name)
Parameters
Name | Description
---|---
workspace (Required) | The workspace.
datastore_name (Required) | str, optional. The name of the datastore; defaults to None, which gets the default datastore.
Returns
Type | Description
---|---
Datastore | The corresponding datastore for that name.
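For example, a minimal usage sketch, assuming `ws` is an existing Workspace and 'my_blob_datastore' is a hypothetical datastore name:

from azureml.core import Datastore

datastore = Datastore.get(ws, datastore_name='my_blob_datastore')
print(datastore.name, datastore.datastore_type)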
get_default
Get the default datastore for the workspace.
static get_default(workspace)
Parameters
Name | Description
---|---
workspace (Required) | The workspace.
Returns
Type | Description
---|---
Datastore | The default datastore for the workspace.
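A minimal sketch, assuming `ws` is an existing Workspace:

from azureml.core import Datastore

default_datastore = Datastore.get_default(ws)
print(default_datastore.name)  # the workspace's current default datastore; can be changed with set_as_default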
register_azure_blob_container
Register an Azure Blob Container to the datastore.
Credential based (GA) and identity based (Preview) data access are supported; you can choose to use a SAS token or a storage account key. If no credential is saved with the datastore, the user's AAD token is used in a notebook or local Python program that directly calls one of these functions: FileDataset.mount, FileDataset.download, FileDataset.to_path, TabularDataset.to_pandas_dataframe, TabularDataset.to_dask_dataframe, TabularDataset.to_spark_dataframe, TabularDataset.to_parquet_files, TabularDataset.to_csv_files. The identity of the compute target is used in jobs submitted by Experiment.submit for data access authentication. Learn more here.
static register_azure_blob_container(workspace, datastore_name, container_name, account_name, sas_token=None, account_key=None, protocol=None, endpoint=None, overwrite=False, create_if_not_exists=False, skip_validation=False, blob_cache_timeout=None, grant_workspace_access=False, subscription_id=None, resource_group=None)
Parameters
Name | Description
---|---
workspace (Required) | The workspace.
datastore_name (Required) | The name of the datastore, case insensitive; can only contain alphanumeric characters and underscores.
container_name (Required) | The name of the Azure blob container.
account_name (Required) | The storage account name.
sas_token | str, optional. An account SAS token, defaults to None. For data read, we require a minimum of List & Read permissions for Containers & Objects, and for data write we additionally require Write & Add permissions. Default value: None
account_key | str, optional. Access keys of your storage account, defaults to None. Default value: None
protocol | str, optional. Protocol to use to connect to the blob container. If None, defaults to https. Default value: None
endpoint | str, optional. The endpoint of the storage account. If None, defaults to core.windows.net. Default value: None
overwrite | bool, optional. Whether to overwrite an existing datastore. If the datastore does not exist, it will create one. Defaults to False. Default value: False
create_if_not_exists | bool, optional. Whether to create the blob container if it does not exist. Defaults to False. Default value: False
skip_validation | bool, optional. Whether to skip validation of storage keys. Defaults to False. Default value: False
blob_cache_timeout | int, optional. When this blob is mounted, set the cache timeout to this many seconds. If None, defaults to no timeout (i.e., blobs will be cached for the duration of the job when read). Default value: None
grant_workspace_access | bool, optional. Defaults to False. Set it to True to access data behind a virtual network from Machine Learning Studio. This makes data access from Machine Learning Studio use the workspace managed identity for authentication, and adds the workspace managed identity as Reader of the storage. You have to be Owner or User Access Administrator of the storage to opt in. Ask your administrator to configure it for you if you do not have the required permission. Learn more: https://docs.microsoft.com/azure/machine-learning/how-to-enable-studio-virtual-network. Default value: False
subscription_id | str, optional. The subscription ID of the storage account, defaults to None. Default value: None
resource_group | str, optional. The resource group of the storage account, defaults to None. Default value: None
Returns
Type | Description
---|---
AzureBlobDatastore | The blob datastore.
Remarks
If you are attaching storage from a region other than the workspace region, it can result in higher latency and additional network usage costs.
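As a sketch of the identity-based (Preview) path mentioned above, omitting both sas_token and account_key registers the container without a saved credential, so the user's AAD token (or the compute identity for submitted jobs) is used at data access time. The names below are placeholders and `ws` is an existing Workspace:

credentialless_blob_datastore = Datastore.register_azure_blob_container(
    workspace=ws,
    datastore_name='credentialless_blob_datastore',
    container_name='<my-container-name>',
    account_name='<my-account-name>')  # no sas_token or account_key supplied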
register_azure_data_lake
Initialize a new Azure Data Lake Datastore.
Credential based (GA) and identity based (Preview) data access are supported; you can register a datastore with a service principal for credential based data access. If no credential is saved with the datastore, the user's AAD token is used in a notebook or local Python program that directly calls one of these functions: FileDataset.mount, FileDataset.download, FileDataset.to_path, TabularDataset.to_pandas_dataframe, TabularDataset.to_dask_dataframe, TabularDataset.to_spark_dataframe, TabularDataset.to_parquet_files, TabularDataset.to_csv_files. The identity of the compute target is used in jobs submitted by Experiment.submit for data access authentication. Learn more here.
Please see below for an example of how to register an Azure Data Lake Gen1 as a Datastore.
import os

from azureml.core import Datastore

adlsgen1_datastore_name='adlsgen1datastore'

store_name=os.getenv("ADL_STORENAME", "<my_datastore_name>") # the ADLS name
subscription_id=os.getenv("ADL_SUBSCRIPTION", "<my_subscription_id>") # subscription id of the ADLS
resource_group=os.getenv("ADL_RESOURCE_GROUP", "<my_resource_group>") # resource group of ADLS
tenant_id=os.getenv("ADL_TENANT", "<my_tenant_id>") # tenant id of service principal
client_id=os.getenv("ADL_CLIENTID", "<my_client_id>") # client id of service principal
client_secret=os.getenv("ADL_CLIENT_SECRET", "<my_client_secret>") # the secret of service principal

adls_datastore = Datastore.register_azure_data_lake(
    workspace=ws,
    datastore_name=adlsgen1_datastore_name,
    subscription_id=subscription_id, # subscription id of ADLS account
    resource_group=resource_group, # resource group of ADLS account
    store_name=store_name, # ADLS account name
    tenant_id=tenant_id, # tenant id of service principal
    client_id=client_id, # client id of service principal
    client_secret=client_secret) # the secret of service principal
static register_azure_data_lake(workspace, datastore_name, store_name, tenant_id=None, client_id=None, client_secret=None, resource_url=None, authority_url=None, subscription_id=None, resource_group=None, overwrite=False, grant_workspace_access=False)
Parameters
Name | Description
---|---
workspace (Required) | The workspace this datastore belongs to.
datastore_name (Required) | The datastore name.
store_name (Required) | The ADLS store name.
tenant_id | str, optional. The Directory ID/Tenant ID of the service principal used to access data. Default value: None
client_id | str, optional. The Client ID/Application ID of the service principal used to access data. Default value: None
client_secret | str, optional. The Client Secret of the service principal used to access data. Default value: None
resource_url | str, optional. The resource URL, which determines what operations will be performed on the Data Lake store. Default value: None
authority_url | str, optional. The authority URL used to authenticate the user. Default value: None
subscription_id | str, optional. The ID of the subscription the ADLS store belongs to. Default value: None
resource_group | str, optional. The resource group the ADLS store belongs to. Default value: None
overwrite | bool, optional. Whether to overwrite an existing datastore. If the datastore does not exist, it will create one. Defaults to False. Default value: False
grant_workspace_access | bool, optional. Defaults to False. Set it to True to access data behind a virtual network from Machine Learning Studio. This makes data access from Machine Learning Studio use the workspace managed identity for authentication, and adds the workspace managed identity as Reader of the storage. You have to be Owner or User Access Administrator of the storage to opt in. Ask your administrator to configure it for you if you do not have the required permission. Learn more: https://docs.microsoft.com/azure/machine-learning/how-to-enable-studio-virtual-network. Default value: False
Returns
Type | Description
---|---
AzureDataLakeDatastore | Returns the Azure Data Lake Datastore.
Remarks
If you are attaching storage from a region other than the workspace region, it can result in higher latency and additional network usage costs.
Note
The Azure Data Lake Datastore supports data transfer and running U-SQL jobs using Azure Machine Learning pipelines.
You can also use it as a data source for an Azure Machine Learning dataset, which can be downloaded or mounted on any supported compute.
register_azure_data_lake_gen2
Initialize a new Azure Data Lake Gen2 Datastore.
Credential based (GA) and identity based (Preview) data access are supported; you can register a datastore with a service principal for credential based data access. If no credential is saved with the datastore, the user's AAD token is used in a notebook or local Python program that directly calls one of these functions: FileDataset.mount, FileDataset.download, FileDataset.to_path, TabularDataset.to_pandas_dataframe, TabularDataset.to_dask_dataframe, TabularDataset.to_spark_dataframe, TabularDataset.to_parquet_files, TabularDataset.to_csv_files. The identity of the compute target is used in jobs submitted by Experiment.submit for data access authentication. Learn more here.
static register_azure_data_lake_gen2(workspace, datastore_name, filesystem, account_name, tenant_id=None, client_id=None, client_secret=None, resource_url=None, authority_url=None, protocol=None, endpoint=None, overwrite=False, subscription_id=None, resource_group=None, grant_workspace_access=False)
Parameters
Name | Description
---|---
workspace (Required) | The workspace this datastore belongs to.
datastore_name (Required) | The datastore name.
filesystem (Required) | The name of the Data Lake Gen2 filesystem.
account_name (Required) | The storage account name.
tenant_id | str, optional. The Directory ID/Tenant ID of the service principal. Default value: None
client_id | str, optional. The Client ID/Application ID of the service principal. Default value: None
client_secret | str, optional. The secret of the service principal. Default value: None
resource_url | str, optional. The resource URL, which determines what operations will be performed on the data lake store. Default value: None
authority_url | str, optional. The authority URL used to authenticate the user. Default value: None
protocol | str, optional. Protocol to use to connect to the blob container. If None, defaults to https. Default value: None
endpoint | str, optional. The endpoint of the storage account. If None, defaults to core.windows.net. Default value: None
overwrite | bool, optional. Whether to overwrite an existing datastore. If the datastore does not exist, it will create one. Defaults to False. Default value: False
subscription_id | str, optional. The ID of the subscription the ADLS store belongs to. Default value: None
resource_group | str, optional. The resource group the ADLS store belongs to. Default value: None
grant_workspace_access | bool, optional. Defaults to False. Set it to True to access data behind a virtual network from Machine Learning Studio. This makes data access from Machine Learning Studio use the workspace managed identity for authentication, and adds the workspace managed identity as Reader of the storage. You have to be Owner or User Access Administrator of the storage to opt in. Ask your administrator to configure it for you if you do not have the required permission. Learn more: https://docs.microsoft.com/azure/machine-learning/how-to-enable-studio-virtual-network. Default value: False
Returns
Type | Description
---|---
AzureDataLakeGen2Datastore | Returns the Azure Data Lake Gen2 Datastore.
Remarks
If you are attaching storage from a region other than the workspace region, it can result in higher latency and additional network usage costs.
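For example, a minimal sketch of registering an ADLS Gen2 filesystem with a service principal; all names and IDs below are placeholders and `ws` is an existing Workspace:

adlsgen2_datastore = Datastore.register_azure_data_lake_gen2(
    workspace=ws,
    datastore_name='adlsgen2_datastore',
    filesystem='<my-filesystem-name>',         # the Gen2 filesystem (container) name
    account_name='<my-storage-account-name>',
    tenant_id='<my-tenant-id>',                # Directory/Tenant ID of the service principal
    client_id='<my-client-id>',                # Application ID of the service principal
    client_secret='<my-client-secret>')        # secret of the service principal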
register_azure_file_share
Register an Azure File Share to the datastore.
You can choose to use a SAS token or a storage account key.
static register_azure_file_share(workspace, datastore_name, file_share_name, account_name, sas_token=None, account_key=None, protocol=None, endpoint=None, overwrite=False, create_if_not_exists=False, skip_validation=False)
Parameters
Name | Description
---|---
workspace (Required) | The workspace this datastore belongs to.
datastore_name (Required) | The name of the datastore, case insensitive; can only contain alphanumeric characters and underscores.
file_share_name (Required) | The name of the Azure file share.
account_name (Required) | The storage account name.
sas_token | str, optional. An account SAS token, defaults to None. For data read, we require a minimum of List & Read permissions for Containers & Objects, and for data write we additionally require Write & Add permissions. Default value: None
account_key | str, optional. Access keys of your storage account, defaults to None. Default value: None
protocol | str, optional. The protocol to use to connect to the file share. If None, defaults to https. Default value: None
endpoint | str, optional. The endpoint of the file share. If None, defaults to core.windows.net. Default value: None
overwrite | bool, optional. Whether to overwrite an existing datastore. If the datastore does not exist, it will create one. Defaults to False. Default value: False
create_if_not_exists | bool, optional. Whether to create the file share if it does not exist. Defaults to False. Default value: False
skip_validation | bool, optional. Whether to skip validation of storage keys. Defaults to False. Default value: False
Returns
Type | Description
---|---
AzureFileDatastore | The file datastore.
Remarks
If you are attaching storage from a region other than the workspace region, it can result in higher latency and additional network usage costs.
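For example, a minimal sketch using a storage account key; the names below are placeholders and `ws` is an existing Workspace:

file_datastore = Datastore.register_azure_file_share(
    workspace=ws,
    datastore_name='my_file_datastore',
    file_share_name='<my-file-share-name>',
    account_name='<my-account-name>',
    account_key='<my-account-key>')  # alternatively, pass sas_token instead of account_key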
register_azure_my_sql
Initialize a new Azure MySQL Datastore.
A MySQL datastore can only be used to create a DataReference as input and output to a DataTransferStep in Azure Machine Learning pipelines. More details can be found here.
Please see below for an example of how to register an Azure MySQL database as a Datastore.
static register_azure_my_sql(workspace, datastore_name, server_name, database_name, user_id, user_password, port_number=None, endpoint=None, overwrite=False, **kwargs)
Parameters
Name | Description
---|---
workspace (Required) | The workspace this datastore belongs to.
datastore_name (Required) | The datastore name.
server_name (Required) | The MySQL server name.
database_name (Required) | The MySQL database name.
user_id (Required) | The User ID of the MySQL server.
user_password (Required) | The user password of the MySQL server.
port_number | The port number of the MySQL server. Default value: None
endpoint | str, optional. The endpoint of the MySQL server. If None, defaults to mysql.database.azure.com. Default value: None
overwrite | bool, optional. Whether to overwrite an existing datastore. If the datastore does not exist, it will create one. Defaults to False. Default value: False
Returns
Type | Description
---|---
AzureMySqlDatastore | Returns the MySQL database Datastore.
Remarks
If you are attaching storage from a region other than the workspace region, it can result in higher latency and additional network usage costs.
mysql_datastore_name="mysqldatastore"
server_name=os.getenv("MYSQL_SERVERNAME", "<my_server_name>") # FQDN name of the MySQL server
database_name=os.getenv("MYSQL_DATBASENAME", "<my_database_name>") # Name of the MySQL database
user_id=os.getenv("MYSQL_USERID", "<my_user_id>") # The User ID of the MySQL server
user_password=os.getenv("MYSQL_USERPW", "<my_user_password>") # The user password of the MySQL server.
mysql_datastore = Datastore.register_azure_my_sql(
workspace=ws,
datastore_name=mysql_datastore_name,
server_name=server_name,
database_name=database_name,
user_id=user_id,
user_password=user_password)
register_azure_postgre_sql
Initialize a new Azure PostgreSQL Datastore.
Please see below for an example of how to register an Azure PostgreSQL database as a Datastore.
static register_azure_postgre_sql(workspace, datastore_name, server_name, database_name, user_id, user_password, port_number=None, endpoint=None, overwrite=False, enforce_ssl=True, **kwargs)
Parameters
Name | Description
---|---
workspace (Required) | The workspace this datastore belongs to.
datastore_name (Required) | The datastore name.
server_name (Required) | The PostgreSQL server name.
database_name (Required) | The PostgreSQL database name.
user_id (Required) | The User ID of the PostgreSQL server.
user_password (Required) | The user password of the PostgreSQL server.
port_number | The port number of the PostgreSQL server. Default value: None
endpoint | str, optional. The endpoint of the PostgreSQL server. If None, defaults to postgres.database.azure.com. Default value: None
overwrite | bool, optional. Whether to overwrite an existing datastore. If the datastore does not exist, it will create one. Defaults to False. Default value: False
enforce_ssl | Indicates the SSL requirement of the PostgreSQL server. Defaults to True. Default value: True
Returns
Type | Description
---|---
AzurePostgreSqlDatastore | Returns the PostgreSQL database Datastore.
Remarks
If you are attaching storage from a region other than the workspace region, it can result in higher latency and additional network usage costs.
psql_datastore_name="postgresqldatastore"
server_name=os.getenv("PSQL_SERVERNAME", "<my_server_name>") # FQDN name of the PostgreSQL server
database_name=os.getenv("PSQL_DATBASENAME", "<my_database_name>") # Name of the PostgreSQL database
user_id=os.getenv("PSQL_USERID", "<my_user_id>") # The database user id
user_password=os.getenv("PSQL_USERPW", "<my_user_password>") # The database user password
psql_datastore = Datastore.register_azure_postgre_sql(
workspace=ws,
datastore_name=psql_datastore_name,
server_name=server_name,
database_name=database_name,
user_id=user_id,
user_password=user_password)
register_azure_sql_database
Initialize a new Azure SQL database Datastore.
Credential based (GA) and identity based (Preview) data access are supported; you can choose to use a service principal or username + password. If no credential is saved with the datastore, the user's AAD token is used in a notebook or local Python program that directly calls one of these functions: FileDataset.mount, FileDataset.download, FileDataset.to_path, TabularDataset.to_pandas_dataframe, TabularDataset.to_dask_dataframe, TabularDataset.to_spark_dataframe, TabularDataset.to_parquet_files, TabularDataset.to_csv_files. The identity of the compute target is used in jobs submitted by Experiment.submit for data access authentication. Learn more here.
Please see below for an example of how to register an Azure SQL database as a Datastore.
static register_azure_sql_database(workspace, datastore_name, server_name, database_name, tenant_id=None, client_id=None, client_secret=None, resource_url=None, authority_url=None, endpoint=None, overwrite=False, username=None, password=None, subscription_id=None, resource_group=None, grant_workspace_access=False, **kwargs)
Parameters
Name | Description
---|---
workspace (Required) | The workspace this datastore belongs to.
datastore_name (Required) | The datastore name.
server_name (Required) | The SQL server name. For a fully qualified domain name like "sample.database.windows.net", the server_name value should be "sample", and the endpoint value should be "database.windows.net".
database_name (Required) | The SQL database name.
tenant_id | The Directory ID/Tenant ID of the service principal. Default value: None
client_id | The Client ID/Application ID of the service principal. Default value: None
client_secret | The secret of the service principal. Default value: None
resource_url | str, optional. The resource URL, which determines what operations will be performed on the SQL database store. If None, defaults to https://database.windows.net/. Default value: None
authority_url | str, optional. The authority URL used to authenticate the user. Defaults to https://login.microsoftonline.com. Default value: None
endpoint | str, optional. The endpoint of the SQL server. If None, defaults to database.windows.net. Default value: None
overwrite | bool, optional. Whether to overwrite an existing datastore. If the datastore does not exist, it will create one. Defaults to False. Default value: False
username | The username of the database user to access the database. Default value: None
password | The password of the database user to access the database. Default value: None
skip_validation (Required) | bool, optional. Whether to skip validation of connecting to the SQL database. Defaults to False.
subscription_id | str, optional. The ID of the subscription the SQL server belongs to. Default value: None
resource_group | str, optional. The resource group the SQL server belongs to. Default value: None
grant_workspace_access | bool, optional. Defaults to False. Set it to True to access data behind a virtual network from Machine Learning Studio. This makes data access from Machine Learning Studio use the workspace managed identity for authentication, and adds the workspace managed identity as Reader of the storage. You have to be Owner or User Access Administrator of the storage to opt in. Ask your administrator to configure it for you if you do not have the required permission. Learn more: https://docs.microsoft.com/azure/machine-learning/how-to-enable-studio-virtual-network. Default value: False
Returns
Type | Description
---|---
AzureSqlDatabaseDatastore | Returns the SQL database Datastore.
Remarks
If you are attaching storage from a region other than the workspace region, it can result in higher latency and additional network usage costs.
sql_datastore_name="azuresqldatastore"
server_name=os.getenv("SQL_SERVERNAME", "<my_server_name>") # Name of the Azure SQL server
database_name=os.getenv("SQL_DATABASENAME", "<my_database_name>") # Name of the Azure SQL database
username=os.getenv("SQL_USER_NAME", "<my_sql_user_name>") # The username of the database user.
password=os.getenv("SQL_USER_PASSWORD", "<my_sql_user_password>") # The password of the database user.
sql_datastore = Datastore.register_azure_sql_database(
workspace=ws,
datastore_name=sql_datastore_name,
server_name=server_name, # name should not contain fully qualified domain endpoint
database_name=database_name,
username=username,
password=password,
endpoint='database.windows.net')
register_dbfs
Initialize a new Databricks File System (DBFS) datastore.
The DBFS datastore can only be used to create a DataReference as input and PipelineData as output to a DatabricksStep in Azure Machine Learning pipelines. More details can be found here.
static register_dbfs(workspace, datastore_name)
Parameters
Name | Description
---|---
workspace (Required) | The workspace this datastore belongs to.
datastore_name (Required) | The datastore name.
Returns
Type | Description
---|---
DBFSDatastore | Returns the DBFS Datastore.
Remarks
If you are attaching storage from a region other than the workspace region, it can result in higher latency and additional network usage costs.
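For example, a minimal sketch; the datastore name below is a placeholder and `ws` is an existing Workspace:

dbfs_datastore = Datastore.register_dbfs(workspace=ws, datastore_name='my_dbfs_datastore')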
register_hdfs
Note
This is an experimental method, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Initialize a new HDFS datastore.
static register_hdfs(workspace, datastore_name, protocol, namenode_address, hdfs_server_certificate, kerberos_realm, kerberos_kdc_address, kerberos_principal, kerberos_keytab=None, kerberos_password=None, overwrite=False)
Parameters
Name | Description
---|---
workspace (Required) | The workspace this datastore belongs to.
datastore_name (Required) | The datastore name.
protocol (Required) | str or enum. The protocol to use when communicating with the HDFS cluster. Possible values include: 'http', 'https'.
namenode_address (Required) | The IP address or DNS hostname of the HDFS namenode. Optionally includes a port.
hdfs_server_certificate (Required) | str, optional. The path to the TLS signing certificate of the HDFS namenode, if using TLS with a self-signed certificate.
kerberos_realm (Required) | The Kerberos realm.
kerberos_kdc_address (Required) | The IP address or DNS hostname of the Kerberos KDC.
kerberos_principal (Required) | The Kerberos principal to use for authentication and authorization.
kerberos_keytab (Required) | str, optional. The path to the keytab file containing the key(s) corresponding to the Kerberos principal. Provide either this or a password.
kerberos_password (Required) | str, optional. The password corresponding to the Kerberos principal. Provide either this or the path to a keytab file.
overwrite (Required) | bool, optional. Whether to overwrite an existing datastore. If the datastore does not exist, it will create one. Defaults to False.
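A minimal sketch of this experimental method; all addresses and Kerberos details below are placeholders and `ws` is an existing Workspace:

hdfs_datastore = Datastore.register_hdfs(
    workspace=ws,
    datastore_name='my_hdfs_datastore',
    protocol='https',
    namenode_address='<namenode-host>:<port>',
    hdfs_server_certificate='<path-to-tls-cert>',  # only needed when using TLS with a self-signed cert
    kerberos_realm='<MY.REALM>',
    kerberos_kdc_address='<kdc-host>',
    kerberos_principal='<principal@MY.REALM>',
    kerberos_password='<password>')                # or provide kerberos_keytab instead of a password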
set_as_default
Set the default datastore.
set_as_default()
Parameters
Name | Description
---|---
datastore_name (Required) | The name of the datastore.
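For example, a minimal sketch that promotes an existing datastore to be the workspace default; 'my_blob_datastore' is a placeholder name and `ws` is an existing Workspace:

datastore = Datastore.get(ws, 'my_blob_datastore')
datastore.set_as_default()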
unregister
Unregisters the datastore. The underlying storage service will not be deleted.
unregister()
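For example, a minimal sketch; unregistering removes only the registration from the workspace, and the underlying storage and its data are left untouched. 'my_blob_datastore' is a placeholder name and `ws` is an existing Workspace:

datastore = Datastore.get(ws, 'my_blob_datastore')
datastore.unregister()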