AzureDataLakeGen2Datastore Class

Represents a datastore that saves connection information to Azure Data Lake Storage Gen2.

To create a datastore that saves connection information to Azure Data Lake Storage Gen2, use the register_azure_data_lake_gen2 method of the Datastore class.
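A minimal sketch of the registration step, assuming azureml-core is installed and that a service principal's credentials are available in environment variables. The variable names AZURE_TENANT_ID, AZURE_CLIENT_ID, and AZURE_CLIENT_SECRET and both helper functions are hypothetical and chosen only for illustration. Note that the register method takes the container as filesystem and the datastore name as datastore_name.

```python
import os


def adls_gen2_registration_kwargs(datastore_name, filesystem, account_name, env=None):
    """Collect keyword arguments for Datastore.register_azure_data_lake_gen2.

    The environment variable names are hypothetical; adapt them to your setup.
    """
    env = os.environ if env is None else env
    credentials = {
        "tenant_id": env.get("AZURE_TENANT_ID"),
        "client_id": env.get("AZURE_CLIENT_ID"),
        "client_secret": env.get("AZURE_CLIENT_SECRET"),
    }
    provided = [key for key, value in credentials.items() if value]
    # Assumption of this sketch: a partially specified service principal is a mistake.
    if provided and len(provided) != len(credentials):
        raise ValueError("Set all of tenant_id, client_id, client_secret, or none of them.")
    return {
        "datastore_name": datastore_name,
        "filesystem": filesystem,
        "account_name": account_name,
        **credentials,
    }


def register_adls_gen2(workspace, **kwargs):
    """Register the datastore in the given workspace and return it."""
    # Imported lazily so the helper above works without azureml-core installed.
    from azureml.core import Datastore

    return Datastore.register_azure_data_lake_gen2(workspace=workspace, **kwargs)
```

With a Workspace object ws, this could be called as register_adls_gen2(ws, **adls_gen2_registration_kwargs("my_adls", "my-filesystem", "mystorageaccount")), where all three names are placeholders.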

To access data from an AzureDataLakeGen2Datastore object, create a Dataset and use one of the methods like from_files for a FileDataset. For more information, see Create Azure Machine Learning datasets.
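As an illustration of the dataset route, the sketch below builds a FileDataset over every file beneath a folder on the datastore. It assumes azureml-core is installed and that datastore is an already registered AzureDataLakeGen2Datastore; the helper names glob_under and file_dataset_from_datastore are hypothetical.

```python
def glob_under(folder):
    """Build a glob pattern that matches every file below folder on the datastore."""
    folder = folder.strip("/")
    return f"{folder}/**" if folder else "**"


def file_dataset_from_datastore(datastore, folder):
    """Create a FileDataset that references all files under folder."""
    # Imported lazily so glob_under can be used without azureml-core installed.
    from azureml.core import Dataset

    return Dataset.File.from_files(path=(datastore, glob_under(folder)))
```

The (datastore, path) tuple form keeps the dataset tied to the credentials registered with the datastore rather than to a raw URL.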

Also keep in mind:

  • The AzureDataLakeGen2Datastore class does not provide an upload method. The recommended way to upload data to AzureDataLakeGen2 datastores is via Dataset upload. For more details, see https://docs.microsoft.com/azure/machine-learning/how-to-create-register-datasets

  • When using a datastore to access data, you must have permission to access the data, which depends on the credentials registered with the datastore.

  • When using service principal authentication to access storage via AzureDataLakeGen2, the service principal or app registration must be assigned, at a minimum, the "Storage Blob Data Reader" role-based access control (RBAC) role. For more information, see Storage built-in roles.
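The Dataset upload mentioned in the first note can be sketched as follows, assuming azureml-core is installed and a version that provides Dataset.File.upload_directory. The helper name and the local directory check are illustrative additions, not part of the SDK. Uploading writes to storage, so the registered credentials would likely need a role with write access rather than only "Storage Blob Data Reader".

```python
import os


def upload_directory_to_datastore(src_dir, datastore, target_path):
    """Upload a local directory to target_path on the datastore and return a FileDataset."""
    # Fail early on a bad local path, before any call to the service is attempted.
    if not os.path.isdir(src_dir):
        raise FileNotFoundError(f"Local directory not found: {src_dir}")

    # Imported lazily so the check above runs without azureml-core installed.
    from azureml.core import Dataset

    return Dataset.File.upload_directory(
        src_dir=src_dir,
        target=(datastore, target_path),
        overwrite=False,  # do not replace files that already exist at the target
    )
```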

Initialize a new Azure Data Lake Gen2 Datastore.

Inheritance
AzureDataLakeGen2Datastore

Constructor

AzureDataLakeGen2Datastore(workspace, name, container_name, account_name, tenant_id=None, client_id=None, client_secret=None, resource_url=None, authority_url=None, protocol=None, endpoint=None, service_data_access_auth_identity=None)

Parameters

Name Description
workspace
Required
str

The workspace this datastore belongs to.

name
Required
str

The datastore name.

container_name
Required
str

The name of the Azure blob container.

account_name
Required
str

The storage account name.

tenant_id
str

The Directory ID/Tenant ID of the service principal.

default value: None

client_id
str

The Client ID/Application ID of the service principal.

default value: None

client_secret
str

The secret of the service principal.

default value: None

resource_url
str

The resource URL, which determines what operations will be performed on the Data Lake Store.

default value: None

authority_url
str

The authority URL used to authenticate the user.

default value: None

protocol
str

The protocol to use to connect to the blob container. If None, defaults to https.

default value: None

endpoint
str

The endpoint of the blob container. If None, defaults to core.windows.net.

default value: None

service_data_access_auth_identity
str or <xref:_restclient.models.ServiceDataAccessAuthIdentity>

Indicates which identity to use to authenticate service data access to the customer's storage. Possible values include: 'None', 'WorkspaceSystemAssignedIdentity', 'WorkspaceUserAssignedIdentity'.

default value: None
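To make the documented defaults concrete, the sketch below shows how protocol and endpoint could resolve when left as None, and checks service_data_access_auth_identity against the values listed above. All three helpers are hypothetical and do not reproduce the SDK's internal logic; the account URL uses the "dfs" host prefix of Azure Data Lake Storage Gen2 endpoints purely as an illustration.

```python
ALLOWED_IDENTITIES = (
    "None",
    "WorkspaceSystemAssignedIdentity",
    "WorkspaceUserAssignedIdentity",
)


def resolve_connection_defaults(protocol=None, endpoint=None):
    """Apply the documented defaults: https and core.windows.net."""
    return (protocol or "https", endpoint or "core.windows.net")


def adls_gen2_account_url(account_name, protocol=None, endpoint=None):
    """Build an illustrative account URL from the resolved protocol and endpoint."""
    protocol, endpoint = resolve_connection_defaults(protocol, endpoint)
    return f"{protocol}://{account_name}.dfs.{endpoint}"


def check_identity(value):
    """Return value if it is None or one of the documented identity strings."""
    if value is not None and value not in ALLOWED_IDENTITIES:
        raise ValueError(f"Unsupported service_data_access_auth_identity: {value!r}")
    return value
```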