AzureDataLakeGen2Datastore Class

Represents a datastore that saves connection information to Azure Data Lake Storage Gen2.

To create a datastore that saves connection information to Azure Data Lake Storage Gen2, use the register_azure_data_lake_gen2 method of the Datastore class.
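As an illustration of the registration step, here is a minimal sketch that assumes azureml-core is installed and that service principal credentials are supplied through environment variables. The helper names (build_registration_kwargs, register_datastore) and the environment variable names are hypothetical and chosen only for this example; the SDK does not read those variables on its own.

```python
import os


def build_registration_kwargs(datastore_name, filesystem, account_name, env=os.environ):
    """Collect keyword arguments for Datastore.register_azure_data_lake_gen2.

    The ADLS_* variable names are placeholders for this sketch (assumption).
    """
    return {
        "datastore_name": datastore_name,
        "filesystem": filesystem,
        "account_name": account_name,
        "tenant_id": env.get("ADLS_TENANT_ID"),
        "client_id": env.get("ADLS_CLIENT_ID"),
        "client_secret": env.get("ADLS_CLIENT_SECRET"),
    }


def register_datastore(workspace, **kwargs):
    """Register the datastore in the given workspace and return it."""
    # Imported lazily so the helper above works without azureml-core installed.
    from azureml.core import Datastore

    return Datastore.register_azure_data_lake_gen2(workspace=workspace, **kwargs)
```

A typical call would load a workspace with Workspace.from_config(), build the keyword arguments, and pass both to register_datastore. Values left as None are forwarded unchanged, which matches the None defaults shown in the constructor signature.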

To access data from an AzureDataLakeGen2Datastore object, create a Dataset and use one of the methods like from_files for a FileDataset. For more information, see Create Azure Machine Learning datasets.
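The access pattern described above can be sketched as follows, assuming azureml-core is installed and a datastore object has already been retrieved. The helper names datastore_glob and create_file_dataset are hypothetical; only Dataset.File.from_files comes from the SDK.

```python
def datastore_glob(datastore, folder, pattern="**"):
    """Build the (datastore, relative_path) tuple passed to from_files."""
    # Drop empty parts and stray slashes so the joined path has single separators.
    relative = "/".join(part.strip("/") for part in (folder, pattern) if part)
    return (datastore, relative)


def create_file_dataset(datastore, folder, pattern="**"):
    """Create a FileDataset over files under `folder` in the datastore."""
    # Imported lazily so datastore_glob can be used without azureml-core installed.
    from azureml.core import Dataset

    return Dataset.File.from_files(path=datastore_glob(datastore, folder, pattern))
```

For example, create_file_dataset(datastore, "raw/2021", "*.csv") would reference the CSV files directly under raw/2021, provided the registered credentials allow reading them.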

Also keep in mind:

  • The AzureDataLakeGen2Datastore class does not provide an upload method. The recommended way to upload data to an AzureDataLakeGen2 datastore is through Dataset upload. For more details, see https://docs.microsoft.com/azure/machine-learning/how-to-create-register-datasets

  • When using a datastore to access data, you must have permission to access the data, which depends on the credentials registered with the datastore.

  • When using service principal authentication to access storage via AzureDataLakeGen2, the service principal or app registration must be assigned, at minimum, the "Storage Blob Data Reader" role-based access control (RBAC) role. For more information, see Storage built-in roles.
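Because the datastore class has no upload method, the first note above points to Dataset upload. The sketch below shows one way this might look, assuming a version of azureml-core that provides Dataset.File.upload_directory (marked experimental in the SDK). The helper name upload_folder and its file_dataset_factory parameter are hypothetical; the parameter exists only so the helper can be exercised without the SDK.

```python
def upload_folder(datastore, src_dir, target_path, overwrite=False,
                  file_dataset_factory=None):
    """Upload a local directory to the datastore and return the resulting dataset.

    file_dataset_factory defaults to azureml.core.Dataset.File; it is injectable
    so the helper can be tested with a stand-in object (assumption of this sketch).
    """
    if file_dataset_factory is None:
        # Imported lazily so the function can be defined without azureml-core.
        from azureml.core import Dataset

        file_dataset_factory = Dataset.File

    # The target is given as a (datastore, relative_path) tuple.
    return file_dataset_factory.upload_directory(
        src_dir=src_dir,
        target=(datastore, target_path),
        overwrite=overwrite,
    )
```

Writing data this way presumably requires credentials with more than the read-only role named above; check the permissions of the registered service principal before relying on it.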

Initialize a new Azure Data Lake Gen2 Datastore.

Inheritance
AzureDataLakeGen2Datastore

Constructor

AzureDataLakeGen2Datastore(workspace, name, container_name, account_name, tenant_id=None, client_id=None, client_secret=None, resource_url=None, authority_url=None, protocol=None, endpoint=None, service_data_access_auth_identity=None)

Parameters

workspace
Workspace
Required

The workspace this datastore belongs to.

name
str
Required

The datastore name.

container_name
str
Required

The name of the Azure blob container.

account_name
str
Required

The storage account name.

tenant_id
str
default value: None

The Directory ID/Tenant ID of the service principal.

client_id
str
default value: None

The Client ID/Application ID of the service principal.

client_secret
str
default value: None

The secret of the service principal.

resource_url
str
default value: None

The resource URL, which determines what operations will be performed on the Data Lake Store.

authority_url
str
default value: None

The authority URL used to authenticate the user.

protocol
str
default value: None

The protocol to use to connect to the blob container. If None, defaults to https.

endpoint
str
default value: None

The endpoint of the blob container. If None, defaults to core.windows.net.

service_data_access_auth_identity
str or _restclient.models.ServiceDataAccessAuthIdentity
default value: None

Indicates which identity to use to authenticate service data access to the customer's storage. Possible values include: 'None', 'WorkspaceSystemAssignedIdentity', 'WorkspaceUserAssignedIdentity'.
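The documented defaults for protocol and endpoint, and the listed values for service_data_access_auth_identity, can be illustrated with a small standalone sketch. This is not the SDK's internal code; the function name resolve_connection_settings and the decision to raise ValueError on an unlisted identity are assumptions made for the example.

```python
# Values listed in the parameter description above.
ALLOWED_IDENTITIES = {
    "None",
    "WorkspaceSystemAssignedIdentity",
    "WorkspaceUserAssignedIdentity",
}


def resolve_connection_settings(protocol=None, endpoint=None,
                                service_data_access_auth_identity=None):
    """Apply the documented defaults and check the identity value."""
    if (service_data_access_auth_identity is not None
            and service_data_access_auth_identity not in ALLOWED_IDENTITIES):
        raise ValueError(
            "Unsupported identity: %r" % (service_data_access_auth_identity,)
        )
    return {
        # None falls back to the defaults stated in the parameter descriptions.
        "protocol": protocol if protocol is not None else "https",
        "endpoint": endpoint if endpoint is not None else "core.windows.net",
        "service_data_access_auth_identity": service_data_access_auth_identity,
    }
```

Calling resolve_connection_settings() with no arguments yields the https protocol and the core.windows.net endpoint, mirroring the behavior described for None.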
