Events
Mar 31, 11 PM - Apr 2, 11 PM
The ultimate Microsoft Fabric, Power BI, SQL, and AI community-led event. March 31 to April 2, 2025.
Register todayThis browser is no longer supported.
Upgrade to Microsoft Edge to take advantage of the latest features, security updates, and technical support.
Azure Data Lake Storage is Microsoft's optimized storage solution for for big data analytics workloads. A fundamental part of Data Lake Storage Gen2 is the addition of a hierarchical namespace to Blob storage. The hierarchical namespace organizes objects/files into a hierarchy of directories for efficient data access.
Source code | API reference documentation | REST API documentation | Product documentation | Samples
Please include the azure-sdk-bom to your project to take dependency on GA version of the library. In the following snippet, replace the {bom_version_to_target} placeholder with the version number. To learn more about the BOM, see the AZURE SDK BOM README.
<dependencyManagement>
<dependencies>
<dependency>
<groupId>com.azure</groupId>
<artifactId>azure-sdk-bom</artifactId>
<version>{bom_version_to_target}</version>
<type>pom</type>
<scope>import</scope>
</dependency>
</dependencies>
</dependencyManagement>
and then include the direct dependency in the dependencies section without the version tag.
<dependencies>
<dependency>
<groupId>com.azure</groupId>
<artifactId>azure-storage-file-datalake</artifactId>
</dependency>
</dependencies>
If you want to take dependency on a particular version of the library that is not present in the BOM, add the direct dependency to your project as follows.
<dependency>
<groupId>com.azure</groupId>
<artifactId>azure-storage-file-datalake</artifactId>
<version>12.22.0</version>
</dependency>
To create a Storage Account you can use the Azure Portal or Azure CLI. Note: To use data lake, your account must have hierarchical namespace enabled.
# Install the extension “Storage-Preview”
az extension add --name storage-preview
# Create the storage account
az storage account create -n my-storage-account-name -g my-resource-group --sku Standard_LRS --kind StorageV2 --hierarchical-namespace true
Your storage account URL, subsequently identified as <your-storage-account-url>
, would be formatted as follows
http(s)://<storage-account-name>.dfs.core.windows.net
In order to interact with the Storage Service you'll need to create an instance of the Service Client class. To make this possible you'll need the Account SAS (shared access signature) string of the Storage Account. Learn more at SAS Token
a. Use the Azure CLI snippet below to get the SAS token from the Storage Account.
az storage blob generate-sas \
--account-name {Storage Account name} \
--container-name {container name} \
--name {blob name} \
--permissions {permissions to grant} \
--expiry {datetime to expire the SAS token} \
--services {storage services the SAS allows} \
--resource-types {resource types the SAS allows}
Example:
CONNECTION_STRING=<connection-string>
az storage blob generate-sas \
--account-name MyStorageAccount \
--container-name MyContainer \
--name MyBlob \
--permissions racdw \
--expiry 2020-06-15
b. Alternatively, get the Account SAS Token from the Azure Portal.
Shared access signature
from the menu on the leftGenerate SAS and connection string
(after setup)a. Use Account name and Account key. Account name is your Storage Account name.
Access keys
from the menu on the leftkey1
/key2
copy the contents of the Key
fieldor
b. Use the connection string.
Access keys
from the menu on the leftkey1
/key2
copy the contents of the Connection string
fieldDataLake Storage Gen2 was designed to:
Key Features of DataLake Storage Gen2 include:
A fundamental part of Data Lake Storage Gen2 is the addition of a hierarchical namespace to Blob storage. The hierarchical namespace organizes objects/files into a hierarchy of directories for efficient data access.
In the past, cloud-based analytics had to compromise in areas of performance, management, and security. Data Lake Storage Gen2 addresses each of these aspects in the following ways:
Data Lake Storage Gen2 offers two types of resources:
_filesystem
used via 'DataLakeFileSystemClient'_path
used via 'DataLakeFileClient' or 'DataLakeDirectoryClient'ADLS Gen2 | Blob |
---|---|
Filesystem | Container |
Path (File or Directory) | Blob |
Note: This client library does not support hierarchical namespace (HNS) disabled storage accounts.
Paths are addressable using the following URL format: The following URL addresses a file:
https://${myaccount}.dfs.core.windows.net/${myfilesystem}/${myfile}
For the storage account, the base URI for datalake operations includes the name of the account only:
https://${myaccount}.dfs.core.windows.net
For a file system, the base URI includes the name of the account and the name of the file system:
https://${myaccount}.dfs.core.windows.net/${myfilesystem}
For a file/directory, the base URI includes the name of the account, the name of the file system and the name of the path:
https://${myaccount}.dfs.core.windows.net/${myfilesystem}/${mypath}
Note that the above URIs may not hold for more advanced scenarios such as custom domain names.
The following sections provide several code snippets covering some of the most common Azure Storage Blob tasks, including:
DataLakeServiceClient
DataLakeFileSystemClient
DataLakeFileClient
DataLakeDirectoryClient
Create a DataLakeServiceClient
using the sasToken
generated above.
DataLakeServiceClient dataLakeServiceClient = new DataLakeServiceClientBuilder()
.endpoint("<your-storage-account-url>")
.sasToken("<your-sasToken>")
.buildClient();
or
// Only one "?" is needed here. If the sastoken starts with "?", please removing one "?".
DataLakeServiceClient dataLakeServiceClient = new DataLakeServiceClientBuilder()
.endpoint("<your-storage-account-url>" + "?" + "<your-sasToken>")
.buildClient();
Create a DataLakeFileSystemClient
using a DataLakeServiceClient
.
DataLakeFileSystemClient dataLakeFileSystemClient = dataLakeServiceClient.getFileSystemClient("myfilesystem");
or
Create a DataLakeFileSystemClient
from the builder sasToken
generated above.
DataLakeFileSystemClient dataLakeFileSystemClient = new DataLakeFileSystemClientBuilder()
.endpoint("<your-storage-account-url>")
.sasToken("<your-sasToken>")
.fileSystemName("myfilesystem")
.buildClient();
or
// Only one "?" is needed here. If the sastoken starts with "?", please removing one "?".
DataLakeFileSystemClient dataLakeFileSystemClient = new DataLakeFileSystemClientBuilder()
.endpoint("<your-storage-account-url>" + "/" + "myfilesystem" + "?" + "<your-sasToken>")
.buildClient();
Create a DataLakeFileClient
using a DataLakeFileSystemClient
.
DataLakeFileClient fileClient = dataLakeFileSystemClient.getFileClient("myfile");
or
Create a FileClient
from the builder sasToken
generated above.
DataLakeFileClient fileClient = new DataLakePathClientBuilder()
.endpoint("<your-storage-account-url>")
.sasToken("<your-sasToken>")
.fileSystemName("myfilesystem")
.pathName("myfile")
.buildFileClient();
or
// Only one "?" is needed here. If the sastoken starts with "?", please removing one "?".
DataLakeFileClient fileClient = new DataLakePathClientBuilder()
.endpoint("<your-storage-account-url>" + "/" + "myfilesystem" + "/" + "myfile" + "?" + "<your-sasToken>")
.buildFileClient();
Get a DataLakeDirectoryClient
using a DataLakeFileSystemClient
.
DataLakeDirectoryClient directoryClient = dataLakeFileSystemClient.getDirectoryClient("mydir");
or
Create a DirectoryClient
from the builder sasToken
generated above.
DataLakeDirectoryClient directoryClient = new DataLakePathClientBuilder()
.endpoint("<your-storage-account-url>")
.sasToken("<your-sasToken>")
.fileSystemName("myfilesystem")
.pathName("mydir")
.buildDirectoryClient();
or
// Only one "?" is needed here. If the sastoken starts with "?", please removing one "?".
DataLakeDirectoryClient directoryClient = new DataLakePathClientBuilder()
.endpoint("<your-storage-account-url>" + "/" + "myfilesystem" + "/" + "mydir" + "?" + "<your-sasToken>")
.buildDirectoryClient();
Create a file system using a DataLakeServiceClient
.
dataLakeServiceClient.createFileSystem("myfilesystem");
or
Create a file system using a DataLakeFileSystemClient
.
dataLakeFileSystemClient.create();
Enumerating all paths using a DataLakeFileSystemClient
.
for (PathItem pathItem : dataLakeFileSystemClient.listPaths()) {
System.out.println("This is the path name: " + pathItem.getName());
}
Rename a file using a DataLakeFileClient
.
//Need to authenticate with azure identity and add role assignment "Storage Blob Data Contributor" to do the following operation.
DataLakeFileClient fileClient = dataLakeFileSystemClient.getFileClient("myfile");
fileClient.create();
fileClient.rename("new-file-system-name", "new-file-name");
Rename a directory using a DataLakeDirectoryClient
.
//Need to authenticate with azure identity and add role assignment "Storage Blob Data Contributor" to do the following operation.
DataLakeDirectoryClient directoryClient = dataLakeFileSystemClient.getDirectoryClient("mydir");
directoryClient.create();
directoryClient.rename("new-file-system-name", "new-directory-name");
Get properties from a file using a DataLakeFileClient
.
DataLakeFileClient fileClient = dataLakeFileSystemClient.getFileClient("myfile");
fileClient.create();
PathProperties properties = fileClient.getProperties();
Get properties from a directory using a DataLakeDirectoryClient
.
DataLakeDirectoryClient directoryClient = dataLakeFileSystemClient.getDirectoryClient("mydir");
directoryClient.create();
PathProperties properties = directoryClient.getProperties();
The Azure Identity library provides Azure Active Directory support for authenticating with Azure Storage.
DataLakeServiceClient storageClient = new DataLakeServiceClientBuilder()
.endpoint("<your-storage-account-url>")
.credential(new DefaultAzureCredentialBuilder().build())
.buildClient();
When interacting with data lake using this Java client library, errors returned by the service correspond to the same HTTP
status codes returned for REST API requests. For example, if you try to retrieve a file system or path that
doesn't exist in your Storage Account, a 404
error is returned, indicating Not Found
.
All client libraries by default use the Netty HTTP client. Adding the above dependency will automatically configure the client library to use the Netty HTTP client. Configuring or changing the HTTP client is detailed in the HTTP clients wiki.
All client libraries, by default, use the Tomcat-native Boring SSL library to enable native-level performance for SSL operations. The Boring SSL library is an uber jar containing native libraries for Linux / macOS / Windows, and provides better performance compared to the default SSL implementation within the JDK. For more information, including how to reduce the dependency size, refer to the performance tuning section of the wiki.
Several Storage datalake Java SDK samples are available to you in the SDK's GitHub repository.
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution.
When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
Azure SDK for Java feedback
Azure SDK for Java is an open source project. Select a link to provide feedback:
Events
Mar 31, 11 PM - Apr 2, 11 PM
The ultimate Microsoft Fabric, Power BI, SQL, and AI community-led event. March 31 to April 2, 2025.
Register today