Use Java to manage directories and files in Azure Data Lake Storage Gen2

This article shows you how to use Java to create and manage directories and files in storage accounts that have a hierarchical namespace.

To learn how to get, set, and update the access control lists (ACLs) of directories and files, see Use Java to manage ACLs in Azure Data Lake Storage Gen2.

Package (Maven) | Samples | API reference | Gen1 to Gen2 mapping | Give Feedback

Prerequisites

  • An Azure subscription. For more information, see Get Azure free trial.

  • A storage account that has hierarchical namespace enabled. Follow these instructions to create one.

Set up your project

To get started, open this page and find the latest version of the Java library. Then, open the pom.xml file in your text editor. Add a dependency element that references that version.

If you plan to authenticate your client application by using Azure Active Directory (Azure AD), then add a dependency to the Azure Secret Client Library. For more information, see Adding the Secret Client Library package to your project.

Next, add these import statements to your code file.

import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import com.azure.core.http.rest.PagedIterable;
import com.azure.identity.ClientSecretCredential;
import com.azure.identity.ClientSecretCredentialBuilder;
import com.azure.storage.common.StorageSharedKeyCredential;
import com.azure.storage.file.datalake.DataLakeDirectoryClient;
import com.azure.storage.file.datalake.DataLakeFileClient;
import com.azure.storage.file.datalake.DataLakeFileSystemClient;
import com.azure.storage.file.datalake.DataLakeServiceClient;
import com.azure.storage.file.datalake.DataLakeServiceClientBuilder;
import com.azure.storage.file.datalake.models.ListPathsOptions;
import com.azure.storage.file.datalake.models.PathItem;
import com.azure.storage.file.datalake.models.AccessControlChangeCounters;
import com.azure.storage.file.datalake.models.AccessControlChangeResult;
import com.azure.storage.file.datalake.models.AccessControlType;
import com.azure.storage.file.datalake.models.PathAccessControl;
import com.azure.storage.file.datalake.models.PathAccessControlEntry;
import com.azure.storage.file.datalake.models.PathPermissions;
import com.azure.storage.file.datalake.models.PathRemoveAccessControlEntry;
import com.azure.storage.file.datalake.models.RolePermissions;
import com.azure.storage.file.datalake.options.PathSetAccessControlRecursiveOptions;

Connect to the account

To use the snippets in this article, you'll need to create a DataLakeServiceClient instance that represents the storage account.

Connect by using an account key

This is the easiest way to connect to an account.

This example creates a DataLakeServiceClient instance by using an account key.

static public DataLakeServiceClient GetDataLakeServiceClient
(String accountName, String accountKey){

    StorageSharedKeyCredential sharedKeyCredential =
        new StorageSharedKeyCredential(accountName, accountKey);

    DataLakeServiceClientBuilder builder = new DataLakeServiceClientBuilder();

    builder.credential(sharedKeyCredential);
    builder.endpoint("https://" + accountName + ".dfs.core.windows.net");

    return builder.buildClient();
}
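The endpoint in the example above is built by plain string concatenation, so a mistyped account name only fails later, when the first request is sent. As a minimal sketch, the helper below validates the name before building the URL. DfsEndpoint and forAccount are hypothetical names that are not part of the Azure SDK, and the check assumes the usual storage account naming rule of 3 to 24 lowercase letters and digits.

```java
import java.util.regex.Pattern;

/** Hypothetical helper (not part of the Azure SDK) that builds the DFS endpoint URL. */
final class DfsEndpoint {

    // Assumed naming rule: 3-24 characters, lowercase letters and digits only.
    private static final Pattern ACCOUNT_NAME = Pattern.compile("[a-z0-9]{3,24}");

    private DfsEndpoint() { }

    /** Returns the endpoint for the account, or throws if the name is malformed. */
    static String forAccount(String accountName) {
        if (accountName == null || !ACCOUNT_NAME.matcher(accountName).matches()) {
            throw new IllegalArgumentException("Invalid storage account name: " + accountName);
        }
        return "https://" + accountName + ".dfs.core.windows.net";
    }
}
```

The returned string could then be passed to builder.endpoint(...) in place of the inline concatenation.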

Connect by using Azure Active Directory (Azure AD)

You can use the Azure identity client library for Java to authenticate your application with Azure AD.

This example creates a DataLakeServiceClient instance by using a client ID, a client secret, and a tenant ID. To get these values, see Acquire a token from Azure AD for authorizing requests from a client application.

static public DataLakeServiceClient GetDataLakeServiceClient
    (String accountName, String clientId, String clientSecret, String tenantId){

    String endpoint = "https://" + accountName + ".dfs.core.windows.net";

    ClientSecretCredential clientSecretCredential = new ClientSecretCredentialBuilder()
        .clientId(clientId)
        .clientSecret(clientSecret)
        .tenantId(tenantId)
        .build();

    DataLakeServiceClientBuilder builder = new DataLakeServiceClientBuilder();
    return builder.credential(clientSecretCredential).endpoint(endpoint).buildClient();
}

Note

For more examples, see the Azure identity client library for Java documentation.

Create a container

A container acts as a file system for your files. You can create one by calling the DataLakeServiceClient.createFileSystem method.

This example creates a container named my-file-system.

public DataLakeFileSystemClient CreateFileSystem
(DataLakeServiceClient serviceClient){

    return serviceClient.createFileSystem("my-file-system");
}

Create a directory

Create a directory reference by calling the DataLakeFileSystemClient.createDirectory method.

This example adds a directory named my-directory to a container, and then adds a sub-directory named my-subdirectory.

public DataLakeDirectoryClient CreateDirectory
(DataLakeServiceClient serviceClient, String fileSystemName){

    DataLakeFileSystemClient fileSystemClient =
    serviceClient.getFileSystemClient(fileSystemName);

    DataLakeDirectoryClient directoryClient =
        fileSystemClient.createDirectory("my-directory");

    return directoryClient.createSubdirectory("my-subdirectory");
}

Rename or move a directory

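Rename or move a directory by calling the DataLakeDirectoryClient.rename method. Pass the name of the destination container and the destination path of the directory as parameters. The destination path is relative to the root of the container, not to the directory's current parent.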

This example renames a sub-directory to the name my-subdirectory-renamed.

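public DataLakeDirectoryClient
    RenameDirectory(DataLakeFileSystemClient fileSystemClient){

    DataLakeDirectoryClient directoryClient =
        fileSystemClient.getDirectoryClient("my-directory/my-subdirectory");

    // The destination path starts at the container root, so include the parent
    // directory to keep the renamed directory inside my-directory.
    return directoryClient.rename(
        fileSystemClient.getFileSystemName(), "my-directory/my-subdirectory-renamed");
}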

This example moves a directory named my-subdirectory-renamed to a sub-directory of a directory named my-directory-2.

public DataLakeDirectoryClient MoveDirectory
(DataLakeFileSystemClient fileSystemClient){

    DataLakeDirectoryClient directoryClient =
        fileSystemClient.getDirectoryClient("my-directory/my-subdirectory-renamed");

    return directoryClient.rename(fileSystemClient.getFileSystemName(),"my-directory-2/my-subdirectory-renamed");                
}
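Because the rename method takes a destination path that starts at the container root, a rename and a move differ only in which part of the path changes. The sketch below shows one way to compute both destination paths with plain string handling. LakePaths, renamedPath, and movedPath are hypothetical names introduced for illustration; they are not part of the Azure SDK.

```java
/** Hypothetical helper (not part of the Azure SDK) for building rename destinations. */
final class LakePaths {

    private LakePaths() { }

    /** Keeps the parent directory and replaces only the last path segment. */
    static String renamedPath(String currentPath, String newName) {
        int slash = currentPath.lastIndexOf('/');
        return slash < 0 ? newName : currentPath.substring(0, slash + 1) + newName;
    }

    /** Keeps the last path segment and places it under a different parent. */
    static String movedPath(String currentPath, String newParent) {
        String leaf = currentPath.substring(currentPath.lastIndexOf('/') + 1);
        return newParent.isEmpty() ? leaf : newParent + "/" + leaf;
    }
}
```

For example, renamedPath("my-directory/my-subdirectory", "my-subdirectory-renamed") yields my-directory/my-subdirectory-renamed, and movedPath of that result with "my-directory-2" yields my-directory-2/my-subdirectory-renamed. Either value could be passed as the second argument of rename.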

Delete a directory

Delete a directory by calling the DataLakeDirectoryClient.deleteWithResponse method. Pass true as the first argument to delete the directory recursively, together with its contents.

This example deletes a directory named my-directory.

public void DeleteDirectory(DataLakeFileSystemClient fileSystemClient){
    
    DataLakeDirectoryClient directoryClient =
        fileSystemClient.getDirectoryClient("my-directory");

    directoryClient.deleteWithResponse(true, null, null, null);
}

Upload a file to a directory

First, create a file reference in the target directory by creating an instance of the DataLakeFileClient class. Upload a file by calling the DataLakeFileClient.append method. Make sure to complete the upload by calling the DataLakeFileClient.flush method.

This example uploads a text file to a directory named my-directory.

public void UploadFile(DataLakeFileSystemClient fileSystemClient)
    throws java.io.IOException{

    DataLakeDirectoryClient directoryClient =
        fileSystemClient.getDirectoryClient("my-directory");

    DataLakeFileClient fileClient = directoryClient.createFile("uploaded-file.txt");

    File file = new File("C:\\Users\\contoso\\mytestfile.txt");

    long fileSize = file.length();

    // try-with-resources closes the stream even if the upload fails.
    try (InputStream targetStream = new BufferedInputStream(new FileInputStream(file))) {

        // Stage the data at offset 0, then commit it by flushing at the final length.
        fileClient.append(targetStream, 0, fileSize);

        fileClient.flush(fileSize);
    }
}
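The append method takes a file offset, so a larger file can be staged as a series of chunks and committed with a single flush at the end. The following is a minimal sketch of that offset bookkeeping, not the SDK's own upload logic. ChunkedAppend and its Appender interface are hypothetical stand-ins so that the loop can run without a storage account, and the sketch assumes Java 9 or later for InputStream.readNBytes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper (not part of the Azure SDK) that splits a stream into append calls. */
final class ChunkedAppend {

    /** Stand-in for a synchronous call such as fileClient.append(data, fileOffset, length). */
    @FunctionalInterface
    interface Appender {
        void append(InputStream data, long fileOffset, long length);
    }

    private ChunkedAppend() { }

    /**
     * Reads the source in chunks of at most chunkSize bytes and hands each chunk to the
     * appender at its running offset. Returns the total number of bytes appended, which
     * is the position to pass to flush afterwards.
     */
    static long appendInChunks(InputStream source, int chunkSize, Appender appender)
            throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        byte[] buffer = new byte[chunkSize];
        long offset = 0;
        int read;
        // readNBytes fills the buffer unless the stream ends first; 0 means end of stream.
        while ((read = source.readNBytes(buffer, 0, chunkSize)) > 0) {
            // The buffer is reused, so the appender must consume the chunk before returning.
            appender.append(new ByteArrayInputStream(buffer, 0, read), offset, read);
            offset += read;
        }
        return offset;
    }
}
```

Assuming a DataLakeFileClient named fileClient, a call might look like long size = ChunkedAppend.appendInChunks(stream, 4 * 1024 * 1024, fileClient::append); followed by fileClient.flush(size). The 4 MiB chunk size is an arbitrary choice for illustration.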

Tip

If your file size is large, your code will have to make multiple calls to the DataLakeFileClient.append method. Consider using the DataLakeFileClient.uploadFromFile method instead. That way, you can upload the entire file in a single call.

See the next section for an example.

Upload a large file to a directory

Use the DataLakeFileClient.uploadFromFile method to upload large files without having to make multiple calls to the DataLakeFileClient.append method.

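public void UploadFileBulk(DataLakeFileSystemClient fileSystemClient)
    throws FileNotFoundException{

    DataLakeDirectoryClient directoryClient =
        fileSystemClient.getDirectoryClient("my-directory");

    DataLakeFileClient fileClient = directoryClient.getFileClient("uploaded-file.txt");

    // Pass true to overwrite the file if it already exists, for example when it was
    // created by the previous example. Without it, the upload fails for an existing file.
    fileClient.uploadFromFile("C:\\Users\\contoso\\mytestfile.txt", true);
}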

Download from a directory

First, create a DataLakeFileClient instance that represents the file that you want to download. Use the DataLakeFileClient.read method to read the file. Use any Java file processing API to save bytes from the stream to a file.

public void DownloadFile(DataLakeFileSystemClient fileSystemClient)
    throws FileNotFoundException, java.io.IOException{

    DataLakeDirectoryClient directoryClient =
        fileSystemClient.getDirectoryClient("my-directory");

    DataLakeFileClient fileClient =
        directoryClient.getFileClient("uploaded-file.txt");

    File file = new File("C:\\Users\\contoso\\downloadedFile.txt");

    // try-with-resources closes the stream even if the download fails.
    try (OutputStream targetStream = new FileOutputStream(file)) {

        // Write the contents of the remote file to the local stream.
        fileClient.read(targetStream);
    }
}
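Since read writes to any OutputStream, the local side of the download can be factored out and exercised without a storage account. The sketch below is one way to do that with java.nio.file. DownloadSink and saveTo are hypothetical names that are not part of the Azure SDK, and the Consumer parameter stands in for a call such as fileClient::read.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Consumer;

/** Hypothetical helper (not part of the Azure SDK) that saves streamed bytes to a file. */
final class DownloadSink {

    private DownloadSink() { }

    /**
     * Opens the target file for writing (creating or truncating it), lets the reader
     * fill the stream, closes the stream, and returns the number of bytes written.
     */
    static long saveTo(Path target, Consumer<OutputStream> reader) throws IOException {
        try (OutputStream out = Files.newOutputStream(target)) {
            reader.accept(out);
        }
        return Files.size(target);
    }
}
```

Assuming a DataLakeFileClient named fileClient, a call might look like DownloadSink.saveTo(Paths.get("downloadedFile.txt"), fileClient::read).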

List directory contents

This example prints the name of each path (file or subdirectory) that is located in a directory named my-directory.

public void ListFilesInDirectory(DataLakeFileSystemClient fileSystemClient){

    ListPathsOptions options = new ListPathsOptions();
    options.setPath("my-directory");

    PagedIterable<PathItem> pagedIterable =
        fileSystemClient.listPaths(options, null);

    // Iterating the PagedIterable directly also handles an empty directory,
    // where calling iterator.next() without a hasNext() check would throw.
    for (PathItem item : pagedIterable)
    {
        System.out.println(item.getName());
    }
}

See also