How to mount an Azure Blob Storage container on Linux with BlobFuse2

This article shows you how to install and configure BlobFuse2, mount an Azure blob container, and access data in the container. The basic steps are:

  1. Install BlobFuse2
  2. Configure BlobFuse2
  3. Mount a blob container
  4. Access data

How to install BlobFuse2

You have two options for installing BlobFuse2:

Option 1: Install BlobFuse2 from the Microsoft software repositories for Linux

For a list of supported distributions, see BlobFuse2 releases.

For information about libfuse support, see the BlobFuse2 README.

To check your version of Linux, run the following command:

lsb_release -a
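
On minimal images, lsb_release might not be installed. As a hedged alternative, the following sketch reads the distribution ID and version from /etc/os-release, a file that most current distributions provide. The helper name os_release_field is made up for illustration and isn't part of BlobFuse2.

```shell
#!/bin/sh
# Print one field (for example ID or VERSION_ID) from an os-release style file.
# os_release_field is a hypothetical helper, not a BlobFuse2 command.
os_release_field() {
    field="$1"
    file="${2:-/etc/os-release}"
    # Keep only the value after "FIELD=" and strip optional double quotes.
    sed -n "s/^${field}=//p" "$file" | tr -d '"'
}

# Show the distribution ID and version if the file is present.
if [ -r /etc/os-release ]; then
    os_release_field ID
    os_release_field VERSION_ID
fi
```

You can then compare the two values against the supported distributions.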

If no binaries are available for your distribution, you can build the binaries from source code (Option 2).

To install BlobFuse2 from the repositories:

  1. Configure the Microsoft package repository
  2. Install BlobFuse2

Configure the Microsoft package repository

Configure the Linux Package Repository for Microsoft Products.

As an example, on a Red Hat Enterprise Linux 8 distribution:

sudo rpm -Uvh https://packages.microsoft.com/config/rhel/8/packages-microsoft-prod.rpm

Similarly, change the URL to .../rhel/7/... to point to a Red Hat Enterprise Linux 7 distribution.

As another example, on an Ubuntu 20.04 distribution:

wget https://packages.microsoft.com/config/ubuntu/20.04/packages-microsoft-prod.deb
sudo dpkg -i packages-microsoft-prod.deb
sudo apt-get update
sudo apt-get install libfuse3-dev fuse3 

Similarly, change the URL to .../ubuntu/16.04/... or .../ubuntu/18.04/... to reference another Ubuntu version.
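
The URL pattern in the preceding examples can be sketched as a small helper. The function name ms_repo_config_url is hypothetical, and the sketch only covers the two distribution families shown above. It builds the URL but doesn't check that a package exists for a given version, so confirm the URL before you use it.

```shell
#!/bin/sh
# Build the URL of the Microsoft repository configuration package.
# ms_repo_config_url is a hypothetical helper; only the URL pattern
# comes from the examples in this article.
ms_repo_config_url() {
    distro="$1"     # for example: ubuntu, rhel
    version="$2"    # for example: 20.04, 8
    case "$distro" in
        ubuntu) ext=deb ;;
        rhel)   ext=rpm ;;
        *) echo "unsupported distro: $distro" >&2; return 1 ;;
    esac
    printf 'https://packages.microsoft.com/config/%s/%s/packages-microsoft-prod.%s\n' \
        "$distro" "$version" "$ext"
}

ms_repo_config_url ubuntu 18.04
ms_repo_config_url rhel 7
```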

Install BlobFuse2

On an Ubuntu/Debian distribution:

sudo apt-get install blobfuse2

On a Red Hat Enterprise Linux distribution:

sudo yum install blobfuse2

On a SUSE distribution:

sudo zypper install blobfuse2
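
If you script the installation across several machines, you might select the command from the distribution ID. The following sketch prints the matching command from the list above instead of running it. The helper name blobfuse2_install_cmd is hypothetical, and the ID values (such as sles and opensuse-leap) are assumptions about what /etc/os-release reports on those systems.

```shell
#!/bin/sh
# Print the install command from this article that matches a distribution ID.
# blobfuse2_install_cmd is a hypothetical helper; the ID strings are assumptions.
blobfuse2_install_cmd() {
    case "$1" in
        ubuntu|debian)  echo "sudo apt-get install blobfuse2" ;;
        rhel)           echo "sudo yum install blobfuse2" ;;
        sles|opensuse*) echo "sudo zypper install blobfuse2" ;;
        *) echo "no packaged install listed for: $1" >&2; return 1 ;;
    esac
}

blobfuse2_install_cmd ubuntu
```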

Option 2: Build the binaries from source code

To build the BlobFuse2 binaries from source code:

  1. Install the dependencies:

    1. Install Git:

      sudo apt-get install git
      
    2. Install BlobFuse2 dependencies.

      On Ubuntu:

      sudo apt-get install libfuse3-dev fuse3 -y
      
  2. Clone the repository:

    git clone https://github.com/Azure/azure-storage-fuse/
    cd ./azure-storage-fuse
    git checkout main
    
  3. Build BlobFuse2:

    go get
    go build -tags=fuse3
    

Tip

If you need to install Go, see Download and install Go.
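
Before you start the build, it can help to confirm that the tools used in the preceding steps are on your PATH. The following sketch lists any that are missing. The helper name missing_tools is made up for illustration.

```shell
#!/bin/sh
# Print the name of each listed tool that is not found on PATH.
# missing_tools is a hypothetical helper, not part of BlobFuse2.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# The build steps above use git and go.
missing_tools git go
```

If the command prints nothing, both tools are available.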

How to configure BlobFuse2

You can configure BlobFuse2 by using various settings. Some of the typical settings include:

  • Logging location and options
  • Temporary file path for caching
  • Information about the Azure storage account and blob container to be mounted

You can specify the settings in a YAML configuration file, in environment variables, or as parameters passed to the BlobFuse2 commands. The preferred method is to use the configuration file.

For details about each of the configuration parameters for BlobFuse2 and how to specify them, see Configure settings for BlobFuse2.

To configure BlobFuse2 for mounting:

  1. Configure caching.
  2. Create an empty directory to mount the blob container.
  3. Authorize access to your storage account.

Configure caching

BlobFuse2 provides native-like performance by using local file-caching techniques. The caching configuration and behavior vary, depending on whether you're streaming large files or accessing smaller files.

Configure caching for streaming large files

BlobFuse2 supports streaming for read and write operations as an alternative to disk caching for files. In streaming mode, BlobFuse2 caches blocks of large files in memory both for reading and writing. The configuration settings related to caching for streaming are under the stream: settings in your configuration file:

stream:
    # Read-only mode: the size, in MB, of each block cached in memory while streaming.
    # Read/write mode: the size, in MB, of newly created blocks.
    block-size-mb: <block size in MB>
    # The total number of buffers used to store blocks.
    max-buffers: <number of buffers>
    # The size, in MB, of each buffer.
    buffer-size-mb: <buffer size in MB>

To get started quickly with some settings for a basic streaming scenario, see the sample streaming configuration file.
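
As a rough illustration of how the three stream settings relate, the following sketch writes only the stream: fragment of a configuration file and estimates the memory that the buffers could occupy. The file location, the numeric values, and the helper name stream_buffer_mb are illustrative assumptions. The estimate also assumes that every buffer is fully allocated, which might not match the actual behavior of BlobFuse2.

```shell
#!/bin/sh
# Write only the stream: fragment of a configuration file.
# The path and all numeric values are illustrative assumptions.
stream_fragment="${TMPDIR:-/tmp}/stream-fragment.yaml"
cat > "$stream_fragment" <<'EOF'
stream:
    block-size-mb: 8
    max-buffers: 100
    buffer-size-mb: 8
EOF

# Estimate buffer memory in MB as max-buffers multiplied by buffer-size-mb.
# stream_buffer_mb is a hypothetical helper, not a BlobFuse2 command.
stream_buffer_mb() {
    echo $(( $1 * $2 ))
}

cat "$stream_fragment"
stream_buffer_mb 100 8
```

With these sample values, the estimate is 800 MB, so choose values that fit within the memory available on your machine.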

Configure caching for smaller files

Smaller files are cached to a temporary path that's specified under file_cache: in the configuration file:

file_cache:
    path: <path to local disk cache>

Note

BlobFuse2 stores all open file contents in the temporary path. Make sure you have enough space to contain all open files.
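
One way to check the available space before you choose a temporary path is to query the file system with df. The following sketch is a minimal example. The helper names free_kib and has_free_kib are hypothetical, and the threshold you pass depends on the files you expect to have open.

```shell
#!/bin/sh
# Print the free space, in KiB, on the file system that holds the given path.
# free_kib and has_free_kib are hypothetical helpers, not BlobFuse2 commands.
free_kib() {
    # With -P, df prints one header line and one data line; column 4 is "Available".
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed if the path has at least the requested number of KiB free.
has_free_kib() {
    [ "$(free_kib "$1")" -ge "$2" ]
}

free_kib "${TMPDIR:-/tmp}"
```

For example, has_free_kib /mnt/resource/blobfuse2tmp 10485760 succeeds only if at least 10 GiB is free on that path.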

You have three common options to configure the temporary path for file caching:

Use a local high-performing disk

If you use an existing local disk for file caching, choose a disk that provides the best performance possible, such as a solid-state disk (SSD).

Use a RAM disk

The following example creates a RAM disk of 16 GB and a directory for BlobFuse2. Choose a size that meets your requirements. BlobFuse2 uses the RAM disk to open files that are up to 16 GB in size.

sudo mkdir /mnt/ramdisk
sudo mount -t tmpfs -o size=16g tmpfs /mnt/ramdisk
sudo mkdir /mnt/ramdisk/blobfuse2tmp
sudo chown <youruser> /mnt/ramdisk/blobfuse2tmp

Use an SSD

In Azure, you can use the SSD ephemeral disks that are available on your VMs to provide a low-latency buffer for BlobFuse2. Depending on the provisioning agent you use, mount the ephemeral disk on /mnt for cloud-init or /mnt/resource for Microsoft Azure Linux Agent (waagent) VMs.

Make sure that your user has access to the temporary path:

sudo mkdir -p /mnt/resource/blobfuse2tmp
sudo chown <youruser> /mnt/resource/blobfuse2tmp

Create an empty directory to mount the blob container

To create an empty directory to mount the blob container:

mkdir ~/mycontainer
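
If the directory already exists, you can confirm that it's empty before you mount. The following sketch shows one way to do that. The helper name is_empty_dir is made up for illustration.

```shell
#!/bin/sh
# Succeed only if the path is a directory that contains no entries,
# including hidden ones. is_empty_dir is a hypothetical helper.
is_empty_dir() {
    [ -d "$1" ] && [ -z "$(ls -A "$1")" ]
}

mount_dir="$HOME/mycontainer"
if is_empty_dir "$mount_dir"; then
    echo "$mount_dir is empty"
else
    echo "$mount_dir is missing or not empty"
fi
```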

Authorize access to your storage account

You must grant access to the storage account for the user who mounts the container. The most common ways to grant access are:

  • Storage account access key
  • Shared access signature
  • Managed identity
  • Service principal

You can provide authorization information in a configuration file or in environment variables. For more information, see Configure settings for BlobFuse2.
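
Because a configuration file that uses an access key contains a secret, it's a good idea to restrict the file to its owner. The following sketch writes an authorization fragment with owner-only permissions. The helper name write_auth_config is hypothetical, and the azstorage key names (account-name, container, mode, account-key) are assumptions that you should verify against the BlobFuse2 configuration reference. The fragment isn't a complete configuration file.

```shell
#!/bin/sh
# Write an authorization fragment that only the file owner can read.
# write_auth_config is a hypothetical helper; the azstorage keys below are
# assumptions to verify against the BlobFuse2 configuration reference.
write_auth_config() {
    config_path="$1"
    account="$2"
    container="$3"
    key="$4"
    # Create or truncate the file and restrict it before writing the secret.
    : > "$config_path" || return 1
    chmod 600 "$config_path" || return 1
    cat >> "$config_path" <<EOF
azstorage:
    type: block
    account-name: $account
    container: $container
    mode: key
    account-key: $key
EOF
}

# Placeholder values only; replace them with your own.
sample_config="${TMPDIR:-/tmp}/blobfuse2-auth-sample.yaml"
write_auth_config "$sample_config" "<account name>" "<container name>" "<account key>"
cat "$sample_config"
```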

How to mount a blob container

Important

BlobFuse2 doesn't support overlapping mount paths. If you run multiple instances of BlobFuse2, make sure that each instance has a unique and non-overlapping mount point.

BlobFuse2 doesn't support coexistence with NFS on the same mount path. The results of running BlobFuse2 on the same mount path as NFS are undefined and might result in data corruption.

To mount an Azure block blob container by using BlobFuse2, run the following command. The command mounts the container specified in ./config.yaml onto the location ~/mycontainer:

blobfuse2 mount ~/mycontainer --config-file=./config.yaml

Note

For a full list of mount options, see BlobFuse2 mount commands.
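
To confirm that the mount succeeded, you can check whether the directory is now a mount point. The following sketch wraps the mountpoint utility from util-linux, which is assumed to be installed. The helper name is_mounted is made up for illustration.

```shell
#!/bin/sh
# Succeed if the given directory is currently a mount point.
# is_mounted is a hypothetical helper that wraps mountpoint from util-linux.
is_mounted() {
    mountpoint -q "$1"
}

if is_mounted "$HOME/mycontainer"; then
    echo "container is mounted"
else
    echo "container is not mounted"
fi
```

When you're finished, blobfuse2 unmount ~/mycontainer should detach the container. Check the mount command reference for the options that your version supports.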

You should now have access to your block blobs through the Linux file system and related APIs. To test your deployment, try creating a new directory and file:

cd ~/mycontainer
mkdir test
echo "hello world" > test/blob.txt
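
The same test can be sketched as a small round-trip check that writes a file and reads it back. The helper name smoke_test is hypothetical. The check leaves test/blob.txt in place so that you can also look for the blob in the container.

```shell
#!/bin/sh
# Write a small file under <dir>/test and confirm it reads back unchanged.
# smoke_test is a hypothetical helper, not a BlobFuse2 command.
smoke_test() {
    test_dir="$1/test"
    expected="hello world"
    mkdir -p "$test_dir" || return 1
    printf '%s\n' "$expected" > "$test_dir/blob.txt" || return 1
    [ "$(cat "$test_dir/blob.txt")" = "$expected" ]
}

# Example: run smoke_test ~/mycontainer after the container is mounted.
```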

How to access data

Generally, you can work with the BlobFuse2-mounted storage like you would work with the native Linux file system. It uses the virtual directory scheme with a forward slash (/) as a delimiter in the file path and supports basic file system operations such as mkdir, opendir, readdir, rmdir, open, read, create, write, close, unlink, truncate, stat, and rename.

However, you should be aware of some key differences in functionality:

Feature support

The following table shows how BlobFuse2 is supported for each storage account type and how enabling certain capabilities affects that support:

Storage account type        | Blob Storage (default support) | Data Lake Storage Gen2 (1) | NFS 3.0 (1) | SFTP (1)
Standard general-purpose v2 | Yes                            | Yes                        | Yes         | Yes
Premium block blobs         | Yes                            | Yes                        | Yes         | Yes

(1) Azure Data Lake Storage Gen2, Network File System (NFS) 3.0 protocol, and SSH File Transfer Protocol (SFTP) support all require a storage account with a hierarchical namespace enabled.
