Read and write blobs via the ABFS driver

Pradip Sodha 21 Reputation points
2021-07-09T13:15:41.037+00:00

I have already read "what-is-the-difference-between-abfss-and-wasbs-in-azure-storage", but I still have a doubt.

What if I want to read and write blobs via Spark using the ABFS driver? Is that faster or slower? ABFS is designed for big data analysis, and that is what I need.

Here, "Blob" means a storage account with hierarchical namespace disabled.

So which is better for a big data workload stored in Blob Storage?

  1. ABFS (Azure Blob File System)
  2. WASB (Windows Azure Storage Blob)

Note: I know that ADLS Gen2 is the best option for big data workloads, but even so, I would like to know the answer for a plain Blob Storage account.


Accepted answer
    PRADEEPCHEEKATLA-MSFT 85,746 Reputation points Microsoft Employee
    2021-07-12T10:28:46.65+00:00

    Hello @Pradip Sodha ,

    Welcome to the Microsoft Q&A platform.

    Azure Data Lake Storage Gen 2 (ADLS Gen 2) is a set of capabilities dedicated to big data analytics, built on top of Azure Blob Storage. The ABFS and ABFSS schemes target the ADLS Gen 2 REST API, and the WASB and WASBS schemes target the Azure Blob Storage REST API. ADLS Gen 2 offers better performance and scalability. ADLS Gen 2 also offers authentication and authorization compatible with the Hadoop Distributed File System permissions model when hierarchical namespace is enabled for the storage account. Furthermore, the metadata and data produced by ADLS Gen 2 REST API can be consumed by Blob REST API, and vice versa.
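    To make the scheme/endpoint distinction concrete, here is a minimal sketch of how the two URI families differ. It assumes the Azure public cloud, where ABFS/ABFSS paths point at the `<account>.dfs.core.windows.net` endpoint and WASB/WASBS paths point at `<account>.blob.core.windows.net`. The helper `storage_uri` and the names `data` and `myaccount` are hypothetical placeholders, not part of any Azure or Hadoop API.

```python
# Hypothetical helper: builds a Hadoop-style URI for an Azure Storage path.
# Assumption: Azure public cloud endpoint suffixes.
ENDPOINTS = {
    "abfs": "dfs.core.windows.net",    # ADLS Gen2 (DFS) REST endpoint
    "abfss": "dfs.core.windows.net",   # same endpoint, over TLS
    "wasb": "blob.core.windows.net",   # Blob REST endpoint
    "wasbs": "blob.core.windows.net",  # same endpoint, over TLS
}


def storage_uri(scheme: str, container: str, account: str, path: str = "") -> str:
    """Return '<scheme>://<container>@<account>.<endpoint>/<path>'."""
    scheme = scheme.lower()
    if scheme not in ENDPOINTS:
        raise ValueError(f"unsupported scheme: {scheme!r}")
    host = f"{account}.{ENDPOINTS[scheme]}"
    return f"{scheme}://{container}@{host}/{path.lstrip('/')}"


if __name__ == "__main__":
    # Same container and path, reached through two different REST APIs.
    print(storage_uri("abfss", "data", "myaccount", "raw/events"))
    print(storage_uri("wasbs", "data", "myaccount", "raw/events"))
```

    In a Spark job, a string like this would be passed to a reader such as `spark.read.parquet(uri)`; credentials for the storage account still have to be configured separately.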

    The ABFS driver is optimized specifically for big data analytics.

    Prior capability: The Windows Azure Storage Blob driver:

    The Windows Azure Storage Blob driver, or WASB driver, provided the original support for Azure Blob Storage. This driver performed the complex task of mapping file system semantics (as required by the Hadoop FileSystem interface) to those of the object-store-style interface exposed by Azure Blob Storage. This driver continues to support this model, providing high-performance access to data stored in blobs, but it contains a significant amount of code performing this mapping, which makes it difficult to maintain. Additionally, some operations, such as FileSystem.rename() and FileSystem.delete() when applied to directories, require the driver to perform a vast number of operations (due to object stores' lack of support for directories), which often leads to degraded performance. The ABFS driver was designed to overcome the inherent deficiencies of WASB.
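    The directory-rename cost can be illustrated with a toy model. This is a simplified sketch, not the WASB or ABFS driver code: it assumes a flat namespace where a "directory" is only a key prefix, so renaming it means one copy plus one delete per blob, whereas a hierarchical namespace can re-point a single directory entry. The functions `rename_flat` and `rename_hierarchical` are hypothetical names for illustration.

```python
# Toy model only: counts storage operations for a directory rename.

def rename_flat(store: dict[str, bytes], src: str, dst: str) -> int:
    """Flat namespace: move every key under prefix src to prefix dst.

    Returns the number of storage operations performed.
    """
    src = src.rstrip("/") + "/"
    dst = dst.rstrip("/") + "/"
    ops = 0
    # Snapshot matching keys first so the dict is not mutated while iterating.
    for key in [k for k in store if k.startswith(src)]:
        store[dst + key[len(src):]] = store[key]  # one copy per blob
        del store[key]                            # one delete per blob
        ops += 2
    return ops


def rename_hierarchical(tree: dict[str, dict[str, bytes]], src: str, dst: str) -> int:
    """Hierarchical namespace: the directory is a real entry, so one re-point suffices."""
    tree[dst] = tree.pop(src)
    return 1


flat = {f"logs/2021/part-{i:05d}.parquet": b"" for i in range(1000)}
print(rename_flat(flat, "logs/2021", "archive/2021"))  # 2000

tree = {"logs/2021": {f"part-{i:05d}.parquet": b"" for i in range(1000)}}
print(rename_hierarchical(tree, "logs/2021", "archive/2021"))  # 1
```

    The gap grows linearly with the number of files under the directory, which is why directory-level renames and deletes are a common pain point on flat-namespace storage.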

    For more details, refer to The Azure Blob Filesystem driver (ABFS): A dedicated Azure Storage driver for Hadoop.

    Hope this helps. Do let us know if you have any further queries.

