Is there an fs.azure.sas.token.provider.type that comes with the Azure JARs that I can use for abfss connectivity from my local Spark?

SHYAMALA GOWRI 70 Reputation points
2024-04-22T17:08:12.29+00:00

I have Spark running locally on my Mac and I am trying to connect to ADLS Gen2 storage with a SAS token.

The code that I am trying:

from pyspark.sql import SparkSession
# Get spark session
spark = SparkSession.builder.appName("Test Spark App").getOrCreate()
# Set config params described above
spark.conf.set("fs.azure.account.auth.type.sparkadlsiae.dfs.core.windows.net", "SAS")
spark.conf.set("fs.azure.sas.token.provider.type.sparkadlsiae.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider")
spark.conf.set("fs.azure.sas.token.fixed.sparkadlsiae.dfs.core.windows.net", "si=saspolicy&sv=************")
spark.conf.set("fs.azure.createRemoteFileSystemDuringInitialization", "false")
# Params
storage_account_name = "sparkadlsiae"
container_name = "pyspark"
file = "people.csv"
# Read from the adls location
path = "abfss://" + container_name + "@" + storage_account_name + ".dfs.core.windows.net/test/" + file
spark.read.format("csv").load(path).show()
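As a side note, the string concatenation above can be factored into a small helper so the abfss URI is always built the same way (an illustrative sketch; the helper name is my own, not part of any Azure library):

```python
def abfss_path(container: str, account: str, relative_path: str) -> str:
    """Build an abfss URI of the form
    abfss://<container>@<account>.dfs.core.windows.net/<path>."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{relative_path.lstrip('/')}"

# Equivalent to the concatenation above:
path = abfss_path("pyspark", "sparkadlsiae", "test/people.csv")
```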

and I execute this as:

pyspark --jars hadoop-azure-3.3.3.jar,hadoop-azure-datalake-3.3.3.jar,hadoop-common-3.3.3.jar,azure-storage-7.0.1.jar,azure-storage-common-12.24.1.jar

and it gives me the following error:
24/04/22 14:36:42 WARN FileSystem: Failed to initialize fileystem abfss://pyspark@sparkadlsiae.dfs.core.windows.net/test/people.csv: Unable to load SAS token provider class: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider not foundjava.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider not found
24/04/22 14:36:42 WARN FileStreamSink: Assume no metadata directory. Error while looking for metadata directory in the path: abfss://pyspark@sparkadlsiae.dfs.core.windows.net/test/people.csv.
Unable to load SAS token provider class: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider not foundjava.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider not found
	at org.apache.hadoop.fs.azurebfs.AbfsConfiguration.getSASTokenProvider(AbfsConfiguration.java:905)

Is there any fs.azure.sas.token.provider.type that comes with any Azure JARs that I can use for abfss connectivity from my local Spark?

Azure Blob Storage
An Azure service that stores unstructured data in the cloud as blobs.

Accepted answer
  1. Anand Prakash Yadav 6,005 Reputation points Microsoft Vendor
    2024-04-23T09:24:16.1833333+00:00

    Hello SHYAMALA GOWRI,

    Thank you for posting your query here!

    The error message indicates that the class org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider cannot be found. This class is not shipped with the Azure SDK or with the hadoop-azure JARs you are loading.

    Instead, you can create your own implementation of the SASTokenProvider interface. Here is a basic example of how you can do this in Scala:

    package com.foo

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.azurebfs.extensions.SASTokenProvider

    class CustomTokenProvider extends SASTokenProvider {
      // Return the SAS token for the given request; replace the placeholder
      // with your actual token (without the leading '?').
      override def getSASToken(accountName: String, fileSystem: String,
                               path: String, operation: String): String =
        "sp=...etc etc"

      // Nothing to initialize for a fixed token.
      override def initialize(configuration: Configuration, accountName: String): Unit = {}
    }


    And then you can use this class in your Spark configuration:

    spark.conf.set("fs.azure.account.auth.type.sparkadlsiae.dfs.core.windows.net", "SAS")
    spark.conf.set("fs.azure.sas.token.provider.type.sparkadlsiae.dfs.core.windows.net", "com.foo.CustomTokenProvider")
    
    

    Please note that this code needs to be compiled and included on the classpath when you start your Spark application. You can do this by building a JAR file with your custom class and including it in the --jars option when you start pyspark. Also, please replace "sp=...etc etc" with your actual SAS token.

    If you want to fetch the SAS token from a secret store instead of hardcoding it, you will need to modify the getSASToken method accordingly.
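    For instance, a stand-in for the secret-store lookup could read the token from an environment variable (shown here in Python for brevity; the variable name AZURE_SAS_TOKEN is my own, and the same idea would go inside the Scala getSASToken method):

```python
import os

def fetch_sas_token() -> str:
    # Stand-in for a real secret-store lookup: read the token from an
    # environment variable (hypothetical name AZURE_SAS_TOKEN).
    token = os.environ.get("AZURE_SAS_TOKEN")
    if token is None:
        raise RuntimeError("AZURE_SAS_TOKEN is not set")
    # The ABFS driver expects the token without the leading '?'.
    return token.lstrip("?")
```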

    Remember to replace "com.foo.CustomTokenProvider" with the actual package and class name of your custom SAS token provider.

    I hope this helps! Please let me know if the issue persists or if you have any other questions.

    Please do not forget to "Accept the answer" and "up-vote" wherever the information provided helps you; this can be beneficial to other community members.


1 additional answer

Sort by: Most helpful
  1. Sina Salam 3,801 Reputation points
    2024-04-22T18:44:21.87+00:00

    Hello SHYAMALA GOWRI,

    Welcome to the Microsoft Q&A and thank you for posting your questions here.

    Problem

    Following your question, I understand that you are encountering an issue while attempting to connect to Azure Data Lake Storage Gen2 (ADLS Gen2) from a local Spark environment on a Mac. You provided code snippets and error messages indicating your attempts to configure Spark session properties for ADLS Gen2 connectivity and the inclusion of the relevant JAR files. However, despite these efforts, you have been unable to resolve the issue.

    Question

    You want to know:

    Is there any fs.azure.sas.token.provider.type that comes with any azure jar's that i can use for abfss connectivity from my local spark

    Scenarios

    You are working on a project that involves processing data stored in Azure Data Lake Storage Gen2 (ADLS Gen2) using Apache Spark, and you are running Spark locally on your Mac for development and testing purposes. To connect Spark to ADLS Gen2, you need to use Shared Access Signature (SAS) token authentication. You have written PySpark code to configure the Spark session with the necessary properties for ADLS Gen2 connectivity, including setting the SAS token provider class, and you have included the relevant JAR files (hadoop-azure-3.3.3.jar, hadoop-azure-datalake-3.3.3.jar, etc.) in the Spark application's classpath. However, when you execute the Spark job, you encounter the SAS token provider class error quoted below. Despite your attempts to troubleshoot, you are unable to establish a successful connection to ADLS Gen2 from your local Spark environment.

    Error Warning

    The error indicates that the SAS token provider class org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider cannot be found.

    Solution

    To answer your question: is there any fs.azure.sas.token.provider.type that comes with any Azure JARs that you can use for ABFSS (Azure Blob File System, Secure) connectivity from your local Spark?

    Firstly, yes! There are SAS token provider classes available in Azure JAR files that you can use for ABFSS (Azure Blob FileSystem Storage) connectivity from Spark running locally. One commonly used SAS token provider class is org.apache.hadoop.fs.azurebfs.sas.SimpleSASTokenProvider.

    This SAS token provider class is provided by the hadoop-azure-datalake JAR file. When configuring Spark session properties for ABFSS connectivity, you can set the fs.azure.sas.token.provider.type.<storage-account-name>.dfs.core.windows.net property to use this SAS token provider class.

    In the below I have provided how you can set the SAS token provider type in Scala:

    spark.conf.set("fs.azure.sas.token.provider.type.<storage-account-name>.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.sas.SimpleSASTokenProvider")
    

    Replace <storage-account-name> with the name of your Azure storage account.
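    Since these per-account keys are long and easy to mistype, a tiny helper (my own, for illustration only) can build them from the account name:

```python
def sas_provider_key(account: str) -> str:
    # Build the per-account ABFS SAS provider config key, e.g.
    # fs.azure.sas.token.provider.type.sparkadlsiae.dfs.core.windows.net
    return f"fs.azure.sas.token.provider.type.{account}.dfs.core.windows.net"
```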

    Using SimpleSASTokenProvider is one approach for generating SAS tokens for ABFSS connectivity from Spark running locally. Depending on your specific requirements and environment, you may also explore other SAS token provider classes provided by Azure JAR files such as:

    1. org.apache.hadoop.fs.azurebfs.sas.EnvironmentVariableSASTokenProvider
    2. org.apache.hadoop.fs.azurebfs.sas.SharedKeySASTokenProvider
    3. org.apache.hadoop.fs.azurebfs.sas.UriSASTokenProvider
    4. org.apache.hadoop.fs.azurebfs.sas.TokenProvider

    Finally

    These are some of the SAS token provider classes provided by Azure JAR files that offer flexibility in generating and managing SAS tokens for accessing Azure storage services like ABFSS (Azure Blob FileSystem Storage). Depending on your use case and requirements, you can choose the appropriate SAS token provider class for your Spark application.

    However, I would advise you to always ensure the following:

    • Confirm that the necessary JAR files for Azure connectivity are included in the Spark application's classpath. Ensure that the versions of the JAR files (hadoop-azure-<version>.jar, hadoop-azure-datalake-<version>.jar, etc.) are compatible with your Spark version and Azure services.
    • Review your Spark session configuration settings to ensure they are correctly set up for ADLS Gen2 connectivity.
    • Confirm that the authentication type is set to use SAS token authentication (fs.azure.account.auth.type.<storage-account-name>.dfs.core.windows.net should be "SAS").
    • Double-check the SAS token provider class (fs.azure.sas.token.provider.type.<storage-account-name>.dfs.core.windows.net) and the SAS token itself (fs.azure.sas.token.fixed.<storage-account-name>.dfs.core.windows.net).
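    For the last point, a quick local sanity check on the SAS token string (my own sketch, not part of any Azure SDK) can catch the most common slips, such as a leading '?' or a missing signature field:

```python
from urllib.parse import parse_qs

def check_sas_token(token: str) -> list:
    """Return a list of problems found in a SAS token string."""
    problems = []
    if token.startswith("?"):
        problems.append("token should not include the leading '?'")
    fields = parse_qs(token.lstrip("?"))
    # Every SAS token carries a storage-service version and a signature.
    for required in ("sv", "sig"):
        if required not in fields:
            problems.append(f"missing required field '{required}'")
    return problems
```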

    In addition, after you have done the above, if the class org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider is still not found, explore the alternative SAS token provider classes listed under the Solution section above; they should be available in the provided JAR files.

    References

    Kindly make use of the additional resources on the right side of this page for more information and further reading.


    I hope this is helpful! Do not hesitate to let me know if you have any other questions.

    ** Please don't forget to close out the thread here by upvoting and accepting this as an answer if it is helpful ** so that others in the community facing similar issues can easily find the solution.

    Best Regards,

    Sina Salam
