Trouble when accessing ADLS Gen2 storage with linked services/TokenLibrary in notebook

Binhan Xi 20 Reputation points Microsoft Employee
2024-09-25T02:57:30.1133333+00:00

Hello,

I have a Synapse workspace notebook that reads data from ADLS Gen2. I created a linked service in the Synapse workspace to ADLS Gen2 using SPI + certificate. However, when I tried to authenticate in my notebook following the documentation in this link, the notebook ran for a long time and eventually failed with a Py4JJavaError. The error appears to be an HTTP read timeout, so I suspect a network problem or an HTTP timeout setting, but I don't know where to configure either.

Could someone help me understand and resolve this issue?

Py4JJavaError: An error occurred while calling o4168.load.
: java.util.concurrent.ExecutionException: Status code: -1 error code: null error message: Auth failure: HTTP Error -1CustomTokenProvider getAccessToken threw java.io.IOException : Read timed out
org.apache.hadoop.fs.azurebfs.oauth2.AzureADAuthenticator$HttpException: HTTP Error -1CustomTokenProvider getAccessToken threw java.io.IOException : Read timed out
    at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:306)
    .........................................
    at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
    ... 37 more
Caused by: org.apache.hadoop.fs.azurebfs.oauth2.AzureADAuthenticator$HttpException: HTTP Error -1CustomTokenProvider getAccessToken threw java.io.IOException : Read timed out
    at ...
    ... 136 more
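Reading the trace from the bottom up, the innermost "Caused by:" line shows that the read timeout happened while the custom token provider (CustomTokenProvider getAccessToken) was fetching an OAuth token, that is, before any Delta data was read. When this failure needs to be recognized programmatically (for example from str(e) on the caught exception in a notebook), a small pure-Python sketch like the one below can pull out the root cause. The helper names root_cause and is_token_fetch_timeout are hypothetical and not part of Synapse, PySpark, or py4j; the matched substrings are taken from the error text above.

```python
import re

# Hypothetical triage helpers; not part of Synapse, PySpark, or py4j.
_CAUSED_BY = re.compile(r"Caused by: (.+)")


def root_cause(error_text: str) -> str:
    """Return the innermost 'Caused by:' line, or the first line if there is none."""
    causes = _CAUSED_BY.findall(error_text)
    if causes:
        return causes[-1].strip()
    lines = error_text.strip().splitlines()
    return lines[0] if lines else ""


def is_token_fetch_timeout(error_text: str) -> bool:
    """True when the text matches the token-provider read timeout seen in this thread."""
    return ("CustomTokenProvider getAccessToken threw" in error_text
            and "Read timed out" in error_text)


# Usage with a shortened copy of the error from the question
sample = (
    "Py4JJavaError: An error occurred while calling o4168.load.\n"
    "Caused by: org.apache.hadoop.fs.azurebfs.oauth2.AzureADAuthenticator$HttpException: "
    "HTTP Error -1CustomTokenProvider getAccessToken threw java.io.IOException : Read timed out\n"
)
print(is_token_fetch_timeout(sample))  # True
print(root_cause(sample))
```

The check is deliberately narrow: other failures, such as a permissions error on the container, return False and should be investigated separately.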

Here is the doc: [screenshot of the documentation not available]

Here is the code that I am using:

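from delta.tables import DeltaTable  # not used below; only needed for DeltaTable APIs

# Full DFS host name of the storage account; it is the suffix of both config keys
input_storage_account_name = "mssalesfdlakeprod.dfs.core.windows.net"
# Tell the Synapse token library which linked service to use for this account
spark.conf.set(f"spark.storage.synapse.{input_storage_account_name}.linkedServiceName", "MSSalesFDLProd")
# Use the linked-service-based token provider for OAuth on this account
sc._jsc.hadoopConfiguration().set(f"fs.azure.account.oauth.provider.type.{input_storage_account_name}", "com.microsoft.azure.synapse.tokenlibrary.LinkedServiceBasedTokenProvider")

input_path = "abfss://securezone@mssalesfdlakeprod.dfs.core.windows.net/Domain/dbo.BillingStatus/"

# This is the call that fails (o4168.load in the trace above)
df = spark.read.format("delta").load(input_path)

display(df)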
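Both settings in the code above are keyed by the full DFS host name, so they only apply to paths whose host matches that name exactly. One way to keep the config keys and the abfss:// path from drifting apart is to derive all of them from the same values. The sketch below does that in plain Python; linked_service_conf and LINKED_SERVICE_TOKEN_PROVIDER are hypothetical names, and only the key patterns, the provider class string, and the example values come from the code in this question.

```python
# Hypothetical helper: build the Synapse linked-service settings and the abfss:// path
# from one set of values so the host name in the config keys always matches the path.
LINKED_SERVICE_TOKEN_PROVIDER = (
    "com.microsoft.azure.synapse.tokenlibrary.LinkedServiceBasedTokenProvider"
)


def linked_service_conf(account: str, container: str, directory: str, linked_service: str):
    """Return (spark_conf, hadoop_conf, path) for reading ADLS Gen2 through a linked service."""
    host = f"{account}.dfs.core.windows.net"  # full DFS endpoint, as in the original code
    spark_conf = {f"spark.storage.synapse.{host}.linkedServiceName": linked_service}
    hadoop_conf = {f"fs.azure.account.oauth.provider.type.{host}": LINKED_SERVICE_TOKEN_PROVIDER}
    path = f"abfss://{container}@{host}/{directory.strip('/')}/"
    return spark_conf, hadoop_conf, path


spark_conf, hadoop_conf, path = linked_service_conf(
    "mssalesfdlakeprod", "securezone", "Domain/dbo.BillingStatus", "MSSalesFDLProd"
)
print(path)  # abfss://securezone@mssalesfdlakeprod.dfs.core.windows.net/Domain/dbo.BillingStatus/
```

In the notebook, the dictionaries would be applied with the same calls the question already uses: spark.conf.set(k, v) for each entry of spark_conf and sc._jsc.hadoopConfiguration().set(k, v) for each entry of hadoop_conf, followed by spark.read.format("delta").load(path). This only rules out a key/path mismatch; it does not address the token-fetch timeout itself.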

Azure Synapse Analytics

Accepted answer
    Smaran Thoomu 16,005 Reputation points Microsoft Vendor
    2024-09-26T08:36:05.66+00:00

    Hi @Binhan Xi
    I'm glad you were able to resolve your issue, and thank you for posting your solution so that others experiencing the same problem can easily reference it. Since the Microsoft Q&A community has a policy that "The question author cannot accept their own answer. They can only accept answers by others," I'll repost your solution in case you'd like to accept the answer.

    Solution: Unfortunately, increasing the timeout settings did not work for me. However, I found a temporary workaround: every time I start a new Spark session, I modify the linked service (for example, switch its authentication from SPI + cert to SPI + secret), publish it, switch it back, and publish it again. After that, the Spark notebook runs successfully.

    I find this very strange, and it cannot be a real solution: we eventually plan to run the notebook automatically as a scheduled pipeline for moving data, and this manual workaround does not work in that setting.
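    For an unattended pipeline run, a common stopgap for transient token-fetch timeouts is to retry the read with exponential backoff and fail the run only after several attempts. This thread does not report whether a retry helps here (the author found that longer timeouts did not, and the failure recurred on every new session until the linked service was republished), so treat the following as an untested sketch rather than a fix. load_with_retry, its parameters, and the message check are hypothetical; load stands in for a zero-argument call such as lambda: spark.read.format("delta").load(input_path), and sleep is injectable so the logic can be exercised without waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def load_with_retry(load: Callable[[], T], attempts: int = 3,
                    base_delay_s: float = 30.0,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `load`, retrying only on token-fetch read timeouts, with exponential backoff.

    Any other error, or the timeout on the final attempt, is re-raised unchanged.
    """
    for attempt in range(1, attempts + 1):
        try:
            return load()
        except Exception as exc:  # surfaces as Py4JJavaError in a Synapse notebook
            text = str(exc)
            timed_out = ("CustomTokenProvider getAccessToken threw" in text
                         and "Read timed out" in text)
            if not timed_out or attempt == attempts:
                raise
            # Back off 30 s, 60 s, ... (with the default base delay) before the next attempt
            sleep(base_delay_s * 2 ** (attempt - 1))
    raise ValueError("attempts must be at least 1")
```

    In the notebook this would be called as df = load_with_retry(lambda: spark.read.format("delta").load(input_path)). Retrying only on the specific timeout message keeps genuine configuration or permission errors from being masked by repeated attempts.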

    By the way, the fact that the notebook runs normally after I republish the linked service suggests that the SPI + cert linked service itself is configured correctly.

    If I missed anything, please let me know and I'd be happy to add it to my answer, or feel free to comment below with any additional information.

    If you have any other questions, please let me know. Thank you again for your time and patience throughout this issue.


    Please don’t forget to click Accept Answer and Yes for "was this answer helpful" wherever the information provided helps you; this can be beneficial to other community members.

