Synapse Analytics Spark Pool Cannot Access ADLS Gen2 Via Linked Service

Billings, Scott 20 Reputation points
2023-05-04T13:11:06.93+00:00

Hello,

In my Synapse environment, I have a working dedicated pool, ADLS Gen2, and a working linked service. With the linked service, I am able to browse the different containers and run "Select TOP 100 Rows" against parquet files. I recently created a Spark pool so that I can begin testing PySpark notebooks.
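For reference, when a notebook reads from ADLS Gen2, Spark addresses files with `abfss://` URIs of the form `abfss://<container>@<account>.dfs.core.windows.net/<path>`, and the cell that "Load to DataFrame" generates typically passes such a URI to `spark.read.load`. The Python sketch below only builds that string and checks the storage account naming rule (3-24 lowercase letters and digits). The helper name `abfss_uri` and the sample account, container, and path names are hypothetical.

```python
import re

# Storage account names are 3-24 characters: lowercase letters and digits only.
_ACCOUNT_RE = re.compile(r"^[a-z0-9]{3,24}$")


def abfss_uri(account: str, container: str, path: str = "") -> str:
    """Build abfss://<container>@<account>.dfs.core.windows.net/<path> (hypothetical helper)."""
    if not _ACCOUNT_RE.fullmatch(account):
        raise ValueError(f"invalid storage account name: {account!r}")
    if not container:
        raise ValueError("container name must not be empty")
    # Drop any leading slash so the URI has exactly one slash after the host.
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


# Hypothetical names, for illustration only.
print(abfss_uri("mystorage", "raw", "sales/2023/data.parquet"))
```

In a Synapse notebook the resulting string could then be handed to something like `spark.read.load(uri, format="parquet")`; that call needs the notebook's Spark session, so it is left out of the sketch.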

If I select "New notebook" > "Load to DataFrame" and run the cell, it runs for 10 minutes before failing with a connection timed out error. My Synapse workspace is in the same VNet as my ADLS Gen2 account, and I have a working managed private endpoint that the linked service uses. Can someone help me determine what is causing the connection timeout? I tried to open a ticket in the Azure portal, but it says this is an unsupported scenario :(
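Waiting 10 minutes is a slow way to learn that the storage endpoint is unreachable. A quicker check, assuming the notebook can run plain Python, is to open a TCP connection to the account's `dfs` endpoint on port 443 with a short timeout. The helper `can_reach` and the host name in the usage line are hypothetical, and a `False` result only says the connection failed, not why.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the name and tries each returned address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures (socket.gaierror), refused connections, and timeouts.
        return False


# Hypothetical account name; run from a notebook cell attached to the Spark pool.
# print(can_reach("mystorage.dfs.core.windows.net"))
```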

Azure Data Lake Storage
An Azure service that provides an enterprise-wide hyper-scale repository for big data analytic workloads and is integrated with Azure Blob Storage.
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.

Answer accepted by question author
  1. HimanshuSinha 19,547 Reputation points Microsoft Employee Moderator
    2023-05-04T23:32:36.1533333+00:00

    Hello @Billings, Scott,

    Thanks for the question and for using the MS Q&A platform.
    I think this is because the Apache Spark pool runs inside the Managed VNet, which is managed by Synapse. This is explained in detail here.

    The Synapse Managed VNet feature provides fully managed network isolation for the Apache Spark pool and pipeline compute resources between Synapse workspaces. It can be configured at workspace creation time. It also provides network isolation for Spark clusters within the same workspace. Each workspace has its own virtual network, which is fully managed by Synapse. The Managed VNet isn't visible to users, so it can't be modified. Any pipeline or Apache Spark pool compute resources that Azure Synapse spins up in a Managed VNet get provisioned inside that workspace's own VNet. This way, there's full network isolation from other workspaces.

    Thanks

    Himanshu


1 additional answer

  1. Billings, Scott 20 Reputation points
    2023-05-12T16:07:18.7166667+00:00

    I finally got this to work! I followed the instructions in the white paper. I did not have an Azure Synapse Analytics Private Link Hub, so I created one and added a new private endpoint for my Synapse workspace. Once the private endpoint was approved and running, I still could not connect my Spark pool to ADLS Gen2.

    I then tried deleting the Managed Private Endpoint that had been auto-created for the workspace's default storage. Once it was deleted, I created a new Managed Private Endpoint and approved it in the Private Link Center. I figured it might work now that I had an Azure Synapse Analytics Private Link Hub, and it did!

    Thanks for providing this documentation.
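One way to confirm that storage traffic takes the private endpoint path is to look at what the storage host name resolves to. From a network that uses the private DNS zone for the endpoint (`privatelink.dfs.core.windows.net`), `<account>.dfs.core.windows.net` should resolve to a private IP address rather than a public one. The sketch below, with hypothetical helper names, collects the resolved addresses and tests whether all of them fall in private ranges; whether a Spark pool's resolver inside the Managed VNet reports exactly this is an assumption worth verifying.

```python
import ipaddress
import socket


def resolved_addresses(host: str, port: int = 443) -> list[str]:
    """Return the distinct IP addresses the local resolver gives for `host`."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP string.
    return sorted({info[4][0] for info in infos})


def all_private(addresses: list[str]) -> bool:
    """True only if there is at least one address and every one is in a private range."""
    return bool(addresses) and all(ipaddress.ip_address(a).is_private for a in addresses)


# Hypothetical account name; run from a notebook cell in the workspace.
# addrs = resolved_addresses("mystorage.dfs.core.windows.net")
# print(addrs, all_private(addrs))
```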

