You're encountering the error "Network is unreachable" when trying to access an external REST API using requests.get() in an Azure Synapse Spark notebook.
Your Synapse workspace is configured with a Managed Virtual Network that has Data Exfiltration Protection (DEP) enabled. In this setup, all outbound internet access is blocked by default, including calls to external APIs. This restriction is a security feature to prevent data exfiltration, so direct calls from notebooks or pipelines to public endpoints like https://pokeapi.co/api/v2/pokemon will fail.
Reference: https://learn.microsoft.com/en-us/azure/synapse-analytics/policy-reference
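To confirm that the failure comes from blocked outbound traffic rather than from the API itself, a small connectivity probe can help. The following is a minimal sketch using only the Python standard library; the function name check_outbound, its status strings, and the injectable connect parameter are illustrative choices of mine, not part of Synapse or requests.

```python
import errno
import socket


def check_outbound(host, port=443, timeout=5.0, connect=socket.create_connection):
    """Try a TCP connection to host:port and return a short status string.

    `connect` defaults to socket.create_connection; it is a parameter only so
    the function can be exercised without real network access.
    """
    try:
        conn = connect((host, port), timeout)
    except socket.gaierror:
        # Name resolution failed before any connection was attempted.
        return "dns_failure"
    except socket.timeout:
        # No response within `timeout` seconds (packets may be silently dropped).
        return "timeout"
    except OSError as exc:
        # ENETUNREACH is the errno behind "Network is unreachable".
        if exc.errno in (errno.ENETUNREACH, errno.EHOSTUNREACH):
            return "network_unreachable"
        return "other_error"
    conn.close()
    return "reachable"
```

Running check_outbound("pokeapi.co") in a notebook cell of a DEP-enabled workspace would be expected to return something other than "reachable", while the same call from a machine with internet access should succeed.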
To access external REST APIs from Synapse in this configuration, the supported and recommended approach is to use a Self-Hosted Integration Runtime (SHIR).
1. Deploy SHIR: Set up a Self-Hosted Integration Runtime on an Azure VM (or on-premises machine) that has internet access.
2. Register SHIR: Connect the SHIR to your Synapse workspace.
3. Create a REST Linked Service: In Synapse Pipelines, create a REST linked service and associate it with the SHIR.
4. Use a Web or Copy Activity: Use a Web or Copy activity in your pipeline to make the API call via the SHIR. Store the response (e.g., in Azure Data Lake Storage or Blob Storage) for further processing in Spark notebooks.
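As an illustration of the linked service step, the sketch below builds a REST linked service definition as a Python dictionary and prints it as JSON. The field layout follows the REST connector's linked service JSON as I understand it, so please verify it against the connector documentation before use. The helper name rest_linked_service and the names "PokeApiRest" and "MySelfHostedIR" are hypothetical, and anonymous authentication is an assumption based on the public API in your example.

```python
import json


def rest_linked_service(name, base_url, shir_name):
    """Build a REST linked service definition that routes through a SHIR."""
    return {
        "name": name,
        "properties": {
            "type": "RestService",
            "typeProperties": {
                "url": base_url,
                "enableServerCertificateValidation": True,
                # Assumption: the target API needs no credentials.
                "authenticationType": "Anonymous",
            },
            # connectVia points the linked service at the integration runtime.
            "connectVia": {
                "referenceName": shir_name,
                "type": "IntegrationRuntimeReference",
            },
        },
    }


# Hypothetical names; replace them with the ones used in your workspace.
definition = rest_linked_service(
    "PokeApiRest", "https://pokeapi.co/api/v2/", "MySelfHostedIR"
)
print(json.dumps(definition, indent=2))
```

The connectVia reference is what sends the request out through the SHIR machine instead of the managed virtual network, which is why the SHIR host needs internet access.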
For more details:
https://learn.microsoft.com/en-us/azure/synapse-analytics/policy-reference
https://learn.microsoft.com/en-us/azure/synapse-analytics/security/connectivity-settings
I hope this information helps.