Thanks for using the MS Q&A platform and posting your query.
The errors you're encountering in both the SQL and Python code point towards access issues with your Azure Data Lake Storage (ADLS) Gen 2 account. Here's a breakdown of the problems and how to fix them:
Error Analysis:
Outbound Firewall Rules: The error message in the SQL code indicates that the Synapse SQL Pool doesn't have outbound access to the storage account you specified. This is because the storage account isn't included in the list of allowed outbound firewall rules for the server.
Slow and Failing Spark Job: The slow performance and eventual failure in the Spark notebook might be due to the same access issue, or to problems with data size or processing within Spark itself.
Solutions:
Configure Outbound Firewall Rule:
- Access the Azure portal and navigate to your Synapse Workspace.
- Go to Workspace settings > Security.
- Under Outbound firewall, click Add rule.
- Configure a new rule with the following details:
- Name: Give it a descriptive name like "ADLS Gen 2 Access"
- Priority: Choose a high priority
- Destination Port Range: Leave blank
- Source: Select the IP address range for Azure Synapse Analytics (you can find this range in the documentation https://learn.microsoft.com/en-us/azure/synapse-analytics/security/gateway-ip-addresses)
- Destination: Enter the public endpoint of your ADLS Gen 2 storage account (e.g., `<storageaccount>.dfs.core.windows.net`).
- Save the rule.
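As a quick sanity check on the Destination value, you can confirm that the storage endpoint resolves and is reachable over HTTPS from a Python environment such as the Synapse notebook. Note this only tests the network path from wherever the code runs, not the SQL pool's own outbound rules, and the host name below is a placeholder you must replace:

```python
import socket

# Placeholder -- replace <storageaccount> with your actual storage account name.
host = "<storageaccount>.dfs.core.windows.net"

try:
    # Open a plain TCP connection to the HTTPS port; DNS or network failures raise OSError.
    with socket.create_connection((host, 443), timeout=5) as sock:
        print(f"TCP connection to {host}:443 succeeded")
except OSError as exc:
    print(f"Could not reach {host}:443 -> {exc}")
```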
Spark Notebook Issue:
- The slowness and eventual failure when reading data using Spark might be related to the firewall issue or the way you're defining the location in the Spark code. Here's what to check:
- Firewall: Ensure the firewall allows access as mentioned above.
- Location: In the Spark code the path is `abfss://<container>@<storageaccount>.dfs.core.windows.net/test.parquet`. The `abfss` scheme is the correct, TLS-secured ABFS driver for ADLS Gen2, so the URI format itself is fine. Instead, confirm that the container, storage account name, and the `test.parquet` path all exist, and that the identity running the notebook has data access (for example, Storage Blob Data Reader) on the account. A minimal read sketch is shown below.
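Here is a minimal PySpark sketch of the read, assuming the placeholders `<container>` and `<storageaccount>` are replaced with your real names and that the notebook's identity already has data-plane access to the account:

```python
from pyspark.sql import SparkSession

# In a Synapse notebook a `spark` session already exists; getOrCreate() simply reuses it.
spark = SparkSession.builder.getOrCreate()

# Placeholder path -- substitute your container, storage account and file/folder name.
path = "abfss://<container>@<storageaccount>.dfs.core.windows.net/test.parquet"

df = spark.read.parquet(path)
df.printSchema()
print(df.count())  # forces a full read, useful for surfacing access errors early
```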
SQL Script Issues:
- The provided SQL script seems to be unrelated to the current problem. It defines a master key and a database scoped credential, which might be used for other purposes but aren't necessary for creating the external table.
Here's how to fix the external table creation:
- Double-check the location path in your `CREATE EXTERNAL TABLE` statement. Ensure the folder "/test" exists in your ADLS Gen2 storage account container.
- Verify that the `WorkspaceIdentity` credential has the necessary permissions to access the storage account.
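Putting those checks together, here is a hedged sketch of the full sequence for a Synapse SQL pool. The data source name `AdlsTestSource`, the file format name `ParquetFormat`, and the table name and columns are all hypothetical placeholders; the master key and the `WorkspaceIdentity` database scoped credential from your script are assumed to already exist:

```sql
-- Assumes the master key and the WorkspaceIdentity database scoped credential
-- from your script already exist in this database.

-- Hypothetical data source name; points at the container root.
CREATE EXTERNAL DATA SOURCE AdlsTestSource
WITH (
    LOCATION   = 'abfss://<container>@<storageaccount>.dfs.core.windows.net',
    CREDENTIAL = WorkspaceIdentity
);

-- Parquet file format definition.
CREATE EXTERNAL FILE FORMAT ParquetFormat
WITH (FORMAT_TYPE = PARQUET);

-- Hypothetical table and columns; adjust to match your parquet schema.
-- LOCATION is relative to the data source, so the '/test' folder must exist in the container.
CREATE EXTERNAL TABLE dbo.TestExternal
(
    Id   INT,
    Name NVARCHAR(100)
)
WITH (
    LOCATION    = '/test',
    DATA_SOURCE = AdlsTestSource,
    FILE_FORMAT = ParquetFormat
);
```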
By resolving the firewall restrictions and potentially fixing the location path, you should be able to create the external table successfully and read data from your ADLS Gen2 storage using both methods.
Hope this helps. Do let us know if you have any further queries.