Azure Synapse Analytics - Connect to Serverless SQL Pool via Spark using Managed Identity

Graeme Cash 76 Reputation points
2024-02-05T12:38:24.3166667+00:00

Hello. Is it possible to connect to the serverless SQL pool via Spark in Synapse using a managed identity? I need to run queries against the serverless pool and have tried pyodbc. I can run queries using a SQL login, but when I try managed identity I get a timeout: OperationalError: ('HYT00', '[HYT00] [Microsoft][ODBC Driver 18 for SQL Server]Login timeout expired (0) (SQLDriverConnect)')

driver = '{ODBC Driver 18 for SQL Server}'

cnxn = pyodbc.connect(f'Driver={driver};Server={server};Database={database};Port=1433;Authentication=ActiveDirectoryMsi')

I have also configured the Spark session to run as the managed identity. Thanks in advance.

Azure Synapse Analytics

Answer accepted by question author
  1. Anonymous
    2024-02-09T05:01:50.06+00:00

    @Graeme Cash
Welcome to the Microsoft Q&A platform, and thanks for posting your question. I'm glad that you were able to resolve your issue, and thank you for posting your solution so that others experiencing the same thing can easily reference it! Since the Microsoft Q&A community has a policy that "The question author cannot accept their own answer. They can only accept answers by others," I'll repost your solution in case you'd like to accept the answer.

    • Ask: Is it possible to connect to the serverless SQL pool via Spark in Synapse using a managed identity? I need to run queries against the serverless pool and have tried pyodbc. I can run queries using a SQL login, but when I try managed identity I get a timeout: OperationalError: ('HYT00', '[HYT00] [Microsoft][ODBC Driver 18 for SQL Server]Login timeout expired (0) (SQLDriverConnect)')
    driver = '{ODBC Driver 18 for SQL Server}'
    
    cnxn = pyodbc.connect(f'Driver={driver};Server={server};Database={database};Port=1433;Authentication=ActiveDirectoryMsi')
    

    I have also configured the Spark session to run as the managed identity.

    • Solution: Hello, I have managed to get it working with pyodbc by passing the access token from the linked service to pyodbc.connect using attrs_before:
    import struct
    import pyodbc
    
    # Get an access token via the Spark pool's managed identity
    token = mssparkutils.credentials.getConnectionStringOrCreds(linkedService=linked_service)
    
    # Expand the token into the byte layout the ODBC driver expects:
    # each character followed by a zero byte
    exptoken = b""
    for i in token:
        exptoken += bytes(i, 'utf-8')
        exptoken += bytes(1)
    
    # Prefix with the 4-byte token length; 1256 is SQL_COPT_SS_ACCESS_TOKEN
    tokenstruct = struct.pack("=i", len(exptoken)) + exptoken
    
    cnxn = pyodbc.connect(f'Driver={DRIVER};Server={server};Database={database};Encrypt=yes;TrustServerCertificate=no;Connection Timeout=30;', attrs_before={1256: tokenstruct})

    Graeme.
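
    For reference, the per-character expansion in the solution is equivalent to a UTF-16-LE encode of the token, which can be packed more compactly. A minimal, self-contained sketch (the helper name `pack_access_token` is mine; 1256 is the driver's SQL_COPT_SS_ACCESS_TOKEN pre-connect attribute):

    ```python
    import struct

    SQL_COPT_SS_ACCESS_TOKEN = 1256  # msodbcsql pre-connect attribute for AAD access tokens

    def pack_access_token(token: str) -> bytes:
        """Wrap a bearer token in the length-prefixed UTF-16-LE layout the driver expects."""
        raw = token.encode("utf-16-le")  # same bytes as the char/zero-byte loop for ASCII tokens
        return struct.pack("<i", len(raw)) + raw

    # Usage (assuming `token` was obtained as in the solution above):
    # pyodbc.connect(conn_str, attrs_before={SQL_COPT_SS_ACCESS_TOKEN: pack_access_token(token)})
    ```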

    If I missed anything, please let me know and I'd be happy to add it to my answer, or feel free to comment below with any additional information. If you have any other questions, please let me know. Thank you again for your time and patience throughout this issue.

    ---Please don’t forget to Accept Answer and Yes for "was this answer helpful" wherever the information provided helps you, this can be beneficial to other community members.


1 additional answer

  1. Graeme Cash 76 Reputation points
    2024-02-08T09:53:22.81+00:00

    Hello, I have managed to get it working with pyodbc by passing the access token from the linked service to pyodbc.connect using attrs_before:

    import struct
    import pyodbc
    
    # Get an access token via the Spark pool's managed identity
    token = mssparkutils.credentials.getConnectionStringOrCreds(linkedService=linked_service)
    
    # Expand the token into the byte layout the ODBC driver expects:
    # each character followed by a zero byte
    exptoken = b""
    for i in token:
        exptoken += bytes(i, 'utf-8')
        exptoken += bytes(1)
    
    # Prefix with the 4-byte token length; 1256 is SQL_COPT_SS_ACCESS_TOKEN
    tokenstruct = struct.pack("=i", len(exptoken)) + exptoken
    
    cnxn = pyodbc.connect(f'Driver={DRIVER};Server={server};Database={database};Encrypt=yes;TrustServerCertificate=no;Connection Timeout=30;', attrs_before={1256: tokenstruct})

    Graeme

