Goal: Connect client apps (e.g., SSMS, Tableau, Power BI) to the Synapse workspace serverless SQL endpoint and query databases and tables created by an Apache Spark pool in the same Synapse workspace.
Setup:
What we have:
- Active Directory users with Azure Reader role assignment on the Resource group which includes the Synapse workspace, Apache Spark pool, and storage account.
- Inherited Azure Reader role assignment on the Synapse workspace and storage account.
- Firewall rules to allow access from client machines.
- Synapse Administrator Synapse RBAC role assignment to be able to query the information. (This role looks overly powerful. See question below.)
At that point one user got an error:
[Microsoft][ODBC Driver 17 for SQL Server][SQL Server]External table 'dbo' is not accessible because content of directory cannot be listed.
The table "[dbo].[billing_all_prod]" does not exist.
Looking at the document on setting up Synapse access control (https://learn.microsoft.com/en-us/azure/synapse-analytics/security/how-to-set-up-access-control),
we assigned the Storage Blob Data Reader role to the AD user. We haven't heard back yet about whether this resolved all of the access issues, but the end-users (managers for this project) are tired of being used as debuggers, and we would like to be sure everything that needs to be done is completed.
That Synapse access control document, in STEP 7:
By default, all users assigned the Synapse Administrator role are also assigned the SQL db_owner role on the serverless SQL pool, 'Built-in', and all its databases.
Access to SQL pools for other users and for the workspace MSI is controlled using SQL permissions.
When I proceed to STEP 7.1: Serverless SQL pool, Built-in and run the scripts, when trying to create users on the Spark-created databases I get an error:
Operation CREATE USER is not allowed for a replicated database.
That sort of makes sense, since the Spark pool actually created the database and the serverless SQL pool is just using a copy of some metadata. But my questions from above are:
- What is the minimum (most restrictive) Synapse RBAC role that I would need to assign so that a client could use the serverless SQL pool public endpoint to query a database and table created by the Spark pool?
- Is there any SQL-level access that needs to be granted, or can it all be done with various Azure and Synapse RBAC roles? (Again, we have Reader on the resource group, workspace, and storage account, and we also have Storage Blob Data reader on the storage account, and Synapse Administrator at the workspace scope.)
Thanks,
Johnny