How do I find the user/principal submitting my Spark application?
I've been trying to track down why I can't submit an application or run a Synapse notebook, but some GUID can. In the Monitor section of the Azure Synapse workspace, I click on Apache Spark applications and see a Submitter column. I see my username, which fails, but I also see a submitter shown as a GUID. How do I figure out the service principal or user associated with this GUID? Also, how do I specify which user should be used to submit an application when I run a notebook via a pipeline? Currently all pipelines submit applications using a default GUID.
Azure Synapse Analytics
-
phemanth 10,325 Reputation points • Microsoft Vendor
2023-11-14T12:16:36.2433333+00:00 Thanks for reaching out to Microsoft Q&A.
To look up the GUID, you can use the Azure CLI or Azure PowerShell. Here are the steps using the Azure CLI:
- Open the Azure CLI and run the following command to get the details of the service principal associated with the GUID:
az ad sp show --id <GUID>
- Replace <GUID> with the GUID you want to check.
- This command returns the details of the service principal, including the display name, object ID, and other information.
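If the GUID turns out not to be a service principal, the same lookup can be repeated for other directory object types. The following is only a minimal sketch, assuming the Azure CLI is installed and already signed in (`az login`). The helper names `candidate_commands` and `identify` are hypothetical, made up for illustration. It tries `az ad sp show`, `az ad user show`, `az ad app show`, and `az ad group show` in turn and returns the first match.

```python
import json
import subprocess


def candidate_commands(guid: str) -> dict[str, list[str]]:
    """Return one `az ad ... show` command per directory object type.

    Hypothetical helper for illustration; the az subcommands themselves
    are standard Azure CLI commands.
    """
    return {
        "service principal": ["az", "ad", "sp", "show", "--id", guid],
        "user": ["az", "ad", "user", "show", "--id", guid],
        "application": ["az", "ad", "app", "show", "--id", guid],
        "group": ["az", "ad", "group", "show", "--group", guid],
    }


def identify(guid: str, run=subprocess.run):
    """Try each object type; return (kind, details) for the first hit, else None.

    `run` is injectable so the logic can be exercised without calling az.
    On Windows, `az` is a .cmd wrapper, so passing shell=True may be needed.
    """
    for kind, argv in candidate_commands(guid).items():
        result = run(argv, capture_output=True, text=True)
        if result.returncode == 0:
            # az prints the matching object as JSON on stdout.
            return kind, json.loads(result.stdout)
    return None
```

Calling `identify("<GUID>")` returns `None` when none of the four lookups succeeds, which would suggest the GUID is not a directory object visible to the signed-in account.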
To specify which user should be used to submit an application when you run a notebook via a pipeline, you can use the spark.conf configuration setting in the notebook. Here are the steps to do this:
- Open the notebook you want to run in Azure Synapse Studio.
- Click on the "Settings" button in the toolbar and select "Advanced Settings".
- In the "spark.conf" section, add the following configuration setting:
spark.databricks.servicePrincipalId <service_principal_id>
- Replace <service_principal_id> with the object ID of the service principal you want to use to submit the application.
- Save the notebook and run it via the pipeline. The application will be submitted using the specified service principal.
I hope this helps! Please let us know if you have any further questions.
-
Jeff Born (J&CLT-ATL) 101 Reputation points
2023-11-14T18:13:59.64+00:00 I still can't track down who/what is submitting my Apache Spark Applications. It makes sense that if I run the notebook, that I'm the submitter. When I run the notebook via a pipeline the submitter is a GUID.
When I try to run the az command, I get the following:
az ad sp show --id xxx
Resource 'xxx' does not exist or one of its queried reference-property objects are not present.
Is this GUID not a service principal? This submitter GUID that I get via a pipeline is the same whether I'm in my non prod, QA, or production environment.
-
phemanth 10,325 Reputation points • Microsoft Vendor
2023-11-15T06:41:18.2666667+00:00 If the GUID you are seeing in the submitter column of the Apache Spark Applications is not associated with a service principal, it is possible that it is associated with a user account or a system account.
To find out more information about the GUID, you can try using the Azure AD Graph API or Microsoft Graph API to retrieve the details of the object associated with the GUID. Here are the steps to do this using the Azure AD Graph API:
- Open the Azure AD Graph Explorer at https://graphexplorer.azurewebsites.net/.
- Sign in with your Azure AD credentials.
- Run the following query to retrieve the details of the object associated with the GUID:
https://graph.windows.net/<tenant_id>/directoryObjects/<object_id>?api-version=1.6
- Replace <tenant_id> with the ID of your Azure AD tenant and <object_id> with the GUID you want to check.
- This query returns the details of the object associated with the GUID, including the object type, display name, and other information.
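For the Microsoft Graph route mentioned above, the equivalent request is `GET https://graph.microsoft.com/v1.0/directoryObjects/{id}`, and the `@odata.type` field of the response indicates what kind of object the GUID is. Below is a rough sketch, assuming the signed-in account has permission to read directory objects. The function names are hypothetical, and the sample response used in the usage note is invented for illustration, not real output.

```python
import uuid

GRAPH_BASE = "https://graph.microsoft.com/v1.0"


def directory_object_url(object_id: str) -> str:
    """Build the Microsoft Graph URL for a directory object lookup."""
    # uuid.UUID raises ValueError if the string is not a well-formed GUID.
    normalized = str(uuid.UUID(object_id))
    return f"{GRAPH_BASE}/directoryObjects/{normalized}"


def az_rest_command(object_id: str) -> list[str]:
    """Build an `az rest` call for the lookup (assumes `az login` was done)."""
    return ["az", "rest", "--method", "get", "--url", directory_object_url(object_id)]


def object_kind(response: dict) -> str:
    """Reduce '@odata.type' (e.g. '#microsoft.graph.user') to a short name."""
    odata_type = response.get("@odata.type", "")
    return odata_type.rsplit(".", 1)[-1] or "unknown"
```

For example, `object_kind({"@odata.type": "#microsoft.graph.servicePrincipal"})` yields `"servicePrincipal"`. A "not found" error from the request would again suggest the GUID is not an object in your tenant's directory.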
If the GUID is associated with a user account, you can try contacting the user to find out if they submitted the Apache Spark Application. If the GUID is associated with a system account, it may be more difficult to determine who or what submitted the application.
I hope this helps! Please let us know if you have any further questions.
-
phemanth 10,325 Reputation points • Microsoft Vendor
2023-11-16T14:49:29.4233333+00:00 @Jeff Born (J&CLT-ATL) We haven't heard from you since the last response and are checking back to see whether you have a resolution yet. If you do, please share it with the community, as it can be helpful to others. Otherwise, reply with more details and we will try to help.
-
gCW1886 21 Reputation points
2024-02-07T15:57:04.56+00:00 AD Graph Explorer requires AD admin permission... Is there no other way of figuring out which identity submitted the spark application for synapse admins?
-
phemanth 10,325 Reputation points • Microsoft Vendor
2024-02-08T10:16:23.2966667+00:00 @Jeff Born (J&CLT-ATL) If you don’t have Azure AD admin permissions, it can be challenging to directly identify the user or service principal associated with a GUID. However, there are a few other ways you might be able to gather more information:
- Check the Azure Synapse Analytics workspace settings: The GUID might be related to the Managed Service Identity (MSI) of the Azure Synapse Analytics workspace. When a new Azure Synapse Analytics workspace is created, a service principal with the same name as the resource is automatically created. This identity is used for Synapse Pipelines. You can check the details of this service principal in the Azure portal.
- Use the mssparkutils package: Azure Synapse Analytics provides a package called mssparkutils that simplifies the process of retrieving tokens, connection strings, and secrets stored in a linked service or from an Azure Key Vault. You can use this package to retrieve the details of the linked service associated with the GUID.
- Contact your Azure AD admin: If you’re unable to identify the GUID using the above methods, you might need to contact your Azure AD admin. They should have the necessary permissions to look up the GUID and provide you with the associated user or service principal.
I hope this helps! Please let me know if you have any further questions.
-
gCW1886 21 Reputation points
2024-02-08T13:07:20.0533333+00:00 Thank you for the fast reply (and sorry to capture this thread)!
1.) checked both MSI and users (I know who actually triggered it, it was me by means of manual trigger of a Dataflow btw., but it does show this strange GUID as submitter)
2.) mssparkutils returns (mssparkutils.credentials.getPropertiesAll(<LinkedService>)) an empty 'Id'
3.) will try in the long run what I also tried:
4.) getting ID for synapse components via API calls (https://learn.microsoft.com/en-us/rest/api/synapse/data-plane/integration-runtimes/get?view=rest-synapse-data-plane-2020-12-01&tabs=HTTP) -> only get etags, found no objectIDs unfortunately
5.) powershell (https://learn.microsoft.com/en-us/powershell/module/azuread/get-azureadobjectbyobjectid?view=azureadps-2.0) -> no match (working fine for e.g. workspace MSI objectID)
My last question: If the submitter column contains an ID, is this guaranteed to be any specific type of GUID, e.g. objectID?
-
phemanth 10,325 Reputation points • Microsoft Vendor
2024-02-09T09:55:45.2066667+00:00 @Jeff Born (J&CLT-ATL)
The GUID in the submitter column of the Apache Spark Applications in Azure Synapse Workspace is not guaranteed to be any specific type of GUID such as an object ID. It could be associated with a service principal, a user account, or a system account.
If you're seeing a GUID that doesn't match any known service principal or user account, it could be a system-generated GUID. For example, when you manually trigger a Dataflow, Azure might generate a GUID to track the operation.
Also, the GUID could be a uniqueidentifier, which is a 16-byte GUID. This type of GUID can be initialized using the NEWID or NEWSEQUENTIALID functions, or by converting from a string constant.
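As a small illustration of the 16-byte point, here is a sketch using Python's standard `uuid` module. It only confirms that a submitter value is a well-formed GUID and shows its size and version field; it cannot tell you what the GUID identifies. The helper name `describe_guid` and the sample value in the usage note are hypothetical.

```python
import uuid


def describe_guid(value: str):
    """Return basic facts about a GUID string, or None if it is malformed."""
    try:
        parsed = uuid.UUID(value)
    except ValueError:
        return None
    return {
        "canonical": str(parsed),         # lowercase, hyphenated form
        "num_bytes": len(parsed.bytes),   # always 16 for a valid GUID
        "version": parsed.version,        # e.g. 4 for randomly generated GUIDs
    }
```

For instance, `describe_guid("123e4567-e89b-12d3-a456-426614174000")` reports 16 bytes and version 1, while a string that is not a GUID returns `None`. An object ID and a system-generated tracking ID look identical at this level, so the format alone does not settle which one the submitter column holds.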
I hope this helps! Let us know if you have any further questions.