Is it possible to run Azure Synapse Pipelines with user impersonation?

Miklós Mezei 0 Reputation points
2024-05-09T07:16:35.0233333+00:00

We would like to run analysis scripts (an Azure Synapse pipeline) on user-defined data in Azure Storage, but we don't want to have to get access to every possible storage account on our end. It would be beneficial to just use user impersonation and an API (developed by us) that controls the flow, but I can't seem to find any way to run a predefined Synapse pipeline with user impersonation.

Is it even possible to call the Synapse pipeline API with user impersonation in scope?
If yes, which API permissions do I need for the app so this can work?
If not, is there any other way to run a Python/PySpark script on Azure Storage data with delegated access?

Azure Synapse Analytics
Azure App Service

1 answer

  1. phemanth 6,550 Reputation points Microsoft Vendor
    2024-05-09T10:05:10.4366667+00:00

    @Miklós Mezei

    Thanks for using MS Q&A platform and posting your query.

    No, directly calling the Synapse pipeline API with user impersonation isn't possible. However, there are alternative approaches that achieve your goal of running analysis scripts on user data in Azure Storage with controlled access:

    1. Managed Identity with Synapse Workspace:

    • Grant access to specific storage containers/blobs to a Synapse Workspace managed identity.
    • Develop your Synapse pipeline using Synapse SQL or other supported languages to access data from the authorized storage locations.
    • Trigger the Synapse pipeline using Azure Data Factory (ADF). Configure ADF to use a separate managed identity with access to trigger the Synapse pipeline.
    • Develop an API (hosted on Azure Functions or App Service) that interacts with ADF. This API can impersonate users using Azure Active Directory (AAD) tokens and trigger the ADF pipeline running the Synapse script on user data.
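    The last step above can be sketched roughly as follows. This is a minimal, hypothetical example, assuming your API is registered as a confidential client in AAD; the tenant ID, client ID, workspace, and pipeline names are all placeholders, and the pipeline is triggered here via the Synapse REST `createRun` endpoint rather than ADF:

```python
# Hypothetical sketch: TENANT_ID, API_CLIENT_ID, API_CLIENT_SECRET and the
# workspace/pipeline names are placeholders, not values from this thread.
TENANT_ID = "<tenant-id>"
API_CLIENT_ID = "<api-app-client-id>"
API_CLIENT_SECRET = "<api-app-secret>"
SYNAPSE_SCOPE = "https://dev.azuresynapse.net/.default"


def create_run_url(workspace: str, pipeline: str) -> str:
    """Build the Synapse REST endpoint that starts a pipeline run."""
    return (f"https://{workspace}.dev.azuresynapse.net"
            f"/pipelines/{pipeline}/createRun?api-version=2020-12-01")


def trigger_pipeline_for_user(user_token: str, workspace: str, pipeline: str) -> str:
    """Exchange the caller's AAD token on-behalf-of, then start the pipeline."""
    # Third-party libraries (msal, requests) imported lazily so the URL
    # helper above stays stdlib-only.
    import msal
    import requests

    app = msal.ConfidentialClientApplication(
        API_CLIENT_ID,
        authority=f"https://login.microsoftonline.com/{TENANT_ID}",
        client_credential=API_CLIENT_SECRET,
    )
    # On-Behalf-Of flow: the API trades the user's token for a Synapse token,
    # so the pipeline run is authorized as the calling user.
    result = app.acquire_token_on_behalf_of(user_token, scopes=[SYNAPSE_SCOPE])
    resp = requests.post(
        create_run_url(workspace, pipeline),
        headers={"Authorization": f"Bearer {result['access_token']}"},
    )
    resp.raise_for_status()
    return resp.json()["runId"]  # id of the newly created pipeline run
```

    Note that whether the OBO token is accepted by Synapse depends on the user having a role (e.g. Synapse Credential User / Synapse Contributor) on the workspace.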

    2. SAS Tokens with Synapse Workspace:

    • Generate Shared Access Signatures (SAS) tokens for user storage containers/blobs with appropriate permissions.
    • Develop your Synapse pipeline to access data using the SAS tokens.
    • Implement a secure mechanism in your API to generate and manage SAS tokens with controlled access expiration.
    • Trigger the Synapse pipeline similar to approach 1 using ADF with a managed identity.
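    Token generation in your API could look roughly like this sketch, which issues a short-lived, read-only user delegation SAS (tied to an AAD identity rather than the account key). The account/container/blob names are placeholders, and it assumes the `azure-identity` and `azure-storage-blob` packages plus an identity allowed to request user delegation keys:

```python
from datetime import datetime, timedelta, timezone


def sas_window(hours: int = 1):
    """Return (start, expiry) timestamps for a short-lived SAS token."""
    start = datetime.now(timezone.utc)
    return start, start + timedelta(hours=hours)


def make_read_sas(account: str, container: str, blob: str, hours: int = 1) -> str:
    """Issue a read-only user delegation SAS for one blob (hypothetical names)."""
    # SDK imports kept inside the function so the helper above is stdlib-only.
    from azure.identity import DefaultAzureCredential
    from azure.storage.blob import (BlobSasPermissions, BlobServiceClient,
                                    generate_blob_sas)

    account_url = f"https://{account}.blob.core.windows.net"
    service = BlobServiceClient(account_url, credential=DefaultAzureCredential())
    start, expiry = sas_window(hours)
    # A user delegation key scopes the SAS to an AAD identity instead of the
    # storage account key, which makes revocation and auditing easier.
    key = service.get_user_delegation_key(start, expiry)
    return generate_blob_sas(
        account_name=account,
        container_name=container,
        blob_name=blob,
        user_delegation_key=key,
        permission=BlobSasPermissions(read=True),
        expiry=expiry,
    )
```

    The Synapse pipeline then receives the SAS token (for example as a pipeline parameter) and uses it to read the user's data.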

    3. Azure Functions with Storage Access:

    • Develop Python/PySpark script as an Azure Function.
    • Grant the Azure Function access to specific storage containers/blobs using a managed identity.
    • Trigger the Azure Function using your API with user impersonation (similar to approach 1).
    • The Azure Function can directly access and process data from user storage using its managed identity.
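    A rough sketch of such a function body, assuming an HTTP-triggered Python Azure Function whose caller passes the blob URL in the query string; the URL format and parameter name are assumptions, and the SDK calls require `azure-identity` and `azure-storage-blob`:

```python
from urllib.parse import urlparse


def parse_blob_url(url: str):
    """Split https://<account>.blob.core.windows.net/<container>/<blob path>."""
    parts = urlparse(url)
    container, _, blob = parts.path.lstrip("/").partition("/")
    return f"{parts.scheme}://{parts.netloc}", container, blob


def main(req):  # req would be azure.functions.HttpRequest in a Function app
    from azure.identity import DefaultAzureCredential
    from azure.storage.blob import BlobClient

    account_url, container, blob = parse_blob_url(req.params["blob_url"])
    # DefaultAzureCredential picks up the Function app's managed identity,
    # so no storage keys or connection strings are stored in the app.
    client = BlobClient(account_url, container, blob,
                        credential=DefaultAzureCredential())
    data = client.download_blob().readall()  # raw bytes of the user's blob
    return data  # hand off to pandas/PySpark processing downstream
```
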

    Permissions:

    • API App: Requires Azure AD access to impersonate users and trigger ADF/Azure Functions (depending on chosen approach).
    • ADF/Azure Function: Needs managed identity with access to Synapse Workspace (for approach 1) or storage access (for approach 2, 3).
    • Synapse Workspace (if used): Managed identity with access to authorized storage locations (for approach 1).

    Hope this helps. Do let us know if you have any further queries.

    1 person found this answer helpful.