Is there a way to collect Synapse's Spark UI logs through an API?

S 5 Reputation points Microsoft Intern
2024-07-05T20:25:12.74+00:00

Is there a way to automate sending a GET request to obtain a bearer token for this API:

https://{synapse_workspace_name}.dev.azuresynapse.net/sparkhistory/api/v1/sparkpools/{spark_pool_name}/livyid/{livy_id}/applications/{application_id}/1/executors

This is a Synapse Spark API that returns executor-level metrics for a Spark job in Synapse, as shown in the attached screenshot:
[Screenshot 2024-07-05 131330]

I am building a pipeline that extracts information such as {spark_pool_name}, {livy_id}, and {application_id} for each Spark job and then retrieves the metrics for each application ID.

Azure API Management
Azure Synapse Analytics

1 answer

  1. Sina Salam 12,086 Reputation points
    2024-07-05T21:46:24.11+00:00

    Hello S,

    Welcome to the Microsoft Q&A and thank you for posting your questions here.

    Problem

    I understand that you would like to collect Synapse's Spark UI logs through an API and automate the process of obtaining a bearer token for the Synapse Spark API.

    Solution

    You have a number of options.

    Collecting Synapse's Spark UI logs through an API

    You can use either of the following approaches:

    1. Enable the Synapse Studio connector that is built into Log Analytics. This lets you collect and send Apache Spark application metrics and logs to your Log Analytics workspace.
      1. Create a Log Analytics workspace. You can do this through the Azure portal, Azure CLI, or PowerShell.
      2. Prepare an Apache Spark configuration file with the necessary parameters:
              spark.synapse.logAnalytics.enabled true
              spark.synapse.logAnalytics.workspaceId <LOG_ANALYTICS_WORKSPACE_ID>
              spark.synapse.logAnalytics.secret <LOG_ANALYTICS_WORKSPACE_KEY>
        
        Replace the placeholders with your actual values.
      3. Configure the workspace information in Synapse Studio.
      4. Submit your Apache Spark application, and the logs and metrics will be sent to your Log Analytics workspace.
      5. Visualize the metrics and logs using an Azure Monitor workbook. Documentation: https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-azure-log-analytics
    2. Download the logs of a completed application with curl. For example, to download the driver log (stderr) through a direct API request in bash:
    curl "https://{workspace}.dev.azuresynapse.net/sparkhistory/api/v1/sparkpools/{sparkPool}/livyid/{livyId}/applications/{appId}/driverlog/stderr/?isDownload=true" -H "authorization:Bearer {AccessToken}"
    
    Replace `{workspace}`, `{sparkPool}`, `{livyId}`, `{appId}`, and `{AccessToken}` with your actual values. Choose the approach that best fits your requirements and workflow. A similar answer on Q&A: https://learn.microsoft.com/en-us/answers/questions/253744/synapse-spark-logs
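    For a pipeline, the curl call above can be wrapped in a small bash script that requests the token itself instead of pasting one in. This is a minimal sketch that assumes the Azure CLI is installed and already signed in (for unattended runs, for example with `az login --service-principal`). `driver_log_url`, `synapse_token`, and `download_driver_log` are hypothetical helper names, and `az account get-access-token` is swapped in as an alternative to PowerShell's Get-AzAccessToken.

```shell
#!/usr/bin/env bash
# Hedged sketch: download the driver stderr log of a completed Synapse Spark
# application. Assumes the Azure CLI (az) is installed and already signed in.
set -eu

# Build the driver-log URL used in the curl example (hypothetical helper).
driver_log_url() {
  printf 'https://%s.dev.azuresynapse.net/sparkhistory/api/v1/sparkpools/%s/livyid/%s/applications/%s/driverlog/stderr/?isDownload=true' \
    "$1" "$2" "$3" "$4"
}

# Request a bearer token for the Synapse dev endpoint via the Azure CLI.
synapse_token() {
  az account get-access-token --resource https://dev.azuresynapse.net \
    --query accessToken -o tsv
}

# Usage: download_driver_log <workspace> <spark_pool> <livy_id> <app_id> <out_file>
download_driver_log() {
  local token url
  # Separate assignment so that a failed az call stops the script under set -e.
  token="$(synapse_token)"
  url="$(driver_log_url "$1" "$2" "$3" "$4")"
  # --fail makes curl return a non-zero exit code on HTTP errors (e.g. 401).
  curl --silent --show-error --fail \
    -H "Authorization: Bearer ${token}" \
    -o "$5" \
    "$url"
}
```

    For example: `download_driver_log myworkspace mypool 12 <application_id> driver-stderr.log`, where every argument is a placeholder for a value your pipeline supplies.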

    Automating the process of obtaining a bearer token for the Synapse Spark API

    1. You need a bearer token to authenticate your requests; it is a lightweight security token that grants access to protected resources. To automate this in PowerShell, sign in with Connect-AzAccount (for an unattended pipeline, with a service principal or a managed identity) and then request the token with Get-AzAccessToken:
      1. Azure Management Endpoint (Workspace): $token = (Get-AzAccessToken -Resource "https://management.azure.com").Token
      2. Synapse DEV Endpoint (Workspace Resources): $token = (Get-AzAccessToken -Resource "https://dev.azuresynapse.net").Token
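      3. Use the endpoint that matches the API you are calling. The sparkhistory API in your question is served from {synapse_workspace_name}.dev.azuresynapse.net, so it needs the token for https://dev.azuresynapse.net. Reference: https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/calling-synapse-rest-api-to-automate-tasks-using-powershell/ba-p/2202814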
    2. With the bearer token, construct the GET request to the Spark API by replacing {synapse_workspace_name}, {spark_pool_name}, {livy_id}, and {application_id} in the URL from your question with the values your pipeline extracts:
         https://{synapse_workspace_name}.dev.azuresynapse.net/sparkhistory/api/v1/sparkpools/{spark_pool_name}/livyid/{livy_id}/applications/{application_id}/1/executors
      
    3. Finally, send the GET request to the constructed URL with the header Authorization: Bearer <token> to retrieve the executor-level metrics for your Spark job in Synapse.
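    The three steps above can be combined into one loop over the application IDs your pipeline collects. The bash sketch below is one way to do it, assuming the Azure CLI is installed and signed in, and that the pipeline writes one `spark_pool livy_id application_id` line per application to stdin. It uses `az account get-access-token` as the CLI counterpart of Get-AzAccessToken; the helper names and the output file naming are hypothetical, and treating the `1` path segment as the application attempt ID is an assumption based on the open-source Spark history server REST API.

```shell
#!/usr/bin/env bash
# Hedged sketch: fetch executor-level metrics for every Spark application a
# pipeline has collected. Assumes the Azure CLI (az) is installed and signed in.
set -eu

# Build the executors URL from the question. The segment after the application
# ID (1 by default) is assumed to be the attempt ID, as in the Spark history
# server REST API.
executors_url() {
  local attempt="${5:-1}"
  printf 'https://%s.dev.azuresynapse.net/sparkhistory/api/v1/sparkpools/%s/livyid/%s/applications/%s/%s/executors' \
    "$1" "$2" "$3" "$4" "$attempt"
}

# Azure CLI counterpart of Get-AzAccessToken -Resource "https://dev.azuresynapse.net".
synapse_token() {
  az account get-access-token --resource https://dev.azuresynapse.net \
    --query accessToken -o tsv
}

# Usage: fetch_all_executors <workspace> < apps.txt
# Reads newline-terminated "spark_pool livy_id application_id" lines from stdin
# and writes each response to <application_id>-executors.json.
fetch_all_executors() {
  local workspace="$1" token pool livy app
  token="$(synapse_token)"  # one token reused for every request in this run
  while read -r pool livy app; do
    [ -n "$app" ] || continue  # skip blank or incomplete lines
    curl --silent --show-error --fail \
      -H "Authorization: Bearer ${token}" \
      -o "${app}-executors.json" \
      "$(executors_url "$workspace" "$pool" "$livy" "$app")"
  done
}
```

    For example: `fetch_all_executors myworkspace < apps.txt`, where `apps.txt` is a hypothetical file written by your pipeline. Access tokens expire, so a very long run may need to request a fresh token partway through; parsing the JSON responses (for example with jq) is left to the rest of the pipeline.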


    I hope this is helpful! Do not hesitate to let me know if you have any other questions.

    Please don't forget to close the thread by upvoting and accepting this as an answer if it is helpful, so that others in the community facing similar issues can easily find the solution.

    Best Regards,

    Sina Salam
