In this tutorial, you learn how to deploy the Apache Spark application metrics solution to an Azure Kubernetes Service (AKS) cluster and how to integrate the Grafana dashboards.
You can use this solution to collect and query Apache Spark metrics data in near real time. The integrated Grafana dashboards let you diagnose and monitor your Apache Spark applications. The source code and the configurations are open source on GitHub.
You need the Azure CLI, the Helm client, and kubectl. Install them locally, or use Azure Cloud Shell, which already includes all three.
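Before continuing, it can help to confirm that the three clients are actually on your PATH. The following is a minimal sketch, not part of the official solution; the helper name `missing_tools` is hypothetical.

```shell
#!/bin/sh
# Print the name of every listed command that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Check the three clients this tutorial relies on.
missing=$(missing_tools az helm kubectl)
if [ -n "$missing" ]; then
  # $missing is left unquoted on purpose so the names print on one line.
  echo "Missing tools:" $missing >&2
else
  echo "All required tools are installed."
fi
```

If any tool is reported as missing, install it or switch to Azure Cloud Shell before running the commands that follow.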
az login
az account set --subscription "<subscription_id>"
Use the following Azure CLI commands to create a Kubernetes cluster in your subscription and to fetch its credentials for kubectl.
az aks create --name <kubernetes_name> --resource-group <kubernetes_resource_group> --location <location> --node-vm-size Standard_D2s_v3
az aks get-credentials --name <kubernetes_name> --resource-group <kubernetes_resource_group>
Note: This step can be skipped if you already have an AKS cluster.
az ad sp create-for-rbac --name <service_principal_name> --role Contributor --scopes /subscriptions/<subscription_id>
The result should look like:
{
"appId": "abcdef...",
"displayName": "<service_principal_name>",
"name": "http://<service_principal_name>",
"password": "abc....",
"tenant": "<tenant_id>"
}
Note down the appId, password, and tenant values.
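If you saved the command output, the three values can be pulled into shell variables instead of being copied by hand. The sketch below assumes the output has one "key": "value" pair per line, as in the sample above. The helper `json_field`, the variable names, and the sample values are all hypothetical, and a real JSON parser such as jq would be more robust.

```shell
#!/bin/sh
# Extract one string field from a flat JSON object read on stdin.
# Assumption: one "key": "value" pair per line; this is not a general JSON parser.
json_field() {
  sed -n "s/^[[:space:]]*\"$1\"[[:space:]]*:[[:space:]]*\"\(.*\)\",\{0,1\}[[:space:]]*\$/\1/p"
}

# Hypothetical sample standing in for the saved command output.
sp_json='{
  "appId": "00000000-0000-0000-0000-000000000000",
  "displayName": "my-sp",
  "password": "example-secret",
  "tenant": "11111111-1111-1111-1111-111111111111"
}'

SP_APP_ID=$(printf '%s\n' "$sp_json" | json_field appId)
SP_PASSWORD=$(printf '%s\n' "$sp_json" | json_field password)
TENANT_ID=$(printf '%s\n' "$sp_json" | json_field tenant)

# Avoid echoing the password; print only the non-secret values.
echo "appId=$SP_APP_ID tenant=$TENANT_ID"
```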
Log in to your Azure Synapse Analytics workspace as Synapse Administrator
In Synapse Studio, on the left-side pane, select Manage > Access control
Click the Add button on the upper left to add a role assignment
For Scope, choose Workspace
For Role, choose Synapse Compute Operator
For Select user, input your <service_principal_name> and click your service principal
Click Apply. Wait 3 minutes for the permission to take effect.
Note
Make sure your service principal has at least the "Reader" role in your Synapse workspace. To check the permission settings, go to the Access Control (IAM) tab in the Azure portal.
helm repo add synapse-charts https://github.com/microsoft/azure-synapse-spark-metrics/releases/download/helm-chart
helm install spo synapse-charts/synapse-prometheus-operator --create-namespace --namespace spo \
--set synapse.workspaces[0].workspace_name="<workspace_name>" \
--set synapse.workspaces[0].tenant_id="<tenant_id>" \
--set synapse.workspaces[0].service_principal_name="<service_principal_app_id>" \
--set synapse.workspaces[0].service_principal_password="<service_principal_password>" \
--set synapse.workspaces[0].subscription_id="<subscription_id>" \
--set synapse.workspaces[0].resource_group="<workspace_resource_group_name>"
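Passing the service principal password with --set leaves it in your shell history. As an alternative, the same settings can go into a Helm values file that is passed with -f. The sketch below is one possible way to generate such a file; the function name `write_synapse_values` and the environment variable names are hypothetical, and values containing double quotes or backslashes would need YAML escaping that this sketch does not do.

```shell
#!/bin/sh
# Write a Helm values file equivalent to the --set flags above.
# Reads WORKSPACE_NAME, TENANT_ID, SP_APP_ID, SP_PASSWORD, SUBSCRIPTION_ID and
# WORKSPACE_RG (hypothetical variable names) from the environment.
write_synapse_values() {
  out_file=$1
  (
    umask 077   # the file holds a secret, so create it readable by the owner only
    cat > "$out_file" <<EOF
synapse:
  workspaces:
    - workspace_name: "${WORKSPACE_NAME}"
      tenant_id: "${TENANT_ID}"
      service_principal_name: "${SP_APP_ID}"
      service_principal_password: "${SP_PASSWORD}"
      subscription_id: "${SUBSCRIPTION_ID}"
      resource_group: "${WORKSPACE_RG}"
EOF
  )
}

# Example usage (requires helm and access to the cluster):
# write_synapse_values ./synapse-values.yaml
# helm install spo synapse-charts/synapse-prometheus-operator \
#   --create-namespace --namespace spo -f ./synapse-values.yaml
```

Delete the values file, or store it somewhere safe, once the release is installed.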
Get the default password and address of Grafana. You may change the password in the Grafana settings.
kubectl get secret --namespace spo spo-grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo
kubectl -n spo get svc spo-grafana
Copy the external IP of the service from the EXTERNAL-IP column, paste it into your browser, and log in with the username "admin" and the password you retrieved.
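The external IP can take a little while to be assigned after installation. The following sketch polls until a command prints a non-empty result; the helper name `wait_for_output` is hypothetical, and the commented kubectl example assumes the Grafana service is exposed through a load balancer, as the external IP above suggests.

```shell
#!/bin/sh
# Retry a command until it prints something, or give up after a number of attempts.
# Usage: wait_for_output <max_attempts> <delay_seconds> <command> [args...]
wait_for_output() {
  attempts=$1
  delay=$2
  shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # Treat a failing command the same as empty output and try again.
    out=$("$@" 2>/dev/null) || out=""
    if [ -n "$out" ]; then
      printf '%s\n' "$out"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example usage (requires kubectl and access to the cluster):
# wait_for_output 30 10 kubectl -n spo get svc spo-grafana \
#   -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
```

The function returns a non-zero status if no output appears within the given number of attempts, so it can be used in a script that should stop when the address never becomes available.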
Find the Synapse dashboards in the upper-left corner of the Grafana page (Home -> Synapse Workspace / Synapse Application). Then run some example code in Synapse Studio and wait a few seconds for the metrics to be pulled.
Also, you can use the "Synapse Workspace / Workspace" and "Synapse Workspace / Apache Spark pools" dashboards to get an overview of your workspace and your Apache Spark pools.
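Remove the components with the following Helm command. If you followed the installation command above, both the release name and the namespace are spo.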
helm delete <release_name> -n <namespace>
Delete the AKS cluster.
az aks delete --name <kubernetes_cluster_name> --resource-group <kubernetes_cluster_rg>
Azure Synapse Analytics provides a Helm chart based on Prometheus Operator and Synapse Prometheus Connector. The Helm chart includes Prometheus server, Grafana server, and Grafana dashboards for Apache Spark application-level metrics. You can use Prometheus, a popular open-source monitoring system, to collect these metrics in near real-time and use Grafana for visualization.
Synapse Prometheus Connector connects your Azure Synapse Apache Spark pool to your Prometheus server.
Synapse Prometheus Connector is released as a Docker image hosted on Microsoft Container Registry. It is open source, and its code is available in the Azure Synapse Apache Spark application metrics repository.
Prometheus is an open-source monitoring and alerting toolkit. It is a graduated project of the Cloud Native Computing Foundation (CNCF) and has become the de facto standard for cloud-native monitoring. Prometheus can collect, query, and store massive amounts of time series data, and it integrates easily with Grafana. In this solution, the Prometheus component is deployed from the Helm chart.
Grafana is open-source visualization and analytics software. It allows you to query, visualize, alert on, and explore your metrics. Azure Synapse Analytics provides a set of default Grafana dashboards to visualize Apache Spark application-level metrics.
The "Synapse Workspace / Workspace" dashboard provides a workspace-level view of all the Apache Spark pools, application counts, CPU cores, and so on.
The "Synapse Workspace / Apache Spark pools" dashboard contains the metrics of Apache Spark applications running in the selected Apache Spark pool during the time period.
The "Synapse Workspace / Apache Spark Application" dashboard contains the metrics of the selected Apache Spark application.
The dashboard templates above are open source and available in the Azure Synapse Apache Spark application metrics repository.