The Fabric Apache Spark diagnostic emitter extension is a library that enables Apache Spark applications to emit logs, event logs, and metrics to multiple destinations, including Azure Log Analytics, Azure Storage, and Azure Event Hubs.
In this tutorial, you learn how to configure and emit Spark logs and metrics to Log Analytics in Fabric. Once configured, you can collect and analyze Apache Spark application metrics and logs in your Log Analytics workspace.
Follow these steps to configure the necessary information in Fabric.
Consult one of the following resources to create a Log Analytics workspace:
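If you prefer to create the workspace from code instead of the portal, the following is a minimal sketch, assuming the azure-identity and azure-mgmt-loganalytics Python packages; the subscription, resource group, workspace name, and region placeholders are assumptions you'd replace with your own:

from azure.identity import DefaultAzureCredential
from azure.mgmt.loganalytics import LogAnalyticsManagementClient

# Authenticate and create a management client for the target subscription.
client = LogAnalyticsManagementClient(DefaultAzureCredential(), "<SUBSCRIPTION_ID>")

# Create (or update) the workspace; this is a long-running operation.
workspace = client.workspaces.begin_create_or_update(
    "<RESOURCE_GROUP>",
    "<WORKSPACE_NAME>",
    {"location": "eastus", "sku": {"name": "PerGB2018"}},
).result()

# customer_id is the GUID used below as <LOG_ANALYTICS_WORKSPACE_ID>.
print(workspace.customer_id)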
To configure Spark, create a Fabric Environment Artifact and choose one of the following options:
Option 1: Configure with workspace ID and key
Create a Fabric Environment Artifact in Fabric
Add the following Spark properties with the appropriate values to the environment artifact, or select Add from .yml in the ribbon to download the sample yaml file, which already contains the required properties.
<LOG_ANALYTICS_WORKSPACE_ID>: Log Analytics workspace ID.
<LOG_ANALYTICS_WORKSPACE_KEY>: Log Analytics key. To find this, in the Azure portal, go to Azure Log Analytics workspace > Agents > Primary key.

spark.synapse.diagnostic.emitters: LA
spark.synapse.diagnostic.emitter.LA.type: "AzureLogAnalytics"
spark.synapse.diagnostic.emitter.LA.categories: "Log,EventLog,Metrics"
spark.synapse.diagnostic.emitter.LA.workspaceId: <LOG_ANALYTICS_WORKSPACE_ID>
spark.synapse.diagnostic.emitter.LA.secret: <LOG_ANALYTICS_WORKSPACE_KEY>
spark.fabric.pools.skipStarterPools: "true" //Add this Spark property when using the default pool.
Alternatively, to apply the same configuration as Azure Synapse, use the following properties, or select Add from .yml in the ribbon to download the sample yaml file.
spark.synapse.logAnalytics.enabled: "true"
spark.synapse.logAnalytics.workspaceId: <LOG_ANALYTICS_WORKSPACE_ID>
spark.synapse.logAnalytics.secret: <LOG_ANALYTICS_WORKSPACE_KEY>
spark.fabric.pools.skipStarterPools: "true" //Add this Spark property when using the default pool.
Save and publish changes.
Option 2: Configure with Azure Key Vault
Note
Known issue: Spark sessions currently can't be started when configured with Option 2, because storing secrets in Key Vault prevents sessions from starting. Until this is resolved, prioritize the method outlined in Option 1.
You need to grant read secret permission to the users who will submit Apache Spark applications. For more information, see Provide access to Key Vault keys, certificates, and secrets with Azure role-based access control.
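As an illustration, one way to grant that permission from code is with the azure-mgmt-authorization Python package. This is a hedged sketch, not the only method; the subscription, resource group, vault name, and user object ID placeholders are assumptions:

import uuid

from azure.identity import DefaultAzureCredential
from azure.mgmt.authorization import AuthorizationManagementClient
from azure.mgmt.authorization.models import RoleAssignmentCreateParameters

client = AuthorizationManagementClient(DefaultAzureCredential(), "<SUBSCRIPTION_ID>")

# Scope the assignment to the vault that stores the workspace key.
vault_scope = (
    "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>"
    "/providers/Microsoft.KeyVault/vaults/<AZURE_KEY_VAULT_NAME>"
)

# 4633458b-... is the built-in "Key Vault Secrets User" role definition.
role_definition_id = (
    "/subscriptions/<SUBSCRIPTION_ID>/providers/Microsoft.Authorization"
    "/roleDefinitions/4633458b-17de-408a-b874-0445c86b69e6"
)

client.role_assignments.create(
    vault_scope,
    str(uuid.uuid4()),  # each role assignment needs a fresh GUID name
    RoleAssignmentCreateParameters(
        role_definition_id=role_definition_id,
        principal_id="<USER_OBJECT_ID>",
    ),
)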
To configure Azure Key Vault to store the workspace key, follow these steps:
Go to your Key Vault in the Azure portal.
On the settings page for the key vault, select Secrets, then Generate/Import.
On the Create a secret screen, enter the following values:
Name: Enter SparkLogAnalyticsSecret.
Value: Enter the <LOG_ANALYTICS_WORKSPACE_KEY> for the secret.
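The same secret can also be created from code; here's a minimal sketch, assuming the azure-keyvault-secrets Python package and your vault name in place of the placeholder:

from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

# Connect to the vault and store the workspace key under the expected name.
client = SecretClient(
    vault_url="https://<AZURE_KEY_VAULT_NAME>.vault.azure.net",
    credential=DefaultAzureCredential(),
)
client.set_secret("SparkLogAnalyticsSecret", "<LOG_ANALYTICS_WORKSPACE_KEY>")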
Create a Fabric Environment Artifact in Fabric
Add the following Spark properties with the corresponding values to the environment artifact, or select Add from .yml on the ribbon in the Environment artifact to download the sample yaml file, which includes the following Spark properties.
<LOG_ANALYTICS_WORKSPACE_ID>: The Log Analytics workspace ID.
<AZURE_KEY_VAULT_NAME>: The Key Vault name that you configured.
<AZURE_KEY_VAULT_SECRET_KEY_NAME> (optional): The secret name in the Key Vault for the workspace key. The default is SparkLogAnalyticsSecret.

// Spark properties for LA
spark.synapse.diagnostic.emitters: LA
spark.synapse.diagnostic.emitter.LA.type: "AzureLogAnalytics"
spark.synapse.diagnostic.emitter.LA.categories: "Log,EventLog,Metrics"
spark.synapse.diagnostic.emitter.LA.workspaceId: <LOG_ANALYTICS_WORKSPACE_ID>
spark.synapse.diagnostic.emitter.LA.secret.keyVault: <AZURE_KEY_VAULT_NAME>
spark.synapse.diagnostic.emitter.LA.secret.keyVault.secretName: <AZURE_KEY_VAULT_SECRET_KEY_NAME>
spark.fabric.pools.skipStarterPools: "true" //Add this Spark property when using the default pool.
Alternatively, to apply the same configuration as Azure Synapse, use the following properties, or select Add from .yml in the ribbon to download the sample yaml file.
spark.synapse.logAnalytics.enabled: "true"
spark.synapse.logAnalytics.workspaceId: <LOG_ANALYTICS_WORKSPACE_ID>
spark.synapse.logAnalytics.keyVault.name: <AZURE_KEY_VAULT_NAME>
spark.synapse.logAnalytics.keyVault.key.secret: <AZURE_KEY_VAULT_SECRET_KEY_NAME>
spark.fabric.pools.skipStarterPools: "true" //Add this Spark property when using the default pool.
Note
You can also store the workspace ID in Key Vault. Set the secret name to SparkLogAnalyticsWorkspaceId, or use the configuration spark.synapse.logAnalytics.keyVault.key.workspaceId to specify the workspace ID secret name.
For a list of Apache Spark configurations, see Available Apache Spark configurations.
Save and publish changes.
To attach the environment to notebooks or Spark job definitions:
To set the environment as the workspace default:
Note
Only workspace admins can manage configurations. The values apply to notebooks and Spark job definitions that attach to Workspace Settings. For more details, see Fabric Workspace Settings.
To submit an Apache Spark application:
Submit an Apache Spark application with the associated environment that you configured in the previous step. You can use any of the following ways to do so:
Go to the specified Log Analytics workspace, and then view the application metrics and logs when the Apache Spark application starts to run.
You can use the Apache Log4j library to write custom logs. Here are examples for Scala and PySpark:
Scala Example:
%%spark
val logger = org.apache.log4j.LogManager.getLogger("com.contoso.LoggerExample")
logger.info("info message")
logger.warn("warn message")
logger.error("error message")
// Log an exception
try {
  1 / 0
} catch {
  case e: Exception => logger.warn("Exception", e)
}

// Run a job to generate task-level metrics
val data = sc.parallelize(Seq(1, 2, 3, 4)).toDF().count()
PySpark Example:
%%pyspark
logger = sc._jvm.org.apache.log4j.LogManager.getLogger("com.contoso.PythonLoggerExample")
logger.info("info message")
logger.warn("warn message")
logger.error("error message")
To query Apache Spark events:
SparkListenerEvent_CL
| where fabricWorkspaceId_g == "{FabricWorkspaceId}" and artifactId_g == "{ArtifactId}" and fabricLivyId_g == "{LivyId}"
| order by TimeGenerated desc
| limit 100
To query Spark application driver and executor logs:
SparkLoggingEvent_CL
| where fabricWorkspaceId_g == "{FabricWorkspaceId}" and artifactId_g == "{ArtifactId}" and fabricLivyId_g == "{LivyId}"
| order by TimeGenerated desc
| limit 100
To query Apache Spark metrics:
SparkMetrics_CL
| where fabricWorkspaceId_g == "{FabricWorkspaceId}" and artifactId_g == "{ArtifactId}" and fabricLivyId_g == "{LivyId}"
| where name_s endswith "jvm.total.used"
| summarize max(value_d) by bin(TimeGenerated, 30s), executorId_s
| order by TimeGenerated asc
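If you'd rather run these queries programmatically than in the Log Analytics portal, here's a minimal sketch using the azure-monitor-query Python package; the workspace ID and the Livy ID filter value are placeholders to replace:

from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

client = LogsQueryClient(DefaultAzureCredential())

# Fetch the most recent driver and executor log records for one Livy session.
query = """
SparkLoggingEvent_CL
| where fabricLivyId_g == "{LivyId}"
| order by TimeGenerated desc
| limit 100
"""

response = client.query_workspace(
    workspace_id="<LOG_ANALYTICS_WORKSPACE_ID>",
    query=query,
    timespan=timedelta(days=1),
)
for table in response.tables:
    for row in table.rows:
        print(row)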
Fabric sends log data to Azure Monitor by using the HTTP Data Collector API. The data posted to the Azure Monitor Data Collector API is subject to certain constraints:
Users can query to evaluate metrics and logs at a set frequency, and fire an alert based on the results. For more information, see Create, view, and manage log alerts by using Azure Monitor.
Azure Log Analytics can't currently be selected as a destination for Spark logs and metrics emission in a managed virtual network because the managed private endpoint doesn't support Log Analytics as a data source.