The Fabric Apache Spark diagnostic emitter extension is a library that enables Apache Spark applications to emit logs, event logs, and metrics to various destinations, including Azure Log Analytics, Azure Storage, and Azure Event Hubs.
In this tutorial, you learn how to use the Fabric Apache Spark diagnostic emitter extension to send Apache Spark application logs, event logs, and metrics to your Azure Event Hubs.
To collect diagnostic logs and metrics, you can use an existing Azure Event Hubs instance. If you don't have one, you can create an event hub.
Create a Fabric Environment Artifact in Fabric
Add the following Spark properties with the appropriate values to the environment artifact, or select Add from .yml in the ribbon to download the sample yaml file, which already contains these properties.
spark.synapse.diagnostic.emitters: MyEventHub
spark.synapse.diagnostic.emitter.MyEventHub.type: "AzureEventHub"
spark.synapse.diagnostic.emitter.MyEventHub.categories: "Log,EventLog,Metrics"
spark.synapse.diagnostic.emitter.MyEventHub.secret: <connection-string>
spark.fabric.pools.skipStarterPools: "true" //Add this Spark property when using the default pool.
Fill in the <connection-string> parameter in the configuration file. For more information, see Azure Event Hubs configurations.
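The extension requires the connection string to follow a specific pattern (see the note and configuration table below). Before pasting the value into the Spark property, it can help to sanity-check it. The following is a hypothetical helper, not part of the extension; the sample values are placeholders:

```python
import re

# Expected shape of an Azure Event Hubs instance connection string, including
# the EntityPath segment (the Event Hubs instance name) that the extension needs.
PATTERN = re.compile(
    r"^Endpoint=sb://(?P<fqdn>[^/;]+)/;"
    r"SharedAccessKeyName=(?P<key_name>[^;]+);"
    r"SharedAccessKey=(?P<key_value>[^;]+);"
    r"EntityPath=(?P<entity_path>[^;]+)$"
)

def validate_connection_string(conn_str):
    """Return the parsed parts, or raise ValueError if the string is malformed."""
    match = PATTERN.match(conn_str)
    if match is None:
        raise ValueError(
            "Connection string must match Endpoint=sb://<FQDN>/;"
            "SharedAccessKeyName=<KeyName>;SharedAccessKey=<KeyValue>;"
            "EntityPath=<PathName>"
        )
    return match.groupdict()

# Placeholder connection string for illustration only.
sample = ("Endpoint=sb://contoso.servicebus.windows.net/;"
          "SharedAccessKeyName=RootManageSharedAccessKey;"
          "SharedAccessKey=abc123;EntityPath=my-event-hub")
parts = validate_connection_string(sample)
print(parts["entity_path"])  # my-event-hub
```

A string without the EntityPath segment fails the check, which mirrors the requirement called out in the note after the configuration table.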
Note
Known issue: Sessions currently fail to start when using Option 2. Storing secrets in Key Vault prevents Spark sessions from starting. Prioritize configuring with the method outlined in Option 1.
Ensure that users who submit Apache Spark applications are granted read secret permissions. For more information, see Provide access to Key Vault keys, certificates, and secrets with an Azure role-based access control.
To configure Azure Key Vault for storing the workspace key:
Create and go to your key vault in the Azure portal.
On the settings page for the key vault, select Secrets, then Generate/Import.
On the Create a secret screen, enter a name for the secret and the <connection-string> as its value, then select Create.
Create a Fabric Environment Artifact in Fabric.
Add the following Spark properties, or select Add from .yml on the ribbon to download the sample yaml file, which includes the following Spark properties.
spark.synapse.diagnostic.emitters: MyEventHub
spark.synapse.diagnostic.emitter.MyEventHub.type: "AzureEventHub"
spark.synapse.diagnostic.emitter.MyEventHub.categories: "Log,EventLog,Metrics"
spark.synapse.diagnostic.emitter.MyEventHub.secret.keyVault: <AZURE_KEY_VAULT_NAME>
spark.synapse.diagnostic.emitter.MyEventHub.secret.keyVault.secretName: <AZURE_KEY_VAULT_SECRET_KEY_NAME>
spark.fabric.pools.skipStarterPools: "true" //Add this Spark property when using the default pool.
Fill in the <AZURE_KEY_VAULT_NAME> and <AZURE_KEY_VAULT_SECRET_KEY_NAME> parameters in the configuration file. For more details on these parameters, refer to Azure Event Hubs configurations.
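The two options differ only in how the secret is supplied: either the connection string itself, or a Key Vault name plus secret name. As an illustration of which keys go together, here is a hypothetical helper (not part of the extension) that assembles either variant of the emitter properties:

```python
def emitter_properties(destination, categories="Log,EventLog,Metrics",
                       secret=None, key_vault=None, secret_name=None):
    """Build the Spark properties for one diagnostic emitter destination."""
    prefix = "spark.synapse.diagnostic.emitter." + destination
    props = {
        "spark.synapse.diagnostic.emitters": destination,
        prefix + ".type": "AzureEventHub",
        prefix + ".categories": categories,
    }
    if secret is not None:
        # Option 1: the connection string is stored directly as a Spark property.
        props[prefix + ".secret"] = secret
    elif key_vault is not None and secret_name is not None:
        # Option 2: the connection string is stored as a secret in Azure Key Vault.
        props[prefix + ".secret.keyVault"] = key_vault
        props[prefix + ".secret.keyVault.secretName"] = secret_name
    else:
        raise ValueError("Provide either secret, or key_vault and secret_name")
    return props

# Placeholder Key Vault and secret names for illustration only.
props = emitter_properties("MyEventHub",
                           key_vault="contoso-key-vault",
                           secret_name="eventhub-connection-string")
```

The resulting dictionary matches the property names listed above; the values would still be entered into the environment artifact or sample yaml file by hand.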
Save and publish changes.
To attach the environment to Notebooks or Spark job definitions:
To set the environment as the workspace default:
Note
Only workspace admins can manage workspace configurations. Changes made here will apply to all notebooks and Spark job definitions attached to the workspace settings. For more information, see Fabric Workspace Settings.
Configuration | Description
---|---
spark.synapse.diagnostic.emitters | Required. The comma-separated destination names of diagnostic emitters.
spark.synapse.diagnostic.emitter.<destination>.type | Required. The built-in destination type. To enable the Azure Event Hubs destination, the value should be AzureEventHub.
spark.synapse.diagnostic.emitter.<destination>.categories | Optional. The comma-separated selected log categories. Available values include DriverLog, ExecutorLog, EventLog, Metrics. If not set, the default is all categories.
spark.synapse.diagnostic.emitter.<destination>.secret | Optional. The Azure Event Hubs instance connection string. This field should match the pattern Endpoint=sb://<FQDN>/;SharedAccessKeyName=<KeyName>;SharedAccessKey=<KeyValue>;EntityPath=<PathName>.
spark.synapse.diagnostic.emitter.<destination>.secret.keyVault | Required if .secret isn't specified. The Azure Key Vault name where the secret (connection string) is stored.
spark.synapse.diagnostic.emitter.<destination>.secret.keyVault.secretName | Required if .secret.keyVault is specified. The Azure Key Vault secret name where the secret (connection string) is stored.
spark.synapse.diagnostic.emitter.<destination>.filter.eventName.match | Optional. The comma-separated Spark event names that specify which events to collect. For example: SparkListenerApplicationStart,SparkListenerApplicationEnd
spark.synapse.diagnostic.emitter.<destination>.filter.loggerName.match | Optional. The comma-separated Log4j logger names that specify which logs to collect. For example: org.apache.spark.SparkContext,org.example.Logger
spark.synapse.diagnostic.emitter.<destination>.filter.metricName.match | Optional. The comma-separated Spark metric name suffixes that specify which metrics to collect. For example: jvm.heap.used
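Per the table above, event and logger names are given as exact comma-separated names, while metric names are matched by suffix. The following sketch is an illustrative reading of those filter semantics (an assumption for clarity, not the extension's source code):

```python
def parse_match_list(value):
    """Split a comma-separated .filter.*.match value into a clean list."""
    return [item.strip() for item in value.split(",") if item.strip()]

def event_allowed(event_name, match_value):
    # eventName.match / loggerName.match: the name must appear in the list.
    return event_name in parse_match_list(match_value)

def metric_allowed(metric_name, match_value):
    # metricName.match: the metric name only needs to end with a listed suffix,
    # so "driver.jvm.heap.used" matches the suffix "jvm.heap.used".
    return any(metric_name.endswith(suffix)
               for suffix in parse_match_list(match_value))

print(event_allowed("SparkListenerApplicationStart",
                    "SparkListenerApplicationStart,SparkListenerApplicationEnd"))  # True
print(metric_allowed("driver.jvm.heap.used", "jvm.heap.used"))  # True
```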
Note
The Azure Event Hubs instance connection string must always contain the EntityPath, which is the name of the Azure Event Hubs instance.
Here's a sample log record in JSON format:
{
"timestamp": "2024-09-06T03:09:37.235Z",
"category": "Log|EventLog|Metrics",
"fabricLivyId": "<fabric-livy-id>",
"applicationId": "<application-id>",
"applicationName": "<application-name>",
"executorId": "<driver-or-executor-id>",
"fabricTenantId": "<my-fabric-tenant-id>",
"capacityId": "<my-fabric-capacity-id>",
"artifactType": "SynapseNotebook|SparkJobDefinition",
"artifactId": "<my-fabric-artifact-id>",
"fabricWorkspaceId": "<my-fabric-workspace-id>",
"fabricEnvId": "<my-fabric-environment-id>",
"executorMin": "<executor-min>",
"executorMax": "<executor-max>",
"isHighConcurrencyEnabled": "true|false",
"properties": {
// The message properties of logs, events and metrics.
"timestamp": "2024-09-06T03:09:37.235Z",
"message": "Initialized BlockManager: BlockManagerId(1, vm-04b22223, 34319, None)",
"logger_name": "org.apache.spark.storage.BlockManager",
"level": "INFO",
"thread_name": "dispatcher-Executor"
//...
}
}
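On the consuming side, each Event Hubs message body can be decoded as JSON in this shape. A minimal sketch of handling one such record (the consumer is your own downstream code, not part of the extension; the record below is an abbreviated version of the sample above):

```python
import json

# Abbreviated sample record as emitted by the diagnostic emitter.
raw_body = b'''{
  "timestamp": "2024-09-06T03:09:37.235Z",
  "category": "Log",
  "applicationId": "app-123",
  "properties": {
    "message": "Initialized BlockManager: BlockManagerId(1, vm-04b22223, 34319, None)",
    "logger_name": "org.apache.spark.storage.BlockManager",
    "level": "INFO"
  }
}'''

record = json.loads(raw_body)

# Route by category: "Log" records carry Log4j fields under "properties".
if record["category"] == "Log":
    props = record["properties"]
    line = "{} {}: {}".format(props["level"], props["logger_name"], props["message"])
    print(line)
```

The top-level fields (fabricWorkspaceId, artifactId, executorId, and so on) identify where the record came from, so a real consumer would typically use them for partitioning or filtering before looking inside properties.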
Create a managed private endpoint for the target Azure Event Hubs. For detailed instructions, refer to Create and use managed private endpoints in Microsoft Fabric.
Once the managed private endpoint is approved, users can begin emitting logs and metrics to the target Azure Event Hubs.
Documentation
Monitor Apache Spark applications with Azure Log Analytics - Microsoft Fabric
Learn how to enable the Fabric connector for collecting and sending the Apache Spark application metrics and logs to your Log Analytics workspace.
Apache Spark application detail monitoring - Microsoft Fabric
Learn how to monitor your Apache Spark application details, including recent run status, issues, and the progress of your jobs.