No logs being output when running spark job as part of synapse pipeline

alexszym 1 Reputation point
2022-04-06T14:53:26.99+00:00

I'm using Spark jobs with Azure Synapse.

I've been able to create a Spark job that logs to an Azure Log Analytics workspace. The Log Analytics workspace is configured through the Apache Spark configuration, as per the documentation:

spark.synapse.logAnalytics.enabled true  
spark.synapse.logAnalytics.workspaceId <workspace-id>  
spark.synapse.logAnalytics.keyVault.name <keyvault-name>  
spark.synapse.logAnalytics.keyVault.key.secret <keyvault-secret-name>  

If I submit the Spark job directly from the Spark job definition, the logging works. However, if I use the same Spark job definition as part of an Azure Synapse pipeline, the logging doesn't work.

Edit: It looks like this might be caused by an inability to access the Key Vault. The following exception appears in the logs when the job runs as part of the pipeline:

java.lang.Exception: Access token couldn't be obtained {"result":"DependencyError","errorId":"BadRequest","errorMessage":"LSRServiceException is [{\"StatusCode\":400,\"ErrorResponse\":{\"code\":\"CannotAcquireMSIForVault\",\"message\":\"Cannot acquire MSI token for a Vault audience.\",\"target\":\"Vault\"},\"StackTrace\":\"   at Microsoft.Marlin.Common.ADF.Impl.LSRClient.CheckForFailures(HttpResponseMessage response, String responseContent) in C:\\\\source\\\\Common\\\\Microsoft.Marlin.Common.ADF\\\\Impl\\\\LSRClient.cs:line 348\\r\\n   at Microsoft.Marlin.Common.ADF.Impl.LSRClient.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken, String traceId) in C:\\\\source\\\\Common\\\\Microsoft.Marlin.Common.ADF\\\\Impl\\\\LSRClient.cs:line 365\\r\\n   at Microsoft.Marlin.Common.ADF.Impl.LSRClient.ResolveAudienceAsync(String audience, ResolveAudienceRequest request, String traceId, CancellationToken cancellationToken) in C:\\\\source\\\\Common\\\\Microsoft.Marlin.Common.ADF\\\\Impl\\\\LSRClient.cs:line 181\\r\\n   at Microsoft.Marlin.TokenService.Token.LSRAudienceTokenProvider.GetToken(Boolean isLinkedService, String audience, String sessionToken, CancellationToken cancellationToken) in C:\\\\source\\\\TokenService\\\\Microsoft.Marlin.TokenService\\\\Token\\\\LSRAudienceTokenProvider.cs:line 153\\r\\n   at Microsoft.Marlin.TokenService.Token.LSRAudienceTokenProvider.GetTokenForAudienceAsync(Boolean isLinkedService, String audience, String account, String sessionToken, SignaturePayload signaturePayload, CancellationToken cancellationToken) in C:\\\\source\\\\TokenService\\\\Microsoft.Marlin.TokenService\\\\Token\\\\LSRAudienceTokenProvider.cs:line 127\\r\\n   at Microsoft.Marlin.TokenService.Controllers.TokenController.GetTokenAsync(TokenRequest request, CancellationToken cancellationToken) in C:\\\\source\\\\TokenService\\\\Microsoft.Marlin.TokenService\\\\Controllers\\\\TokenController.cs:line 82\\r\\n   at lambda_method1281(Closure , Object )\\r\\n 
  at Microsoft.AspNetCore.Mvc.Infrastructure.ActionMethodExecutor.AwaitableObjectResultExecutor.Execute(IActionResultTypeMapper mapper, ObjectMethodExecutor executor, Object controller, Object[] arguments)\\r\\n   at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.<InvokeActionMethodAsync>g__Awaited|12_0(ControllerActionInvoker invoker, ValueTask`1 actionResultValueTask)\\r\\n   at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.<InvokeNextActionFilterAsync>g__Awaited|10_0(ControllerActionInvoker invoker, Task lastTask, State next, Scope scope, Object state, Boolean isCompleted)\\r\\n   at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.Rethrow(ActionExecutedContextSealed context)\\r\\n   at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.Next(State& next, Scope& scope, Object& state, Boolean& isCompleted)\\r\\n   at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.<InvokeInnerFilterAsync>g__Awaited|13_0(ControllerActionInvoker invoker, Task lastTask, State next, Scope scope, Object state, Boolean isCompleted)\\r\\n   at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.<InvokeNextResourceFilter>g__Awaited|24_0(ResourceInvoker invoker, Task lastTask, State next, Scope scope, Object state, Boolean isCompleted)\\r\\n   at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.Rethrow(ResourceExecutedContextSealed context)\\r\\n   at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.Next(State& next, Scope& scope, Object& state, Boolean& isCompleted)\\r\\n   at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.<InvokeFilterPipelineAsync>g__Awaited|19_0(ResourceInvoker invoker, Task lastTask, State next, Scope scope, Object state, Boolean isCompleted)\\r\\n   at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.<InvokeAsync>g__Awaited|17_0(ResourceInvoker invoker, Task task, IDisposable scope)\\r\\n   at 
Microsoft.AspNetCore.Routing.EndpointMiddleware.<Invoke>g__AwaitRequestTask|6_0(Endpoint endpoint, Task requestTask, ILogger logger)\\r\\n   at Microsoft.AspNetCore.Authorization.AuthorizationMiddleware.Invoke(HttpContext context)\\r\\n   at Swashbuckle.AspNetCore.SwaggerUI.SwaggerUIMiddleware.Invoke(HttpContext httpContext)\\r\\n   at Swashbuckle.AspNetCore.Swagger.SwaggerMiddleware.Invoke(HttpContext httpContext, ISwaggerProvider swaggerProvider)\\r\\n   at Microsoft.AspNetCore.Builder.Extensions.MapWhenMiddleware.Invoke(HttpContext context)\\r\\n   at Microsoft.AspNetCore.Diagnostics.ExceptionHandlerMiddleware.<Invoke>g__Awaited|6_0(ExceptionHandlerMiddleware middleware, HttpContext context, Task task)\",\"Message\":\"Cannot acquire MSI token for a Vault audience.\",\"Data\":{},\"InnerException\":null,\"HelpLink\":null,\"Source\":\"Microsoft.Marlin.Common.ADF\",\"HResult\":-2146233088}]. TraceId : 8c8bfae6-a00f-4bc3-981a-7e4d064825a4. Error Component : LSR"}  

I'm using RBAC to configure access to the Key Vault, and the Synapse workspace has the relevant permission:
190570-keyvault.png

The Key Vault is also configured as a linked service in the Synapse workspace, using the 'Managed Identity' authentication method.

Is this not the correct configuration?

Azure Synapse Analytics

1 answer

  1. Saurabh Sharma 23,676 Reputation points Microsoft Employee
    2022-04-14T00:16:21.067+00:00

    Hi @alexszym ,

    I think you are missing the linked service name in the Apache Spark configuration.
    Please add the line spark.synapse.logAnalytics.keyVault.linkedServiceName <Key Vault linked service name> to your Spark configuration.

    You can find the details in the documentation, under the section "Option 3. Configure with a linked service".
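
    Combined with the settings from the question, the full configuration would look roughly like the sketch below. The first four lines are copied from the question. Only the last line is new, and <keyvault-linked-service-name> is a placeholder for whatever name the Key Vault linked service has in your workspace.

    ```
    spark.synapse.logAnalytics.enabled true
    spark.synapse.logAnalytics.workspaceId <workspace-id>
    spark.synapse.logAnalytics.keyVault.name <keyvault-name>
    spark.synapse.logAnalytics.keyVault.key.secret <keyvault-secret-name>
    spark.synapse.logAnalytics.keyVault.linkedServiceName <keyvault-linked-service-name>
    ```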

    Also, if your Synapse workspace has Git integration enabled, you need to publish your linked service before running your Synapse pipeline.
    I have tested this option and it works as expected. I can see Spark logs being ingested into the Log Analytics workspace:
    192837-image.png

    Please let me know if you have any questions.

    Thanks
    Saurabh