Hi Data Engineer,
It sounds like you’re running into a persistent error in your Azure Databricks workspace that surfaces only as “Reference ID: 5f39c0ad-8c66-41e7-87cc-eec2e5bdb196.” A reference ID alone doesn’t reveal the root cause, so here is a general approach you can follow, along with a few questions to help us narrow it down:
What you can try right away
- Run the built-in notebook failure troubleshooter
  - In your workspace, go to the notebook run or job that failed.
  - Click View run details → Troubleshoot and repair. This guides you step by step based on common failure patterns (such as executor loss, GC issues, or driver unavailability).
- Check service diagnostics on your Databricks resource
  - Open the Azure portal → your Databricks workspace → Diagnose and solve problems → Service diagnostics.
  - Ensure there are no underlying service outages and that no egress limits are being hit for artifacts.
- Review Spark driver and executor logs
  - In the cluster UI, select the cluster used for your run → Driver logs and Executor logs.
  - Look for error messages (e.g., ExecutorLostFailure, FetchFailedException, GC overhead warnings) around the timestamp of your failure.
- Validate cluster/network configuration
  - If you see bootstrap or container-download failures, check your VNet’s DNS settings and user-defined routes.
  - Ensure the Azure recursive resolver (168.63.129.16) is reachable and that your workspace can connect to the Databricks control plane and artifact storage.
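If the logs are long, a small script can help with the log review step. The following is only a rough sketch in Python: it assumes you have downloaded the driver or executor log as text, and that each entry starts with a `yy/MM/dd HH:mm:ss` timestamp (Spark's default log4j layout; adjust the pattern if yours differs). The function name `find_errors_near` and the sample log lines are made up for illustration.

```python
import re
from datetime import datetime, timedelta

# Error markers mentioned in the log review step. The "GC overhead" pattern is
# an assumed match for messages such as "GC overhead limit exceeded".
ERROR_PATTERNS = {
    "ExecutorLostFailure": re.compile(r"ExecutorLostFailure"),
    "FetchFailedException": re.compile(r"FetchFailedException"),
    "GC overhead": re.compile(r"GC overhead", re.IGNORECASE),
}

# Assumed timestamp prefix, e.g. "24/05/01 12:03:42".
TIMESTAMP_RE = re.compile(r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})")
TIMESTAMP_FORMAT = "%y/%m/%d %H:%M:%S"


def find_errors_near(lines, failure_time, window_minutes=5):
    """Return (timestamp, label, line) for known errors near failure_time."""
    window = timedelta(minutes=window_minutes)
    hits = []
    for line in lines:
        match = TIMESTAMP_RE.match(line)
        if not match:
            continue  # skip continuation lines such as stack frames
        stamp = datetime.strptime(match.group(1), TIMESTAMP_FORMAT)
        if abs(stamp - failure_time) > window:
            continue  # outside the time window around the failure
        for label, pattern in ERROR_PATTERNS.items():
            if pattern.search(line):
                hits.append((stamp, label, line.rstrip()))
    return hits


# Invented sample lines, for illustration only.
sample_log = [
    "24/05/01 12:00:10 INFO DAGScheduler: Job 3 started",
    "24/05/01 12:03:42 WARN TaskSetManager: Lost task 0.0 in stage 3.0: ExecutorLostFailure (executor 2 exited)",
    "24/05/01 12:04:05 ERROR Executor: java.lang.OutOfMemoryError: GC overhead limit exceeded",
    "24/05/01 13:30:00 WARN TaskSetManager: FetchFailedException: connection reset",
]

hits = find_errors_near(sample_log, datetime(2024, 5, 1, 12, 5, 0))
for stamp, label, _ in hits:
    print(stamp, label)
```

With the sample above, only the two entries within five minutes of the failure time are reported; the later `FetchFailedException` line falls outside the window. Widen `window_minutes` if the failure timestamp is approximate.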
Follow-up questions
To dig deeper, could you share:
- The exact error message or stack trace that appears in the driver/executor logs (not just the reference ID)?
- Which operation you were running when the error occurred (e.g., notebook cell, job step, file write, Spark magic command, etc.)?
- Your cluster configuration details:
  - Databricks runtime version
  - Number of workers and driver size
  - Any custom Spark configurations you’ve applied
- Whether this error affects all notebooks and jobs, or only a specific workspace, region, or time window
- Any recent changes to your environment (new libraries, network policies, mount configurations)
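To make the cluster configuration details easier to share, you could condense the cluster's JSON description into just the fields asked for above. This is a minimal sketch, assuming you already have that JSON as a Python dict (for example, copied from the cluster's JSON view or fetched through the Clusters API) and that it uses the field names `spark_version`, `num_workers`, `autoscale`, `node_type_id`, `driver_node_type_id`, and `spark_conf`. The helper `summarize_cluster` and the sample values are hypothetical.

```python
import json


def summarize_cluster(cluster):
    """Pick out runtime version, worker count, node sizes, and Spark confs."""
    autoscale = cluster.get("autoscale")
    if autoscale:
        # Autoscaling clusters report a range instead of a fixed worker count.
        workers = f"{autoscale.get('min_workers')}-{autoscale.get('max_workers')} (autoscaling)"
    else:
        workers = str(cluster.get("num_workers"))
    node_type = cluster.get("node_type_id")
    return {
        "runtime_version": cluster.get("spark_version"),
        "workers": workers,
        "worker_node_type": node_type,
        # Assumption: when no driver type is set, the driver uses the worker type.
        "driver_node_type": cluster.get("driver_node_type_id", node_type),
        "custom_spark_conf": dict(cluster.get("spark_conf") or {}),
    }


# Invented sample values, for illustration only.
sample_cluster = {
    "cluster_id": "0000-000000-example",
    "spark_version": "13.3.x-scala2.12",
    "node_type_id": "Standard_DS3_v2",
    "num_workers": 4,
    "spark_conf": {"spark.sql.shuffle.partitions": "400"},
}

summary = summarize_cluster(sample_cluster)
print(json.dumps(summary, indent=2))
```

Pasting the printed summary alongside the stack trace covers most of the configuration questions in one go. Remove anything sensitive from the Spark configuration values before sharing.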
Once we have those details, we can point you to a more specific fix.
Reference list
- Diagnose and resolve notebook execution issues https://learn.microsoft.com/azure/databricks/kb/troubleshooting-notebook-execution
- Diagnosing and Fixing Azure Blob Storage Write Issues in Databricks https://supportabilityhub.microsoft.com/solutions/apollosolutions/8adea87d-bfca-fe76-2273-a9f55ad61e05/apollo-635b660f-911e-4568-aab6-82ca7126648b
- Issues with Spark magic in interactive notebook sessions https://supportabilityhub.microsoft.com/solutions/apollosolutions/273c152b-5d6d-78ca-bac5-33c7069f4854/347b93c4-a3f5-4824-98b7-79eee5b6009c
- Troubleshoot File Permission Denied Errors in Shared Clusters https://supportabilityhub.microsoft.com/solutions/apollosolutions/0b7b6af0-40d2-40c5-bde6-5efa04a3b11f/apollo-e7256f8a-68de-40f7-8a3b-85983870012e
- Diagnose and resolve bootstrap errors when starting clusters https://supportabilityhub.microsoft.com/solutions/apollosolutions/8adea87d-bfca-fe76-2273-a9f55ad61e05/0e843034-8d38-4001-a6c0-bfb77c00459a
- Error conditions in Azure Databricks https://learn.microsoft.com/azure/databricks/error-messages/error-classes