"Internal system error" when scanning Databricks in Purview

Brendan Yee 40 Reputation points


I've set up a scan for Databricks (workspace-scoped Hive metastore) in Purview following the steps in this documentation. When I run the scan, though, it fails with the following error message:

Internal system error. Please contact support with correlationId:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx System Error, contact support: (119) System error while attempting to launch datascan process. ActivityId: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx )

I'm not sure what is causing this error, and what needs to be done to resolve it. Could you provide some guidance, please?

Additional notes:

  • I've verified that all the prerequisites were met
  • I've deleted and re-created the scan, but still receive the same error message
  • I ran the scan yesterday with the same configuration, but received a different error message (missing JDK 11, which I have since installed on the VM hosting the SHIR)
Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.
Microsoft Purview
A Microsoft data governance service that helps manage and govern on-premises, multicloud, and software-as-a-service data. Previously known as Azure Purview.

1 answer

  1. PRADEEPCHEEKATLA-MSFT 73,496 Reputation points Microsoft Employee

    @Brendan Yee - Thanks for the question and for using the Microsoft Q&A platform.

    The error message indicates that the datascan process failed to launch on the self-hosted integration runtime (SHIR) when the scan started. Possible causes include network connectivity issues, misconfigured credentials, or insufficient resources on the VM hosting the SHIR.

    Here are some steps you can take to troubleshoot the issue:

    • Check the network connectivity between the SHIR and the Databricks workspace. Make sure the SHIR VM can reach the workspace and that no firewall rules block the connection.
    • Verify that the credentials used to connect to the Databricks workspace are correct and have sufficient permissions to read the metadata.
    • Check the logs for the datascan process for error messages that give more detail about the failure. You can find the logs in the SHIR logs directory.
    • Make sure the VM hosting the SHIR has sufficient resources (CPU, memory, disk space) to run the datascan process.
    • If the issue persists, try deleting and re-creating the SHIR.
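
    The connectivity and resource checks above can be partly scripted. The following is a minimal sketch, not an official Purview or SHIR tool, and it assumes Python is available on the VM hosting the SHIR. The function names (`can_connect`, `free_disk_gb`, `java_version_banner`) are hypothetical helpers written for this illustration. The Java check only reports whether a `java` executable is on PATH; the SHIR may locate its Java runtime by other means, so treat the result as a hint rather than proof.

```python
import shutil
import socket
import subprocess
from typing import Optional


def can_connect(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


def free_disk_gb(path: str = ".") -> float:
    """Return the free space, in GiB, on the volume that contains path."""
    return shutil.disk_usage(path).free / 1024**3


def java_version_banner() -> Optional[str]:
    """Return the first line printed by `java -version`, or None if unavailable."""
    java = shutil.which("java")
    if java is None:
        return None
    try:
        # `java -version` normally writes its banner to stderr, not stdout.
        result = subprocess.run(
            [java, "-version"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.SubprocessError):
        return None
    lines = (result.stderr or result.stdout).splitlines()
    return lines[0] if lines else None
```

    On the SHIR VM, you would call `can_connect` with the hostname from your own workspace URL (HTTPS uses port 443), and compare the output of `free_disk_gb()` and `java_version_banner()` against the prerequisites in the documentation. A successful TCP connection only shows that the port is reachable; it does not validate the credentials used by the scan.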

    If none of these steps resolves the issue, open a support ticket and include the correlationId and ActivityId from the error message.