Azure Databricks Synapse Connectivity

Sahar Mostafa 26 Reputation points Microsoft Employee
2021-03-29T20:30:55.797+00:00

We are trying to use PolyBase in Azure Data Factory to copy a Delta Lake table to Synapse. Using a simple Copy activity in Azure Data Factory, our linked service connections to Delta Lake and Synapse both test successfully, yet the copy step fails:

Error code
2200
Failure type
User configuration issue
Details
ErrorCode=AzureDatabricksCommandError,Hit an error when running the command in Azure Databricks. Error details: Failure to initialize configuration.

Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.

Accepted answer
  1. KranthiPakala-MSFT 46,642 Reputation points Microsoft Employee Moderator
    2021-04-01T16:17:47.05+00:00

    Hi there,

As per the offline discussion, adding the storage access key to the Databricks cluster's Apache Spark configuration resolved the issue. This is called out in the Prerequisites section of this doc: Copy data to and from Azure Databricks Delta Lake by using Azure Data Factory

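For reference, the prerequisite amounts to a cluster-level Spark configuration entry of the following form (the storage account name and access key below are placeholders to substitute with your own values; this sketch assumes an ADLS Gen2 account accessed over the `dfs` endpoint):

    fs.azure.account.key.<storage-account-name>.dfs.core.windows.net <storage-account-access-key>

This is entered under the cluster's Advanced Options > Spark config in the Databricks workspace, and the cluster must be restarted for it to take effect.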

Hope this helps anyone who experiences a similar issue.

    ----------

    Please don’t forget to Accept Answer and Up-Vote wherever the information provided helps you, this can be beneficial to other community members.

    1 person found this answer helpful.

3 additional answers

  1. brajesh jaishwal 26 Reputation points
    2021-05-26T16:30:02.147+00:00

fs.azure.account.auth.type OAuth
fs.azure.account.oauth.provider.type org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
fs.azure.account.oauth2.client.id <<client id>>
fs.azure.account.oauth2.client.secret <<client secret>>
fs.azure.account.oauth2.client.endpoint https://login.microsoftonline.com/<<tenant id>>/oauth2/token
fs.azure.account.key.<<account name>>.blob.core.windows.net <<storage key>>
spark.databricks.delta.preview.enabled true

This worked for me. Credential passthrough has many limitations; the way to go is a service principal or managed identity (the latter in preview as of this writing). Your service principal should have appropriate storage permissions (e.g. Storage Blob Data Contributor) to read and write.
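The service-principal settings above can also be assembled programmatically, e.g. to apply them per-session in a notebook rather than in the cluster config. The sketch below is an illustration, not the connector's required method: the function name `service_principal_conf` and all placeholder values are hypothetical, and the key names follow the per-account ABFS OAuth pattern (each key suffixed with `<account>.dfs.core.windows.net`).

```python
# Hypothetical placeholders; substitute your own storage account, tenant,
# and app registration. Fetch the secret from a secret scope in practice.
ACCOUNT = "mystorageacct"
TENANT_ID = "00000000-0000-0000-0000-000000000000"
CLIENT_ID = "11111111-1111-1111-1111-111111111111"
CLIENT_SECRET = "<client secret>"

def service_principal_conf(account, tenant_id, client_id, client_secret):
    """Return the per-account ABFS OAuth settings for a service principal."""
    suffix = f"{account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{suffix}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{suffix}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{suffix}": client_id,
        f"fs.azure.account.oauth2.client.secret.{suffix}": client_secret,
        f"fs.azure.account.oauth2.client.endpoint.{suffix}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }

conf = service_principal_conf(ACCOUNT, TENANT_ID, CLIENT_ID, CLIENT_SECRET)
# In a Databricks notebook you would then apply each pair:
# for key, value in conf.items():
#     spark.conf.set(key, value)
```

Scoping the keys to one account (the `<account>.dfs.core.windows.net` suffix) lets different storage accounts use different credentials in the same session.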

    1 person found this answer helpful.

  2. Irfan Maroof 1 Reputation point
    2021-05-07T15:04:10.807+00:00

I am getting the same error. I can preview the data, but when I run the pipeline it throws this error.

    Failure type
    User configuration issue
    Details
    ErrorCode=AzureDatabricksCommandError,Hit an error when running the command in Azure Databricks. Error details: Failure to initialize configuration.

I have set up the following configurations:

    spark.databricks.delta.autoCompact.enabled true
    spark.databricks.delta.optimizeWrite.enabled true

In addition, I have also added my ADLS Gen2 key to the Spark configuration.

Does anyone know what else is needed as a prerequisite?


  3. brajesh jaishwal 26 Reputation points
    2021-05-19T07:48:36.52+00:00

+1 to IrfanMaroof-7597,
    I am getting this error:
    Error code
    2200
    Failure type
    User configuration issue
    Details
    ErrorCode=AzureDatabricksCommandError,Hit an error when running the command in Azure Databricks. Error details: com.databricks.backend.daemon.data.client.adl.AzureCredentialNotFoundException: Could not find ADLS Gen2 Token.

    run id: 7cbe179d-39d7-450f-9a2d-b0485a9e441e

    spark conf:
    spark.hadoop.fs.azure.account.key.<data lake account name>.dfs.core.windows.net <data lake account key>
    spark.databricks.delta.optimizeWrite.enabled true
    spark.databricks.delta.autoCompact.enabled true
    spark.databricks.delta.preview.enabled true
    spark.databricks.passthrough.enabled true

What else do I need to do?

