Error connecting Databricks to Azure Synapse

Guilherme Lima 1 Reputation point
2022-07-05T18:36:20.817+00:00

I'm new to working with cloud services, and I'm trying to make a connection between Databricks and Azure Synapse. I have notebooks in Databricks that generate data frames, and I want to populate a dedicated SQL pool inside Synapse with them.

After looking at what the Microsoft documentation recommends and following the steps, I came across this error.

Code

df.write \
  .format("com.databricks.spark.sqldw") \
  .option("url", "<the-rest-of-the-connection-string>") \
  .option("forwardSparkAzureStorageCredentials", "true") \
  .option("dbTable", "Table") \
  .option("tempDir", "wasbs://<your-container-name>@<your-storage-account-name>.blob.core.windows.net/<your-directory-name>") \
  .load()

Error

Py4JJavaError: An error occurred while calling o1509.save.  
: com.databricks.spark.sqldw.SqlDWConnectorException: Exception encountered in Azure Synapse Analytics connector code.  
  
Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: 14  

Some considerations

  • I created a simple data frame for testing purposes, in case the problem was the data frame itself.
  • I did not create an empty table in Synapse beforehand; I expect it to be created automatically.

Could someone please help me understand this problem?


1 answer

  1. skumarrana 321 Reputation points
    2022-07-05T21:09:56.577+00:00

    Hi @Guilherme Lima

    It seems you want to populate a table in a Synapse dedicated SQL pool with data from a Databricks data frame. I see your chain ends with load(), which is the read method; since you are writing, you should end it with save() instead.

    Here is sample code that will write the data in data frame "df" into a Synapse dedicated SQL pool. If the table "table01" doesn't already exist in the dedicated pool, it will be created for you.

    df.write \
        .mode("append") \
        .format("com.databricks.spark.sqldw") \
        .option("url", "<complete connection string URL>") \
        .option("forwardSparkAzureStorageCredentials", "true") \
        .option("dbTable", "table01") \
        .option("tempDir", "<staging folder path in your storage account>") \
        .save()
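
    For reference, here is a minimal sketch with every option filled in; all the <...> values are placeholders I made up, so replace them with your own. The url must be the complete JDBC connection string for your dedicated SQL pool (you can copy it from the pool's connection strings page in the Azure portal). In my experience, a StringIndexOutOfBoundsException like the one you posted often means the connector could not parse the url value, so double-check that it is a full "jdbc:sqlserver://..." string.

    # Minimal sketch; every <...> value is a placeholder, not a real name.
    # forwardSparkAzureStorageCredentials requires the storage account key in
    # the Spark conf so the connector can stage data in tempDir on your behalf.
    spark.conf.set(
        "fs.azure.account.key.<your-storage-account-name>.blob.core.windows.net",
        "<your-storage-account-access-key>")

    (df.write
        .mode("append")
        .format("com.databricks.spark.sqldw")
        # Complete JDBC connection string for the dedicated SQL pool.
        .option("url", "jdbc:sqlserver://<your-workspace>.sql.azuresynapse.net:1433;"
                       "database=<your-database>;user=<your-user>;"
                       "password=<your-password>;encrypt=true;"
                       "trustServerCertificate=false;loginTimeout=30")
        .option("forwardSparkAzureStorageCredentials", "true")
        .option("dbTable", "table01")
        # Staging location the connector loads from into Synapse.
        .option("tempDir", "wasbs://<your-container>@<your-storage-account-name>"
                           ".blob.core.windows.net/tempdir")
        .save())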

    Hope this helps!

