Error connecting Databricks to Azure Synapse

Guilherme Lima 1 Reputation point
2022-07-05T18:36:20.817+00:00

I'm new to working with cloud services, and I'm trying to make a connection between Databricks and Azure Synapse. I have notebooks in Databricks that generate data frames, and I want to populate a dedicated SQL pool inside Synapse with them.

After looking at what the Microsoft documentation recommends and following the steps, I came across this error.

Code

df.write \
  .format("com.databricks.spark.sqldw") \
  .option("url", "<the-rest-of-the-connection-string>") \
  .option("forwardSparkAzureStorageCredentials", "true") \
  .option("dbTable", "Table") \
  .option("tempDir", "wasbs://<your-container-name>@<your-storage-account-name>.blob.core.windows.net/<your-directory-name>") \
  .load()

Error

Py4JJavaError: An error occurred while calling o1509.save.  
: com.databricks.spark.sqldw.SqlDWConnectorException: Exception encountered in Azure Synapse Analytics connector code.  
  
Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: 14  

Some considerations

  • I created a simple data frame for testing purposes, in case the problem was the data frame itself.
  • I did not create an empty table in Synapse beforehand; I expect it to be created automatically.

Could someone please help me understand this problem?


1 answer

  1. skumarrana 321 Reputation points
    2022-07-05T21:09:56.577+00:00

    Hi @Guilherme Lima

    It seems you want to populate a table in a Synapse dedicated SQL pool with data from a Databricks data frame. I see your chain ends with load(), which is the read method; since you are writing, you should end it with save() instead.

    Here is sample code that will write the data in data frame "df" into a Synapse dedicated SQL pool. If the table "table01" doesn't already exist in the dedicated pool, it will be created for you.

    df.write \
        .mode("append") \
        .format("com.databricks.spark.sqldw") \
        .option("url", "<complete connection string URL>") \
        .option("forwardSparkAzureStorageCredentials", "true") \
        .option("dbTable", "table01") \
        .option("tempDir", "<staging folder path in your storage account>") \
        .save()
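
    For reference, here is a minimal sketch with every option filled in; all the <...> values are placeholders I made up, so replace them with your own. The url must be the complete JDBC connection string for your dedicated SQL pool (you can copy it from the pool's connection strings page in the Azure portal). In my experience, a StringIndexOutOfBoundsException like the one you posted often means the connector could not parse the url value, so double-check that it is a full "jdbc:sqlserver://..." string.

    # Minimal sketch; every <...> value is a placeholder, not a real name.
    # forwardSparkAzureStorageCredentials requires the storage account key in
    # the Spark conf so the connector can stage data in tempDir on your behalf.
    spark.conf.set(
        "fs.azure.account.key.<your-storage-account-name>.blob.core.windows.net",
        "<your-storage-account-access-key>")

    (df.write
        .mode("append")
        .format("com.databricks.spark.sqldw")
        # Complete JDBC connection string for the dedicated SQL pool.
        .option("url", "jdbc:sqlserver://<your-workspace>.sql.azuresynapse.net:1433;"
                       "database=<your-database>;user=<your-user>;"
                       "password=<your-password>;encrypt=true;"
                       "trustServerCertificate=false;loginTimeout=30")
        .option("forwardSparkAzureStorageCredentials", "true")
        .option("dbTable", "table01")
        # Staging location the connector loads from into Synapse.
        .option("tempDir", "wasbs://<your-container>@<your-storage-account-name>"
                           ".blob.core.windows.net/tempdir")
        .save())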

    Hope this helps!

