How to leverage an existing Spark cluster in a Synapse workspace

Catherine Meng 41 Reputation points
2021-02-22T13:29:17.087+00:00

We have some legacy computing resources in Cosmos (Spark on Cosmos). I'd like to know whether we can connect to these existing computing resources on Cosmos from a Synapse workspace.

Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.

Accepted answer
  Samara Soucy - MSFT 5,131 Reputation points
    2021-02-23T02:49:46.22+00:00

    It depends on what your goals are. If you would like to create notebooks in your Synapse workspace and have them run on your HDInsight or Databricks clusters, then the answer is no. You would need to migrate your jobs to a cluster maintained within Synapse. You can connect to Cosmos DB from within Synapse just as you would in Databricks or HDInsight.
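    As a rough sketch of that last point: if the Cosmos DB account has Azure Synapse Link enabled, a Synapse Spark notebook can read the analytical store through a linked service. The linked service and container names below are placeholders, not anything from your environment:

    ```python
    # Sketch: read a Cosmos DB analytical store from a Synapse Spark notebook.
    # Assumes Synapse Link is enabled on the Cosmos DB account and a linked
    # service (here called "MyCosmosDbLinkedService") exists in the workspace.
    df = (spark.read
          .format("cosmos.olap")  # Synapse Link (analytical store) connector
          .option("spark.synapse.linkedService", "MyCosmosDbLinkedService")
          .option("spark.cosmos.container", "MyContainer")  # placeholder name
          .load())

    df.show()
    ```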

    If your goal is to run jobs on Databricks and then use the results within Synapse, or use data in Synapse within Databricks, then yes, this is possible.

    To move data in and out of Synapse from Databricks you will need a blob storage account that both Databricks and Synapse have permission to read and write; this is used as a temporary common storage area for the two services.

    In Python you would use something similar to the following code in Databricks to move the data between the two services:

    spark.conf.set(  
      "fs.azure.account.key.<your-storage-account-name>.blob.core.windows.net",  
      "<your-storage-account-access-key>")  
      
    # Get some data from an Azure Synapse table.  
    df = spark.read \  
      .format("com.databricks.spark.sqldw") \  
      .option("url", "jdbc:sqlserver://<the-rest-of-the-connection-string>") \  
      .option("tempDir", "wasbs://<your-container-name>@<your-storage-account-name>.blob.core.windows.net/<your-directory-name>") \  
      .option("forwardSparkAzureStorageCredentials", "true") \  
      .option("dbTable", "<your-table-name>") \  
      .load()  
      
    # Or load data from an Azure Synapse query instead of a whole table.  
    df = spark.read \  
      .format("com.databricks.spark.sqldw") \  
      .option("url", "jdbc:sqlserver://<the-rest-of-the-connection-string>") \  
      .option("tempDir", "wasbs://<your-container-name>@<your-storage-account-name>.blob.core.windows.net/<your-directory-name>") \  
      .option("forwardSparkAzureStorageCredentials", "true") \  
      .option("query", "select x, count(*) as cnt from table group by x") \  
      .load()  
      
    # Apply some transformations to the data, then use the  
    # Data Source API to write the data back to another table in Azure Synapse.  
      
    df.write \  
      .format("com.databricks.spark.sqldw") \  
      .option("url", "jdbc:sqlserver://<the-rest-of-the-connection-string>") \  
      .option("forwardSparkAzureStorageCredentials", "true") \  
      .option("dbTable", "<your-table-name>") \  
      .option("tempDir", "wasbs://<your-container-name>@<your-storage-account-name>.blob.core.windows.net/<your-directory-name>") \  
      .save()  
    

    Does that answer your question?


0 additional answers