Azure Databricks - Upgrade runtime from 6.4 to 9.0 to 10.0

Amrit Raj Kalashakum 1 Reputation point
2021-11-02T12:20:28.003+00:00

Hello there,

I am trying to upgrade my cluster from runtime 6.4 to 9.0 or 10.0. The upgrade itself succeeds, but I have a notebook that is supposed to read data from Cosmos DB (Cassandra API) and load it into Databricks. The job runs fine on 6.4 but fails on 9.0 and 10.0.

Since the upgrade requires new driver and connector versions, I tried to install the libraries from Maven and also tried downloading jar files from third-party sites, but I don't believe I found the right connector/driver. I need guidance on getting the right libraries for the 9.0 or 10.0 runtime.

Below is the code that I am using:

// To import all the Cassandra tables
import org.apache.spark.sql.cassandra._

// DataStax Spark connector
import com.datastax.spark.connector._
import com.datastax.spark.connector.cql.CassandraConnector

//import com.datastax.oss.driver.api.core._

// Cosmos DB library for multiple retry
import com.microsoft.azure.cosmosdb.cassandra

import org.apache.spark.sql.functions._
import spark.sqlContext.implicits._
import org.apache.spark.storage.StorageLevel

// Connection settings (placeholders)
spark.conf.set("spark.cassandra.connection.host", "<test>")
spark.conf.set("spark.cassandra.connection.port", "<test>")
spark.conf.set("spark.cassandra.connection.ssl.enabled", "true")
spark.conf.set("spark.cassandra.auth.username", "<test>")
spark.conf.set("spark.cassandra.auth.password", "<test>")
spark.conf.set("spark.cassandra.connection.factory", "com.microsoft.azure.cosmosdb.cassandra.CosmosDbConnectionFactory")

// Throughput tuning
spark.conf.set("spark.cassandra.output.batch.size.rows", "1")
spark.conf.set("spark.cassandra.connection.connections_per_executor_max", "10")
spark.conf.set("spark.cassandra.output.concurrent.writes", "1000")
spark.conf.set("spark.cassandra.concurrent.reads", "512")
spark.conf.set("spark.cassandra.output.batch.grouping.buffer.size", "1000")
spark.conf.set("spark.cassandra.connection.keep_alive_ms", "600000000")

spark.conf.set("spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation", "true")

// Read a Cassandra table into a DataFrame
val paymentdetailsDF = sqlContext
  .read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("table" -> "test_table", "keyspace" -> "test_keyspace"))
  .load()


1 answer

  1. PRADEEPCHEEKATLA-MSFT 76,511 Reputation points Microsoft Employee
    2021-11-03T07:59:42.187+00:00

    Hello @Amrit Raj Kalashakum,

    Welcome to the Microsoft Q&A platform.

    To resolve this issue, make sure you have installed the required dependency (com.datastax.spark:spark-cassandra-connector-assembly_2.12:3.1.0) on the Databricks cluster.

    (Animated screenshot: installing the Apache Cassandra connector library on the Databricks cluster.)

    Please check the latest dependency versions for your Databricks runtime.

    (Screenshot: recommended connector versions per Databricks runtime.)

    For more details, refer to Access Azure Cosmos DB Cassandra API data from Azure Databricks.
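
    If it helps, here is a minimal sketch of what the notebook configuration could look like on the Spark 3 runtimes (9.0/10.0) once that connector is installed. Two assumptions, based on my reading of the article above: the Spark 3 connector no longer needs the Cosmos DB helper library or the CosmosDbConnectionFactory, and it uses remoteConnectionsPerExecutor in place of connections_per_executor_max. The <test> placeholders are carried over from your snippet.

    // Sketch for Databricks runtime 9.0/10.0 (Spark 3.x) with
    // com.datastax.spark:spark-cassandra-connector-assembly_2.12:3.1.0.
    // Assumption: no Cosmos DB helper import or custom connection factory
    // is required on Spark 3; the DataStax connector is used directly.
    import org.apache.spark.sql.cassandra._

    spark.conf.set("spark.cassandra.connection.host", "<test>") // <account>.cassandra.cosmos.azure.com
    spark.conf.set("spark.cassandra.connection.port", "<test>") // Cassandra API listens on 10350
    spark.conf.set("spark.cassandra.connection.ssl.enabled", "true")
    spark.conf.set("spark.cassandra.auth.username", "<test>")
    spark.conf.set("spark.cassandra.auth.password", "<test>")

    // Spark 3 connector setting (replaces connections_per_executor_max)
    spark.conf.set("spark.cassandra.connection.remoteConnectionsPerExecutor", "10")
    spark.conf.set("spark.cassandra.output.batch.size.rows", "1")
    spark.conf.set("spark.cassandra.output.concurrent.writes", "1000")
    spark.conf.set("spark.cassandra.concurrent.reads", "512")
    spark.conf.set("spark.cassandra.output.batch.grouping.buffer.size", "1000")
    spark.conf.set("spark.cassandra.connection.keep_alive_ms", "600000000")

    // Same read as before
    val paymentdetailsDF = spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("table" -> "test_table", "keyspace" -> "test_keyspace"))
      .load()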

    If you are still facing the same issue, could you please share the complete stack trace of the error message you are experiencing?

    Hope this helps. Please let us know if you have any further queries.

    ------------------------------

    • Please don't forget to click on "Accept Answer" or the up-vote button whenever the information provided helps you. Original posters help the community find answers faster by identifying the correct answer. Here is how
    • Want a reminder to come back and check responses? Here is how to subscribe to a notification
    • If you are interested in joining the VM program and helping shape the future of Q&A: Here is how you can be part of Q&A Volunteer Moderators