Azure Databricks - Upgrade runtime from 6.4 to 9.0 to 10.0

Amrit Raj Kalashakum 1 Reputation point
2021-11-02T12:20:28.003+00:00

Hello there,

I am trying to upgrade my cluster from runtime 6.4 to 9.0 or 10.0. The upgrade itself succeeds, but I have a notebook that is supposed to read data from Cosmos DB (Cassandra API) and load it into Databricks. The job runs fine on 6.4 but fails on 9.0 and 10.0.

Since the upgrade requires new driver and connector versions, I tried to install the libraries from Maven and also tried downloading jar files from third-party sites, but I don't believe I found the right connector/driver. I need guidance on getting the right libraries for the 9.0 or 10.0 runtime.

Below is the code that I am using:

// To import all the Cassandra tables
import org.apache.spark.sql.cassandra._

// DataStax Spark connector
import com.datastax.spark.connector._
import com.datastax.spark.connector.cql.CassandraConnector

//import com.datastax.oss.driver.api.core._

// Cosmos DB library for multiple retry
import com.microsoft.azure.cosmosdb.cassandra

import org.apache.spark.sql.functions._
import spark.sqlContext.implicits._
import org.apache.spark.storage.StorageLevel

// Connection settings (placeholders)
spark.conf.set("spark.cassandra.connection.host", "<test>")
spark.conf.set("spark.cassandra.connection.port", "<test>")
spark.conf.set("spark.cassandra.connection.ssl.enabled", "true")
spark.conf.set("spark.cassandra.auth.username", "<test>")
spark.conf.set("spark.cassandra.auth.password", "<test>")
spark.conf.set("spark.cassandra.connection.factory", "com.microsoft.azure.cosmosdb.cassandra.CosmosDbConnectionFactory")

// Throughput tuning
spark.conf.set("spark.cassandra.output.batch.size.rows", "1")
spark.conf.set("spark.cassandra.connection.connections_per_executor_max", "10")
spark.conf.set("spark.cassandra.output.concurrent.writes", "1000")
spark.conf.set("spark.cassandra.concurrent.reads", "512")
spark.conf.set("spark.cassandra.output.batch.grouping.buffer.size", "1000")
spark.conf.set("spark.cassandra.connection.keep_alive_ms", "600000000")

spark.conf.set("spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation", "true")

// Read a Cassandra table into a DataFrame
val paymentdetailsDF = sqlContext
  .read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("table" -> "test_table", "keyspace" -> "test_keyspace"))
  .load()


1 answer

  1. PRADEEPCHEEKATLA-MSFT 76,511 Reputation points Microsoft Employee
    2021-11-03T07:59:42.187+00:00

    Hello @Amrit Raj Kalashakum,

    Welcome to the Microsoft Q&A platform.

    To resolve this issue, make sure you have installed the required dependency (com.datastax.spark:spark-cassandra-connector-assembly_2.12:3.1.0) on the Databricks cluster.

    (Animated screenshot: installing the Apache Cassandra connector library on the Databricks cluster.)

    Please check the latest dependency versions for your Databricks runtime.

    (Screenshot: recommended connector versions per Databricks runtime.)

    For more details, refer to Access Azure Cosmos DB Cassandra API data from Azure Databricks.
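
    If it helps, here is a minimal sketch of what the notebook configuration could look like on the Spark 3 runtimes (9.0/10.0) once that connector is installed. Two assumptions, based on my reading of the article above: the Spark 3 connector no longer needs the Cosmos DB helper library or the CosmosDbConnectionFactory, and it uses remoteConnectionsPerExecutor in place of connections_per_executor_max. The <test> placeholders are carried over from your snippet.

    // Sketch for Databricks runtime 9.0/10.0 (Spark 3.x) with
    // com.datastax.spark:spark-cassandra-connector-assembly_2.12:3.1.0.
    // Assumption: no Cosmos DB helper import or custom connection factory
    // is required on Spark 3; the DataStax connector is used directly.
    import org.apache.spark.sql.cassandra._

    spark.conf.set("spark.cassandra.connection.host", "<test>") // <account>.cassandra.cosmos.azure.com
    spark.conf.set("spark.cassandra.connection.port", "<test>") // Cassandra API listens on 10350
    spark.conf.set("spark.cassandra.connection.ssl.enabled", "true")
    spark.conf.set("spark.cassandra.auth.username", "<test>")
    spark.conf.set("spark.cassandra.auth.password", "<test>")

    // Spark 3 connector setting (replaces connections_per_executor_max)
    spark.conf.set("spark.cassandra.connection.remoteConnectionsPerExecutor", "10")
    spark.conf.set("spark.cassandra.output.batch.size.rows", "1")
    spark.conf.set("spark.cassandra.output.concurrent.writes", "1000")
    spark.conf.set("spark.cassandra.concurrent.reads", "512")
    spark.conf.set("spark.cassandra.output.batch.grouping.buffer.size", "1000")
    spark.conf.set("spark.cassandra.connection.keep_alive_ms", "600000000")

    // Same read as before
    val paymentdetailsDF = spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("table" -> "test_table", "keyspace" -> "test_keyspace"))
      .load()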

    If you are still facing the same issue, could you please share the complete stack trace of the error message you are experiencing?

    Hope this helps. Please let us know if you have any further queries.

    ------------------------------

    • Please don't forget to click on "Accept Answer" or the up-vote button whenever the information provided helps you. Original posters help the community find answers faster by identifying the correct answer. Here is how
    • Want a reminder to come back and check responses? Here is how to subscribe to a notification
    • If you are interested in joining the VM program and helping shape the future of Q&A: Here is how you can be part of Q&A Volunteer Moderators