Scala Bulkcopy code is not working on Databricks Runtime 7.3 and above

AHMAD Muhammad 21 Reputation points
2022-04-03T23:22:09.593+00:00

I have the following Scala code that runs without any errors on Azure Databricks Runtime 5.5 - 6.4 LTS.

Now that we have upgraded to Databricks Runtime 7.3, the same code no longer works. I have added multiple com.azure.* libraries to test and installed JAR files for the relevant libraries, to no avail.
Can someone please tell me which specific libraries I need to install, or what changes I need to make to my code?

Code sample:

%scala
import com.microsoft.azure.sqldb.spark.config.Config
import com.microsoft.azure.sqldb.spark.connect._
import com.microsoft.azure.sqldb.spark.query._

// Connection values (HostName, DatabaseName, UserName, connectPassword)
// are defined elsewhere in the notebook.
val query = """truncate table testable"""

val config = Config(Map(
  "url"          -> HostName,
  "databaseName" -> DatabaseName,
  "user"         -> UserName,
  "password"     -> connectPassword,
  "queryCustom"  -> query))

sqlContext.sqlDBQuery(config)

The error I am getting is as follows:

 error: object sqldb is not a member of package com.microsoft.azure
import com.microsoft.azure.sqldb.spark.config.Config

The Microsoft Azure library I am using is:
com.microsoft.azure:spark-mssql-connector_2.12:1.2.0

Using the old libraries (Scala 2.11) on the newer cluster, I get a different error, something like java.lang.NoClassDefFoundError: scala/Product$class.

Also, if this code is not supported on newer versions of Scala, is there an alternate way to do a bulk copy that I can use instead?


Accepted answer
  PRADEEPCHEEKATLA-MSFT 85,746 Reputation points Microsoft Employee
    2022-04-04T10:25:58.423+00:00

    Hello @AHMAD Muhammad,

    Welcome to the MS Q&A platform.

    (UPDATE 5/4/2022): If you are getting the sqldb error, it means all the other supporting libraries are already imported into your notebook and only the latest JAR with its dependencies is missing.

    As per the repro, I got the same error message as the one shown above.

    After a bit of research, I found that this error occurs due to a missing JAR with dependencies.

    To resolve this issue, you need to download the JAR file from here: https://search.maven.org/artifact/com.microsoft.azure/azure-sqldb-spark/1.0.2/jar


    After downloading the JAR file, upload it to the cluster and install it as a library.

    Note: After installing both libraries, make sure to restart the cluster.

    Now, you will be able to run the command successfully.


    ---------------------------------------

    The GitHub repo for the old connector previously linked to from this page is not actively maintained. Instead, we strongly encourage you to evaluate and use the new connector: https://github.com/microsoft/sql-spark-connector

    There are three version sets of the connector available through Maven: a 2.4.x, a 3.0.x, and a 3.1.x compatible version. All versions can be found on Maven and can be imported using the coordinates below:

    • Spark 2.4.x compatible connector: com.microsoft.azure:spark-mssql-connector:1.0.2
    • Spark 3.0.x compatible connector: com.microsoft.azure:spark-mssql-connector_2.12:1.1.0
    • Spark 3.1.x compatible connector: com.microsoft.azure:spark-mssql-connector_2.12:1.2.0

    Note that Databricks Runtime 7.3 LTS runs Spark 3.0, so the Spark 3.0.x compatible coordinate is the one that matches your cluster.

    The traditional JDBC connector writes data into your database using row-by-row insertion. You can use the Spark connector to write data to Azure SQL and SQL Server using bulk insert. It significantly improves write performance when loading large data sets or loading data into tables where a columnstore index is used.
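
    For comparison, here is a minimal sketch of the traditional row-by-row JDBC write (df and the server, database, table, and credential values are the same placeholders used in the bulk-copy example below):

    // Plain Spark JDBC write: rows go in as ordinary INSERT statements
    // (batched by the driver), with no bulk-copy optimization.
    df.write
      .format("jdbc")
      .mode("append")
      .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
      .option("url", "jdbc:sqlserver://mysqlserver.database.windows.net;databaseName=MyDatabase")
      .option("dbtable", "dbo.Clients")
      .option("user", "username")
      .option("password", "*********")
      .save()

    And here is the bulk-copy version using the old azure-sqldb-spark connector: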

    import com.microsoft.azure.sqldb.spark.bulkcopy.BulkCopyMetadata
    import com.microsoft.azure.sqldb.spark.config.Config
    import com.microsoft.azure.sqldb.spark.connect._

    /**
      Add column metadata.
      If not specified, metadata is automatically pulled
      from the destination table, which may hurt performance.
    */
    val bulkCopyMetadata = new BulkCopyMetadata
    bulkCopyMetadata.addColumnMetadata(1, "Title", java.sql.Types.NVARCHAR, 128, 0)
    bulkCopyMetadata.addColumnMetadata(2, "FirstName", java.sql.Types.NVARCHAR, 50, 0)
    bulkCopyMetadata.addColumnMetadata(3, "LastName", java.sql.Types.NVARCHAR, 50, 0)

    val bulkCopyConfig = Config(Map(
      "url"               -> "mysqlserver.database.windows.net",
      "databaseName"      -> "MyDatabase",
      "user"              -> "username",
      "password"          -> "*********",
      "dbTable"           -> "dbo.Clients",
      "bulkCopyBatchSize" -> "2500",
      "bulkCopyTableLock" -> "true",
      "bulkCopyTimeout"   -> "600"
    ))

    // df is the DataFrame to be written to dbo.Clients
    df.bulkCopyToSqlDB(bulkCopyConfig, bulkCopyMetadata)
    // df.bulkCopyToSqlDB(bulkCopyConfig) if no metadata is specified.
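
    With the new connector, the equivalent bulk write is a plain DataFrame write. Here is a sketch, assuming the Spark 3.0.x coordinate above is installed on the cluster and using the same placeholder values as the example above:

    // Bulk write with the new Apache Spark connector for SQL Server.
    // The connector is selected via the format name; no extra imports are needed.
    df.write
      .format("com.microsoft.sqlserver.jdbc.spark")
      .mode("append")
      .option("url", "jdbc:sqlserver://mysqlserver.database.windows.net;databaseName=MyDatabase")
      .option("dbtable", "dbo.Clients")
      .option("user", "username")
      .option("password", "*********")
      .option("tableLock", "true")  // counterpart of bulkCopyTableLock
      .option("batchsize", "2500")  // counterpart of bulkCopyBatchSize
      .save()

    For the truncate step in your original snippet, the new connector follows standard Spark JDBC semantics: writing with mode("overwrite") and .option("truncate", "true") truncates the destination table before loading instead of dropping and recreating it.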
    

    For more details, refer to Accelerate real-time big data analytics using the Spark connector.

    Hope this helps. Please let us know if you have any further queries.

    ------------------------------


