Unable to load a DataFrame from Cosmos DB

Shubham Mehta 51 Reputation points Microsoft Employee
2021-12-14T22:43:20.617+00:00

Hello,

I am trying to load a DataFrame from Cosmos DB but have been unable to. I am using the following code:

   def load_dataframe(self, query):      
       return self.spark_session.read.format('com.microsoft.azure.cosmosdb.spark').options(**self._make_connection(query)).load()  

I am seeing the error below:
157681-image.png

I have tried the following steps as well:

   You need to build the repository into a jar file first using SBT, then add it to your Spark cluster.  
     
   I know a lot of people will have trouble building this jar file (including me, several hours ago), so I will walk through how to build it, step by step:  
     
   Go to https://www.scala-sbt.org/download.html to download SBT, then install it.  
     
   Go to https://github.com/Azure/azure-cosmosdb-spark and download the zip file.  
     
   Open the folder of the repository you have just downloaded, right-click in the blank space, and click "Open PowerShell window here". https://i.stack.imgur.com/Fq7NX.png  
     
   In the shell window, type "sbt" and press Enter. It may require you to install the Java Development Kit. If so, go to https://www.oracle.com/java/technologies/javase-downloads.html to download and install it. You may need to close and reopen the shell window after installing.  
     
   If things go right, you may see this screen: https://i.stack.imgur.com/fMxVr.png  
     
   Once the previous step has finished, type "package". The shell should show something like this, and the build may take a long time to finish. https://i.stack.imgur.com/hr2hw.png  
     
   After the build is done, go to the "target" folder, then "scala-2.12" folder to get the jar file. https://i.stack.imgur.com/Aziqy.png  
     
   Once you have the jar file, add it to the Spark cluster.  

157692-image.png

Can anybody help me with this problem, or suggest what I can do to fix it?

Azure Synapse Analytics
Azure Cosmos DB

1 answer

  1. PRADEEPCHEEKATLA-MSFT 85,746 Reputation points Microsoft Employee
    2021-12-15T07:37:10.977+00:00

    Hello @Shubham Mehta ,

    Thanks for the question and for using the Microsoft Q&A platform.

    I suggest using the connector below to read the Cosmos DB data into a DataFrame:

    https://search.maven.org/artifact/com.azure.cosmos.spark/azure-cosmos-spark_3-1_2-12/4.1.0/jar

    For more details, refer to Azure Cosmos DB Apache Spark 3 OLTP Connector for Core (SQL) API: Release notes and resources.
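
As a rough illustration of how a read might look with this connector, here is a minimal PySpark sketch. It assumes the connector jar is already attached to the Spark pool or cluster, and it uses the `cosmos.oltp` format name and the `spark.cosmos.*` option keys from the connector's configuration reference. The helper names (`make_cosmos_options`, `load_dataframe`) and all account values are hypothetical placeholders, not taken from the original code, and the custom-query option may not be available in every connector version.

```python
# Sketch only: read a Cosmos DB container with the Spark 3 OLTP connector.
# Endpoint, key, database, and container values are placeholders (assumptions).


def make_cosmos_options(endpoint, key, database, container, query=None):
    """Build the option dict passed to the 'cosmos.oltp' data source."""
    options = {
        "spark.cosmos.accountEndpoint": endpoint,
        "spark.cosmos.accountKey": key,
        "spark.cosmos.database": database,
        "spark.cosmos.container": container,
    }
    if query is not None:
        # Assumption: the installed connector version supports custom queries.
        options["spark.cosmos.read.customQuery"] = query
    return options


def load_dataframe(spark_session, options):
    """Load a DataFrame via the connector; the jar must be on the cluster."""
    return spark_session.read.format("cosmos.oltp").options(**options).load()
```

With a live session this would be called along the lines of `load_dataframe(spark, make_cosmos_options(endpoint, key, "mydb", "mycontainer"))`. Note that the format string differs from the older `com.microsoft.azure.cosmosdb.spark` one used in the question, so the existing option names would likely need to be mapped to the new keys.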

    Hope this helps. Please let us know if you have any further queries.
