Error when using the CosmosDB Analytical Store for source queries in a Synapse ADF Dataflow

Jude Moore 5 Reputation points
2024-09-06T10:59:31.9366667+00:00

How can I resolve the error I receive when using the CosmosDB Analytical Store as the source in a Synapse ADF Dataflow?

I created the source with Analytical Store selected as explained in this how-to link: https://techcommunity.microsoft.com/t5/azure-data-factory-blog/capture-changed-data-from-your-cosmos-db-analytical-store/ba-p/3783530.

The error occurs when I go to the Data Preview tab in Azure Data Factory's Dataflow editor. If I use an alternative CosmosDB instance, there is no error and the Data Preview returns sample data. If I switch the Dataflow source to "Transactional" instead of "Analytical," the Data Preview also returns sample data.

I also tried using new integration runtimes in an effort to avoid the error.

The issue persists in both Debug mode and when using a pipeline to run the Dataflow. The error message is:

shaded.msdataflow.com.microsoft.azure.cosmos.analytics.spark.connector.snapshot.BlobStorageContainer

Verbose response:

{"message":"Job failed due to reason: shaded.msdataflow.com.microsoft.azure.cosmos.analytics.spark.connector.snapshot.BlobStorageContainer. Details:java.lang.ClassNotFoundException: shaded.msdataflow.com.microsoft.azure.cosmos.analytics.spark.connector.snapshot.BlobStorageContainer
    java.net.URLClassLoader.findClass(URLClassLoader.java:387)
    java.lang.ClassLoader.loadClass(ClassLoader.java:418)
    sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
    java.lang.ClassLoader.loadClass(ClassLoader.java:351)
    java.lang.Class.forName0(Native Method)
    java.lang.Class.forName(Class.java:348)
    java.io.ObjectInputStream.resolveClass(ObjectInputStream.java:758)
    java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1988)
    java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1852)
    java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2186)
    java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
    java.io.ObjectInputStream.readObject(ObjectInputStream.java:503)
    java.io.ObjectInputStream.readObject(ObjectInputStream.java:461)
    scala.collection.immutable.HashMap$Serializat","failureType":"UserError","target":"testDataflow615","errorCode":"DFExecutorUserError"

Azure Synapse Analytics
Azure Cosmos DB
Azure Data Factory

2 answers

  1. Vinodh247 21,226 Reputation points
    2024-09-06T14:21:06.8566667+00:00

    Hi Jude Moore,

    Thanks for reaching out to Microsoft Q&A.

    The error you're encountering seems to be related to a missing or incorrectly referenced class in the ADF or Synapse environment when using the CosmosDB Analytical Store as a source.
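
To confirm which class the runtime could not find, the verbose response quoted in the question can be inspected with a few lines of Python. This is only a minimal sketch: the helper names `missing_class` and `failed_while_deserializing` are hypothetical, and `VERBOSE_MESSAGE` is an excerpt of the message posted above, not a fresh log.

```python
import re

# Excerpt of the verbose response from the question (the original is truncated too).
VERBOSE_MESSAGE = (
    "Job failed due to reason: "
    "shaded.msdataflow.com.microsoft.azure.cosmos.analytics.spark.connector"
    ".snapshot.BlobStorageContainer. Details:java.lang.ClassNotFoundException: "
    "shaded.msdataflow.com.microsoft.azure.cosmos.analytics.spark.connector"
    ".snapshot.BlobStorageContainer\n"
    "    java.net.URLClassLoader.findClass(URLClassLoader.java:387)\n"
    "    java.io.ObjectInputStream.resolveClass(ObjectInputStream.java:758)\n"
)


def missing_class(message):
    """Return the class named by a ClassNotFoundException, or None if absent."""
    match = re.search(r"ClassNotFoundException:\s*([\w.$]+)", message)
    return match.group(1).rstrip(".") if match else None


def failed_while_deserializing(message):
    """True if the trace passes through Java object deserialization."""
    return "java.io.ObjectInputStream." in message


if __name__ == "__main__":
    print(missing_class(VERBOSE_MESSAGE))
    print(failed_while_deserializing(VERBOSE_MESSAGE))
```

The `java.io.ObjectInputStream` frames indicate that the class lookup failed while a serialized object was being read back, so the class was absent from the classpath on the side doing the reading. That may point to a runtime or connector mismatch rather than a permissions problem, though the trace alone does not prove it.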

    Steps to narrow down the issue:

    • Ensure that the linked service for CosmosDB is correctly set up in both the Dataflow and the ADF pipeline. Check that all necessary permissions are assigned, especially for accessing the Analytical Store.
    • Check the compatibility of the CosmosDB Analytical Store version with the Synapse Dataflow. Updates or changes in the CosmosDB SDKs or connectors can sometimes cause such issues. Ensure that the Synapse workspace is using a supported version.
    • Since you've tried new Integration Runtimes, ensure that the self-hosted or Azure IR in use has access to both the CosmosDB instance and the Analytical Store. You can also try creating an Integration Runtime with higher scaling options to check whether it's a performance issue.
    • Since the error mentions Spark-related classes, verify that the Spark pool in Synapse is correctly configured. You might want to redeploy the Spark pool or create a new one to ensure that all required libraries and configurations are loaded.
    • Test whether recreating the Linked Service for CosmosDB in Synapse helps. There may be configuration settings or connectivity issues causing the problem.
    • The error refers to a class, BlobStorageContainer, in the CosmosDB connector. Ensure that the connector you're using is the latest version. If not, consider updating the connectors or the ADF runtime.
    • Sometimes such issues are bugs in the platform. You might want to check the Azure updates or forums for any known issues with the CosmosDB Analytical Store and ADF/Synapse integration.
    • As a temporary workaround, since the data preview works fine with the Transactional Store, you can use that source for now while continuing to debug the Analytical Store issue. It might help narrow down whether this is a specific issue with the Analytical Store configuration.

    You can ignore any of the recommendations above that you have already tried; the idea is to narrow down the cause and then work toward a fix.

    Check and let me know.


  2. Amira Bedhiafi 24,636 Reputation points
    2024-09-06T16:11:00.35+00:00

    It looks like you have either a missing or an incompatible CosmosDB Spark connector.

    Verify the version of the Spark connector you are using for the Analytical Store. Check that the Analytical Store is properly configured and synchronized with the transactional store, and confirm that your Integration Runtime is up to date and correctly configured.

    Try clearing the Dataflow cache, restarting debug sessions, and testing with a different CosmosDB instance or region.

    Otherwise, try contacting Azure support; they may be able to help identify the issue.

