In Synapse Studio, pipelines with a Cosmos DB Dataflow start failing

Nikitha Koshy
2024-08-19

The team runs data analytics pipelines in a Synapse workspace with Cosmos DB as the data source. The pipeline jobs started failing a few days ago with the error below. When I checked the Dataflows, the same error appeared in the Data Preview blade.

The store type is Analytical.

Key Details:

  • Error Message: java.lang.NoSuchMethodError: com.azure.data.cosmos.serialization.hybridrow.RowBuffer.<init>
  • Spark Version: 3.3
  • Connector Involved: Azure Cosmos DB Spark connector
at Source 'source1': an error occurred during snapshot metadata read phase - Job aborted due to stage failure: Task 0 in stage 16.0 failed 1 times, most recent failure: Lost task 0.0 in stage 16.0 (TID 16) (vm-92f73742 executor 1): java.lang.NoSuchMethodError: com.azure.data.cosmos.serialization.hybridrow.RowBuffer.<init>(Lcosmosdb_shaded/io/netty/buffer/ByteBuf;Lcom/azure/data/cosmos/serialization/hybridrow/HybridRowVersion;Lcom/azure/data/cosmos/serialization/hybridrow/layouts/LayoutResolver;)V
	at shaded.msdataflow.com.microsoft.azure.cosmos.analytics.spark.connector.hybridrow.HybridRowObjectMapper.read(HybridRowObjectMapper.java:59)
	at shaded.msdataflow.com.microsoft.azure.cosmos.analytics.spark.connector.store.alos.ALoSFileManager.fetchRoot(ALoSFileManager.scala:443)
	at shaded.msdataflow.com.microsoft.azure.cosmos.analytics.spark.connector.store.alos.ALoSFileManager.$anonfun$rootSegment$3(ALoSFileManager.scala:175)
	at shaded.msdataflow.com.microsoft.azure.cosmos.analytics.spark.connector.store.alos.ALoSFileManager.retryHybridRowDeserialization(ALoSFileManager.scala:456)
	at shaded.msdataflow.com.microsoft.azure.cosmos.analytics.spark.connector.store.alos.ALoSFileManager.$anonfun$rootSegment$2(ALoSFileManager.scala:175)
	at shaded.msdataflow.com.microsoft.azure.cosmos.analytics.spark.connector.utils.ResourceUtils$.withResources(ResourceUtils.scala:54)
	at shaded.msdataflow.com.microsoft.azure.cosmos.analytics.spark.connector.store.alos.ALoSFileManager.rootSegment(ALoSFileManager.scala:174)
	at shaded.msdataflow.com.microsoft.azure.cosmos.analytics.spark.connector.store.alos.metadata.MapRootSegmentToFileSegmentsInfo$.processRootSegment(FileSegmentMetadata.scala:327)
	at shaded.msdataflow.com.microsoft.azure.cosmos.analytics.spark.connector.store.alos.metadata.MapRootSegmentToFileSegmentsInfo$.$anonfun$run$5(FileSegmentMetadata.scala:188)
	at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:514)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491)
	at scala.collection.Iterator.foreach(Iterator.scala:943)
	at scala.collection.Iterator.foreach$(Iterator.scala:943)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
	at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
	at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)
	at scala.collection.TraversableOnce.to(TraversableOnce.scala:366)
	at scala.collection.TraversableOnce.to$(TraversableOnce.scala:364)
	at scala.collection.AbstractIterator.to(Iterator.scala:1431)
	at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:358)
	at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:358)
	at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1431)
	at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:345)
	at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:339)
	at scala.collection.AbstractIterator.toArray(Iterator.scala:1431)
	at org.apache.spark.rdd.RDD.$anonfun$collect$2(RDD.scala:1028)
	at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2448)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:136)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
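
To isolate whether the failure lies in the Dataflow runtime or in the analytical store connector itself, it can help to attempt the same read from a Synapse Spark notebook. This is a minimal sketch using the documented cosmos.olap source; "CosmosDbLinkedService" and "MyContainer" are placeholders, not values from this post.

// Minimal analytical-store read from a Synapse Spark notebook (Scala).
// Substitute the workspace's actual linked service and container names.
val df = spark.read
  .format("cosmos.olap")                                          // Synapse Link analytical store
  .option("spark.synapse.linkedService", "CosmosDbLinkedService") // placeholder
  .option("spark.cosmos.container", "MyContainer")                // placeholder
  .load()

df.show(10) // even a small read exercises the snapshot metadata phase that fails above

If the notebook read reproduces the NoSuchMethodError, the problem sits in the connector or Spark runtime rather than in the Dataflow definition; if it succeeds, the Dataflow's integration runtime is the more likely culprit.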


Accepted answer
  phemanth (Microsoft Vendor)
  2024-08-23

    @Nikitha Koshy

    I'm glad you were able to resolve your issue, and thank you for posting your solution so that others experiencing the same problem can easily reference it! Since the Microsoft Q&A community has a policy that "The question author cannot accept their own answer; they can only accept answers by others," I'll repost your solution here in case you'd like to accept it.

    Ask: The team runs data analytics pipelines in a Synapse workspace with Cosmos DB as the data source. The pipeline jobs started failing a few days ago with the error below. When I checked the Dataflows, the same error appeared in the Data Preview blade.

    The store type is Analytical.

    Key Details:

    • Error Message: java.lang.NoSuchMethodError: com.azure.data.cosmos.serialization.hybridrow.RowBuffer.<init>
    • Spark Version: 3.3
    • Connector Involved: Azure Cosmos DB Spark connector


    Solution: transient issue

    I haven't made any changes, but the data flow started working again on its own. Since nothing changed on our end, it was most likely a transient issue on the Azure side.
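
    For the pipeline itself, the Data flow activity's policy has built-in Retry and Retry interval settings that can ride out one-off failures like this. For notebook jobs reading the same analytical store, a small wrapper can serve the same purpose. The sketch below is illustrative only; the helper and the placeholder names are not part of the original solution.

    // Hypothetical retry helper (Scala). Spark surfaces executor failures on
    // the driver as SparkException, which extends Exception, so catching
    // Exception covers the stage failure shown above.
    def withRetry[T](op: () => T, attempt: Int = 1, maxAttempts: Int = 3): T =
      try op()
      catch {
        case e: Exception if attempt < maxAttempts =>
          println(s"Attempt $attempt failed: ${e.getMessage}; retrying...")
          Thread.sleep(60000L * attempt) // linear backoff between attempts
          withRetry(op, attempt + 1, maxAttempts)
      }

    // Retry the whole read-plus-action: Spark reads are lazy, so the
    // connector error only surfaces once an action runs.
    val rowCount = withRetry(() =>
      spark.read.format("cosmos.olap")
        .option("spark.synapse.linkedService", "CosmosDbLinkedService") // placeholder
        .option("spark.cosmos.container", "MyContainer")                // placeholder
        .load()
        .count()
    )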

    If I missed anything, please let me know and I'd be happy to add it to my answer, or feel free to comment below with any additional information.

    If you have any other questions, please let me know. Thank you again for your time and patience throughout this issue.


    Please don't forget to accept the answer and select "Yes" for "Was this answer helpful?" wherever the information provided helps you, as this can benefit other community members.

