I am getting the below error while trying to extract data from Dynamics using a source in a mapping data flow, but the same extraction works fine with a Copy activity.

Nikita Randive 20 Reputation points
2023-03-06T05:16:37.41+00:00

Job aborted due to stage failure: Task 0 in stage 5.0 failed 1 times, most recent failure: Lost task 0.0 in stage 5.0 (TID 3, vm-09281300, executor 1): com.microsoft.dataflow.Issues: DF-Rest_015 - Failure to read most recent page request: Request Failure(URL:https://xxxxx/xxx/xxx/xxx/xxx); Error Message: Read timed out; java.net.SocketTimeoutException: Read timed out
    at com.microsoft.dataflow.Utils$.failure(Utils.scala:76)
    at org.apache.spark.sql.execution.datasources.rest.RestClient$$anonfun$readResourcesWithDynamicPaging$1.apply(RestClient.scala:61)
    at org.apache.spark.sql.execution.datasources.rest.RestClient$$anonfun$readResourcesWithDynamicPaging$1.apply(RestClient.scala:49)
    at scala.util.Try$.apply(Try.scala:192)
    at org.apache.spark.sql.execution.datasources.rest.RestClient.readResourcesWithDynamicPaging(RestClient.scala:49)
    at org.apache.spark.sql.execution.datasources.rest.RestClient.readResources(RestClient.scala:27)
    at org.apache.spark.sql.execution.datasources.rest.RestRDD.compute(RestRDD.scala:20)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:347)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:311)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:347)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:311)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:347)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:311)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:347)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:311)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:347)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:311)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:347)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:311)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
    at org.apache.spark.scheduler.Task.run(Task.scala:123)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$12.apply(Executor.scala:414)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:420)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

1 answer

  1. AnnuKumari-MSFT 21,321 Reputation points Microsoft Employee
    2023-03-09T06:26:12.4833333+00:00

    @Nikita Randive ,

    Thank you for using the Microsoft Q&A platform, and thanks for posting your question here.

    From the error message, the issue appears to be a timeout while fetching data from the source, i.e. the Dynamics 365 data store (`java.net.SocketTimeoutException: Read timed out` while reading a page of results).

    Kindly try increasing the resources available to the data flow's Spark cluster, such as raising the core count or switching to a memory-optimized compute type, and see if that helps.
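
    For example, the compute size can be set on the Execute Data Flow activity in the pipeline JSON. This is a minimal sketch; the activity and data flow names, the core count, and the compute type are placeholder values to adapt to your environment:

    ```json
    {
        "name": "ExecuteDynamicsDataFlow",
        "type": "ExecuteDataFlow",
        "typeProperties": {
            "dataFlow": {
                "referenceName": "DynamicsSourceDataFlow",
                "type": "DataFlowReference"
            },
            "compute": {
                "computeType": "MemoryOptimized",
                "coreCount": 16
            }
        }
    }
    ```

    The same settings are available in the ADF Studio UI on the activity's Settings tab under "Run on (Azure IR)".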

    Note: A self-hosted integration runtime (SHIR) is not supported in mapping data flows. Kindly use the AutoResolve integration runtime (or a custom Azure IR) instead.
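
    If you prefer a reusable configuration, a custom Azure integration runtime can also carry the data flow compute settings, including a time-to-live so the cluster is reused across runs. A hedged sketch of the IR definition JSON; the IR name and the specific values are assumptions:

    ```json
    {
        "name": "DataFlowAzureIR",
        "properties": {
            "type": "Managed",
            "typeProperties": {
                "computeProperties": {
                    "location": "AutoResolve",
                    "dataFlowProperties": {
                        "computeType": "MemoryOptimized",
                        "coreCount": 16,
                        "timeToLive": 10
                    }
                }
            }
        }
    }
    ```

    Referencing this IR from the Execute Data Flow activity avoids repeating compute overrides in every pipeline.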


    Hope this helps. Kindly let me know if the above suggestion resolved the issue. Thanks.