ADF - Data flow: Activity timed out error

P Sharma 1 Reputation point
2022-06-23T12:42:28.29+00:00

Hi Team,

Earlier, we were getting only a timeout error while running the pipeline's data flow, which copies data from source to sink.

After changing the core count from 4 to 8 and the compute type from General to Memory Optimized, we now see the error below.
Could you please advise?

In debug mode it works fine; we are able to preview the data.

{"StatusCode":"DFExecutorUserError","Message":"Job failed due to reason: An I/O error occurred while sending to the backend.","Details":"shaded.msdataflow.org.postgresql.util.PSQLException: An I/O error occurred while sending to the backend.\n\tat shaded.msdataflow.org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:339)\n\tat shaded.msdataflow.org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:448)\n\tat shaded.msdataflow.org.postgresql.jdbc.PgStatement.execute(PgStatement.java:369)\n\tat shaded.msdataflow.org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:153)\n\tat shaded.msdataflow.org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:103)\n\tat org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD.compute(JDBCRDD.scala:304)\n\tat org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:347)\n\tat org.apache.spark.rdd.RDD.iterator(RDD.scala:311)\n\tat org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)\n\tat org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:347)\n\tat org.apache.spark.rdd.RDD.iterator(RDD.scala:311)\n\tat org.apache.spark.sch"}

Pipeline run ID
d9a326b2-d9b5-48d8-b0ef-62f684ae62a6

thanks

Azure Data Factory

2 answers

  1. P Sharma 1 Reputation point
    2022-06-23T16:33:09.44+00:00

    Hi Team,

    We ran the pipeline again; this time it failed with only the timeout error and no other details.

    We have been facing the same timeout error for the last several runs and have not been able to find a solution.

    At the source database, we ran the query directly and it works fine, no issues.
    In ADF debug mode, the preview returns the records.
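
    For comparison, here is a minimal sketch for timing that same source query outside ADF, assuming psycopg2 and direct access to the source Postgres; the connection details and query are placeholders:

        import time
        import psycopg2  # assumption: psycopg2 is installed and the source DB is reachable

        # Placeholders -- replace with the real connection details and the data flow's source query.
        conn = psycopg2.connect(host="your-postgres-host", dbname="your_database",
                                user="your_user", password="your_password")
        query = "SELECT * FROM your_source_table"

        start = time.monotonic()
        with conn, conn.cursor() as cur:
            cur.execute(query)
            rows = cur.fetchall()
        elapsed = time.monotonic() - start

        print(f"Fetched {len(rows)} rows in {elapsed:.1f} s")
        conn.close()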

    Pipeline run ID
    f41a1a84-be42-4da3-9cd1-a7e8a08c7b14

    Please help

    thanks


  2. MartinJaffer-MSFT 26,236 Reputation points
    2022-06-24T21:18:55.807+00:00

    Hello @P Sharma and welcome to Microsoft Q&A.

    I have checked for similar historical cases (based on the error message).

    shaded.msdataflow.org.postgresql.util.PSQLException: An I/O error occurred while sending to the backend

    The cases I found involved connection pooling and having too many connections open on the Postgres server.
    In other words, too many users/applications were trying to access the database at the same time, and the contention made operations take too long. When operations took too long, the connection was dropped, and once the connection was dropped it could no longer send data to the backend.

    What I find interesting is that the Data Flow debug worked fine. If your data flow writes back to the same DB you read from, I have a theory. The Data Flow debug preview does not write back to the DB; it only reads the data and then caches it, so it does not have to fetch the source data multiple times. On a database, write operations usually take more effort than read operations. That is my theory on why the Data Flow debug preview had less trouble.

    There might be other causes, so after you check your number of peak connections, I can offer you a 1-time free support ticket for more in-depth investigation.
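
    If it helps, here is a minimal sketch for checking how close the server is to its connection limit, assuming psycopg2 and direct access to the Postgres server; the connection details are placeholders:

        import psycopg2  # assumption: psycopg2 is installed and the server is reachable

        # Placeholders -- replace with the real connection details.
        conn = psycopg2.connect(host="your-postgres-host", dbname="your_database",
                                user="your_user", password="your_password")

        with conn, conn.cursor() as cur:
            # Hard limit configured on the server.
            cur.execute("SHOW max_connections;")
            max_connections = int(cur.fetchone()[0])

            # Currently open connections, grouped by state (active, idle, ...).
            cur.execute("SELECT state, count(*) FROM pg_stat_activity "
                        "GROUP BY state ORDER BY count(*) DESC;")
            by_state = cur.fetchall()

        conn.close()

        total = sum(count for _, count in by_state)
        print(f"max_connections = {max_connections}, currently open = {total}")
        for state, count in by_state:
            print(f"  {state or 'background/unknown'}: {count}")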

