ADF dataflow activity fails with large record sets

Achu A 50 Reputation points
2024-05-07T13:41:07.27+00:00

I have a dataflow activity that retrieves data from an Azure MySQL database and invokes an address API via an external call transformation to update the database with standardized addresses. However, with a large number of records (approximately 12,000), the dataflow activity fails with the error message below. Smaller record sets process successfully without any issues. I suspect that the problem lies in the sink, although I haven’t been able to identify the exact root cause.

Error Message:

Operation on target Address Validation failed: {"StatusCode":"DFExecutorUserError","Message":"Job failed due to reason: at Sink 'SinkLocationStage': Communications link failure during rollback(). Transaction resolution unknown.","Details":"java.sql.SQLNonTransientConnectionException: Communications link failure during rollback(). Transaction resolution unknown.\n\tat shaded.msdataflow.com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:110)\n\tat shaded.msdataflow.com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:97)\n\tat shaded.msdataflow.com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:89)\n\tat shaded.msdataflow.com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:63)\n\tat shaded.msdataflow.com.mysql.cj.jdbc.ConnectionImpl.rollback(ConnectionImpl.java:1856)\n\tat org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:727)\n\tat org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$saveTable$1(JdbcUtils.scala:856)\n\tat org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$saveTable$1$adapted(JdbcUtils.scala:854)\n\tat org.apache.spark.rdd.RDD.$anonfun$foreachPartition$2(RDD.scala:1027)\n\tat org.apache.spark.rdd.RDD.$an"}

Azure Data Factory

Accepted answer
    phemanth 6,550 Reputation points · Microsoft Vendor
    2024-05-12T18:11:47.47+00:00

    @Achu A

    Welcome to Microsoft Q&A platform and thanks for posting your question.

    I'm glad that you were able to resolve your issue, and thank you for posting your solution so that others experiencing the same thing can easily reference it! Since the Microsoft Q&A community has a policy that "The question author cannot accept their own answer. They can only accept answers by others," I'll repost your solution in case you'd like to accept the answer.

    **Ask:** I have a dataflow activity that retrieves data from an Azure MySQL database and invokes an address API via an external call transformation to update the database with standardized addresses. However, with a large number of records (approximately 12,000), the dataflow activity fails with the error message below. Smaller record sets process successfully without any issues. I suspect that the problem lies in the sink, although I haven’t been able to identify the exact root cause.

    Error message: same as quoted in the question above.

    **Solution:** Using dynamic range partitioning in the source transformation fixed the issue for me.
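
    For context: ADF mapping data flows run on Spark, and the stack trace above points at Spark's JDBC writer (JdbcUtils.savePartition). Range-partitioning the source splits the read, and the corresponding writes, across several smaller JDBC connections, so no single connection has to hold a transaction open while the external call transformation works through all ~12,000 rows. Below is a minimal Scala sketch of the equivalent Spark-level behavior; the connection details, table names (addresses, addresses_stage), and key column (id) are hypothetical.

    ```scala
    // Sketch only: Spark-level equivalent of a range-partitioned JDBC
    // source. All connection details, table names, and the key column
    // are placeholders, not taken from the original thread.
    import org.apache.spark.sql.SparkSession

    object RangePartitionSketch extends App {
      val spark = SparkSession.builder()
        .appName("range-partition-sketch")
        .getOrCreate()

      // partitionColumn/lowerBound/upperBound/numPartitions make Spark
      // issue one bounded query per partition instead of one huge scan.
      val source = spark.read
        .format("jdbc")
        .option("url", "jdbc:mysql://<server>.mysql.database.azure.com:3306/<db>")
        .option("dbtable", "addresses")
        .option("user", "<user>")
        .option("password", "<password>")
        .option("partitionColumn", "id")   // must be numeric or date-typed
        .option("lowerBound", "1")
        .option("upperBound", "12000")
        .option("numPartitions", "8")      // ~1,500 rows per partition
        .load()

      // On the write side, Spark opens a separate JDBC connection per
      // partition (see JdbcUtils.savePartition in the stack trace), so
      // each transaction stays small and short-lived.
      source.write
        .format("jdbc")
        .option("url", "jdbc:mysql://<server>.mysql.database.azure.com:3306/<db>")
        .option("dbtable", "addresses_stage")
        .option("user", "<user>")
        .option("password", "<password>")
        .mode("append")
        .save()

      spark.stop()
    }
    ```

    In the data flow itself, the same effect comes from the source transformation's Optimize tab (Set partitioning → Dynamic range) with a suitable partition column; keeping each partition small also keeps each sink transaction short enough to avoid the mid-rollback link failure.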

    If I missed anything please let me know and I'd be happy to add it to my answer, or feel free to comment below with any additional information.

    If you have any other questions, please let me know. Thank you again for your time and patience throughout this issue.


    Please don’t forget to click "Accept Answer" and "Yes" for "was this answer helpful" wherever the information provided helps you, as this can be beneficial to other community members.

