ADF dataflow activity fails with large record sets

Achu A 50 Reputation points
2024-05-07T13:41:07.27+00:00

I have a dataflow activity that retrieves data from an Azure MySQL database and invokes an address API via an external call transformation to update the database with standardized addresses. However, with a large number of records (approximately 12,000), the dataflow activity fails with the error message below. Smaller record sets process successfully without any issues. I suspect that the problem lies in the sink, although I haven’t been able to identify the exact root cause.

Error Message:

Operation on target Address Validation failed: {"StatusCode":"DFExecutorUserError","Message":"Job failed due to reason: at Sink 'SinkLocationStage': Communications link failure during rollback(). Transaction resolution unknown.","Details":"java.sql.SQLNonTransientConnectionException: Communications link failure during rollback(). Transaction resolution unknown.\n\tat shaded.msdataflow.com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:110)\n\tat shaded.msdataflow.com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:97)\n\tat shaded.msdataflow.com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:89)\n\tat shaded.msdataflow.com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:63)\n\tat shaded.msdataflow.com.mysql.cj.jdbc.ConnectionImpl.rollback(ConnectionImpl.java:1856)\n\tat org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:727)\n\tat org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$saveTable$1(JdbcUtils.scala:856)\n\tat org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$saveTable$1$adapted(JdbcUtils.scala:854)\n\tat org.apache.spark.rdd.RDD.$anonfun$foreachPartition$2(RDD.scala:1027)\n\tat org.apache.spark.rdd.RDD.$an"}

Azure Data Factory

Accepted answer
    phemanth 6,550 Reputation points · Microsoft Vendor
    2024-05-12T18:11:47.47+00:00

    @Achu A

    Welcome to Microsoft Q&A platform and thanks for posting your question.

    I'm glad that you were able to resolve your issue, and thank you for posting your solution so that others experiencing the same thing can easily reference it! Since the Microsoft Q&A community has a policy that "The question author cannot accept their own answer. They can only accept answers by others," I'll repost your solution in case you'd like to accept the answer.

    **Ask:** I have a dataflow activity that retrieves data from an Azure MySQL database and invokes an address API via an external call transformation to update the database with standardized addresses. However, with a large number of records (approximately 12,000), the dataflow activity fails with the error message below. Smaller record sets process successfully without any issues. I suspect that the problem lies in the sink, although I haven’t been able to identify the exact root cause.

    Error message: same as quoted in the question above.

    **Solution:** Using dynamic range partitioning in the source transformation fixed the issue for me.
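
    For context: ADF mapping data flows run on Spark, and the stack trace above points at Spark's JDBC writer (JdbcUtils.savePartition). Range-partitioning the source splits the read, and the corresponding writes, across several smaller JDBC connections, so no single connection has to hold a transaction open while the external call transformation works through all ~12,000 rows. Below is a minimal Scala sketch of the equivalent Spark-level behavior; the connection details, table names (addresses, addresses_stage), and key column (id) are hypothetical.

    ```scala
    // Sketch only: Spark-level equivalent of a range-partitioned JDBC
    // source. All connection details, table names, and the key column
    // are placeholders, not taken from the original thread.
    import org.apache.spark.sql.SparkSession

    object RangePartitionSketch extends App {
      val spark = SparkSession.builder()
        .appName("range-partition-sketch")
        .getOrCreate()

      // partitionColumn/lowerBound/upperBound/numPartitions make Spark
      // issue one bounded query per partition instead of one huge scan.
      val source = spark.read
        .format("jdbc")
        .option("url", "jdbc:mysql://<server>.mysql.database.azure.com:3306/<db>")
        .option("dbtable", "addresses")
        .option("user", "<user>")
        .option("password", "<password>")
        .option("partitionColumn", "id")   // must be numeric or date-typed
        .option("lowerBound", "1")
        .option("upperBound", "12000")
        .option("numPartitions", "8")      // ~1,500 rows per partition
        .load()

      // On the write side, Spark opens a separate JDBC connection per
      // partition (see JdbcUtils.savePartition in the stack trace), so
      // each transaction stays small and short-lived.
      source.write
        .format("jdbc")
        .option("url", "jdbc:mysql://<server>.mysql.database.azure.com:3306/<db>")
        .option("dbtable", "addresses_stage")
        .option("user", "<user>")
        .option("password", "<password>")
        .mode("append")
        .save()

      spark.stop()
    }
    ```

    In the data flow itself, the same effect comes from the source transformation's Optimize tab (Set partitioning → Dynamic range) with a suitable partition column; keeping each partition small also keeps each sink transaction short enough to avoid the mid-rollback link failure.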

    If I missed anything please let me know and I'd be happy to add it to my answer, or feel free to comment below with any additional information.

    If you have any other questions, please let me know. Thank you again for your time and patience throughout this issue.


    Please don’t forget to click "Accept Answer" and "Yes" for "was this answer helpful" wherever the information provided helps you, as this can be beneficial to other community members.

