Azure Data Factory Copy Activity with MySQL: Unable to read data from the transport connection

Niels van der Kaap 20 Reputation points
2024-08-12T09:07:15.2933333+00:00

We have an ADF pipeline with a copy activity that copies new and updated records from a table (100M rows) in a remote MySQL database. This is done using the following query:

select [cols] from [table] WHERE created_at between 'xxx' and 'yyy' or updated_at between 'xxx' and 'yyy'

An index is set up on the created_at and updated_at fields to improve query performance.
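
For illustration, such indexes could be defined as follows (the index names are placeholders, not taken from the original setup):

-- hypothetical single-column indexes on the two filter columns;
-- MySQL can serve the OR condition by combining both index scans
-- with an index_merge (union) plan
create index idx_created_at on [table] (created_at);
create index idx_updated_at on [table] (updated_at);

The linked service is configured as follows: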

[Screenshot: linked service configuration]

If we omit the WHERE clause, the copy activity takes around 30 minutes but runs successfully. If we use the WHERE clause, the activity returns an exception after about 5 minutes:

Activity failed because an inner activity failed; Inner activity name: Copy Table, Error: Failure happened on 'Source' side. 'Type=MySqlConnector.MySqlException,Message=Failed to read the result set.,Source=MySqlConnector,''Type=System.IO.IOException,Message=Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host.,Source=System,''Type=System.Net.Sockets.SocketException,Message=An existing connection was forcibly closed by the remote host,Source=System,'

This issue appears to happen because it takes the database server too long to return the first record. We have already checked the timeout settings on the remote database server. Any thoughts on how to solve this issue?

Azure Data Factory

Accepted answer
  1. NIKHILANETHIKUNTA 1,510 Reputation points Microsoft Vendor
    2024-08-30T06:06:32.7833333+00:00

    Hi @Niels van der Kaap

    I'm glad that you were able to resolve your issue and thank you for posting your solution so that others experiencing the same thing can easily reference this! Since the Microsoft Q&A community has a policy that "The question author cannot accept their own answer. They can only accept answers by others", I'll repost your solution in case you'd like to accept the answer.

    Ask: Trying to run an Azure Data Factory (ADF) pipeline with a copy activity that transfers new and updated records from a large MySQL table. When using a WHERE clause to filter records by the created_at and updated_at fields, the activity fails with a connection error after about 5 minutes. The issue seems to be related to the database server taking too long to return the first record. The timeout settings on the remote database server have already been checked, and help is needed to resolve the issue.

    Solution: Use the Azure Database for MySQL connector in ADF. This connector uses the Connector/NET library, but also allows extra connection properties. Adding the Keepalive=15 property resolves the timeouts.
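
    For reference, a minimal sketch of the resulting connection string (server, database, and credentials are placeholders):

        Server=myserver;Port=3306;Database=mydb;UID=myuser;PWD=<password>;Keepalive=15

    With Keepalive=15, the driver starts sending TCP keep-alive packets once the connection has been idle for 15 seconds, so intermediate devices do not drop the connection while the server is still locating the first matching rows.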

    If I missed anything please let me know and I'd be happy to add it to my answer, or feel free to comment below with any additional information.

    If you have any other questions, please let me know. Thank you again for your time and patience throughout this issue.


    Please don’t forget to Accept Answer and Yes for "was this answer helpful" wherever the information provided helps you, as this can be beneficial to other community members.


2 additional answers

  1. Amira Bedhiafi 22,691 Reputation points
    2024-08-13T07:33:56.98+00:00

    You are probably exceeding your max_allowed_packet limit:

    The maximum size of one packet or any generated/intermediate string, or any parameter sent by the mysql_stmt_send_long_data() C API function. The default is 64MB. The packet message buffer is initialized to net_buffer_length bytes, but can grow up to max_allowed_packet bytes when needed. This value by default is small, to catch large (possibly incorrect) packets. You must increase this value if you are using large [BLOB](https://dev.mysql.com/doc/refman/8.4/en/blob.html "13.3.4 The BLOB and TEXT Types") columns or long strings. It should be as big as the largest [BLOB](https://dev.mysql.com/doc/refman/8.4/en/blob.html "13.3.4 The BLOB and TEXT Types") you want to use. The protocol limit for max_allowed_packet is 1GB. The value should be a multiple of 1024; nonmultiples are rounded down to the nearest multiple.

    https://dev.mysql.com/doc/refman/8.4/en/server-system-variables.html#sysvar_max_allowed_packet
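
    To rule this out, you can check and, if needed, raise the limit on the MySQL server (the 256 MB value below is only an example; SET GLOBAL requires the SUPER or SYSTEM_VARIABLES_ADMIN privilege):

        -- check the current limit (in bytes)
        show variables like 'max_allowed_packet';
        -- raise it for new connections; the value should be a multiple of 1024
        set global max_allowed_packet = 268435456; -- 256 MB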


  2. Niels van der Kaap 20 Reputation points
    2024-08-29T09:29:39.81+00:00

    The MySQL connection uses a Self-Hosted Integration Runtime behind an Azure Firewall. Since the connection to the database server does not send keep-alive messages, the firewall terminates the connection after 4 minutes. The MySQL connector uses the Connector/NET library but does not allow extra connection properties, so we cannot add the Keepalive connection property.

    Workaround: Use the Azure Database for MySQL connector in ADF. This connector uses the Connector/NET library, but also allows extra connection properties. Adding the Keepalive=15 property resolves the timeouts.
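
    For reference, a minimal sketch of what the linked service definition could look like with this property (all names and values are placeholders, assuming the Azure Database for MySQL connector's JSON format):

        {
          "name": "AzureMySqlLinkedService",
          "properties": {
            "type": "AzureMySql",
            "typeProperties": {
              "connectionString": "Server=myserver;Port=3306;Database=mydb;UID=myuser;PWD=<password>;Keepalive=15"
            },
            "connectVia": {
              "referenceName": "SelfHostedIR",
              "type": "IntegrationRuntimeReference"
            }
          }
        }

    Here Keepalive=15 keeps probe traffic on the connection well inside the firewall's 4-minute idle timeout.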

