I believe this is due to slow network connection between DB that runs the SSIS, each of the linked DB in the SSIS job design. If SSIS is diff machine, and there are 2 other DB involved, then there are total of 3 machine to check their network connectivity
Suggestion:
- In each machine, checks "netstat -s" for each NIC card, and ensures no increase in "errors," "discard," "failure" related metrics
E.g. if server uses IPv4, then focus in following section of netstat output
IPv4 Statistics
Packets Received = 8799877
Received Header Errors = 0
Received Address Errors = 0
Datagrams Forwarded = 0
Unknown Protocols Received = 0
Received Packets Discarded = 33505
Received Packets Delivered = 4614424
Output Requests = 4192937
Routing Discards = 0
Discarded Output Packets = 3145
Output Packet No Route = 45
Reassembly Required = 24
Reassembly Successful = 12
Reassembly Failures = 0
Datagrams Successfully Fragmented = 12
Datagrams Failing Fragmentation = 0
Fragments Created = 24
- Captures before and after in each machine to narrow down which machine has network communication issue
- Ensure all 3 machine has sufficient CPU, RAM, disk
- Ensure all 3 machine has no virtual memory paging to disk
- Creates simpler SELECT INTO <TABLE-Target> FROM <TABLE-Source> across 2 DB with table size that is 10MB, 100MB, 1MB, and determine how sensitive and frequent to simulate the network instability/delay issue
- Creates long running SQL query that takes 5, 10, 15, 30, 1 hr to return its result back into SSIS. Such as SELECT COUNT(1) FROM <Big-Table> WHERE <time consuming filter> to simulate the TCP low level session force-close configuration along the network
Under extreme case, option:
- Moves all 3 machines to the same network. This will eliminates many hops, e.g. VPN, firewall, router, switches
- Moves all 3 machines to the same physical network switch
- Installs all 3 DB into the same machine. This will eliminate entire network all together, and don't bother with any network equipment and impact