Hello Vineyard, Mr. Scotty,
Welcome to the Microsoft Q&A forum.
The SHIR (self-hosted integration runtime) sends a heartbeat to the ADF (Azure Data Factory) service every 30 seconds. If heartbeats are lost for 100 seconds, ADF treats the SHIR as unhealthy and shows it as offline in the portal.
If heartbeats are lost for 3 minutes, ADF stops queuing activities to the SHIR queue to avoid queue overflow, and the activity fails with this error.
Lost heartbeats can be caused by CPU, memory, or network problems on the machine hosting the SHIR. Because the SHIR is managed by the customer rather than by Microsoft, the customer needs to ensure the availability of that machine.
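To make the timing concrete, here is a minimal Python sketch of the thresholds described above. It is only an illustration, not ADF's actual implementation: the names (`ShirState`, `classify`) are made up for this example, and treating the 100-second and 3-minute limits as inclusive is an assumption.

```python
from enum import Enum

# Values taken from the description above.
HEARTBEAT_INTERVAL_S = 30    # SHIR sends a heartbeat every 30 seconds
OFFLINE_AFTER_S = 100        # no heartbeat for 100 s: shown as offline
STOP_QUEUING_AFTER_S = 180   # no heartbeat for 3 min: no new activities queued


class ShirState(Enum):
    """Hypothetical labels for this sketch, not ADF status names."""
    HEALTHY = "healthy"
    OFFLINE = "offline in portal"
    NOT_QUEUING = "offline, activities fail instead of being queued"


def classify(seconds_since_last_heartbeat: float) -> ShirState:
    """Map the time since the last heartbeat to the behavior described above.

    Assumption: both thresholds are inclusive.
    """
    if seconds_since_last_heartbeat >= STOP_QUEUING_AFTER_S:
        return ShirState.NOT_QUEUING
    if seconds_since_last_heartbeat >= OFFLINE_AFTER_S:
        return ShirState.OFFLINE
    return ShirState.HEALTHY
```

With a 30-second interval, the 100-second limit corresponds to roughly three missed heartbeats in a row, so a single delayed heartbeat should not mark the node offline.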
Here are the possible solutions:
- Check for connectivity issues or a VM crash. If you find either, escalate to your network team or VM team.
- Check for high CPU or low memory caused by activity runs during the downtime. If you find either, check whether the load is expected and scale up the machine if necessary.
- If the SHIR has only one node, enable SHIR high availability (https://docs.microsoft.com/en-us/azure/data-factory/create-self-hosted-integration-runtime?tabs=data-factory#high-availability-and-scalability) to avoid this single point of failure. (There is no downtime while registering or adding a new node.)
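For the connectivity check in the first item, a quick outbound TCP test run from the SHIR machine can help narrow things down before you escalate. This is a rough sketch using only the Python standard library. The function name `can_connect` is made up for this example, and you need to supply the host and port yourself (SHIR talks to the service over outbound HTTPS, so port 443 is the usual assumption). A successful TCP connection only shows that the port is reachable; it does not prove that the SHIR itself is healthy.

```python
import socket


def can_connect(host: str, port: int, timeout_s: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout_s.

    Hypothetical helper for this sketch. DNS failures, refused connections,
    and timeouts all raise OSError subclasses and are reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


# Example call (placeholder host, replace with an endpoint your SHIR uses):
# print(can_connect("your-adf-endpoint.example", 443))
```

If this returns False during the downtime window while CPU and memory look normal, that points toward the network rather than the machine.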
I hope this helps. Please let me know if you have any further questions.