
Azure DevOps Pipeline Failing Due to Agent Timeout on Long-Running Job

Swahela Mulla 100 Reputation points
2026-05-13T10:14:39.9+00:00

Hi Everyone,

I’m working on an Azure DevOps pipeline where I’m creating an image that includes multiple application installations. Since there are several apps involved, the process takes a considerable amount of time.

In the pipeline YAML, I have already increased the timeout setting as below:

timeoutInMinutes: 420

Despite this, the pipeline consistently fails after approximately 1 hour 20 minutes (~80 minutes) with the following error:

##[error]We stopped hearing from agent Azure Pipelines.
Verify the agent machine is running and has a healthy network connection.
Anything that terminates an agent process, starves it for CPU, or blocks its network access can cause this error.

Details:

  • Long-running task due to multiple application installations during image creation
  • Pipeline timeout explicitly set to 420 minutes
  • Failure still occurs around ~80 minutes
  • Error suggests loss of communication with the agent

Questions:

  • Has anyone experienced similar agent timeout issues during long-running pipelines?
  • Is there any agent-level or hidden timeout limit that might override the pipeline setting?
  • Would switching to a self-hosted agent help in this scenario?
  • Are there best practices to handle long-running installations in image build pipelines (e.g., splitting tasks, optimization, etc.)?

Any help or guidance would be greatly appreciated.

Thanks!


Azure DevOps

2 answers

  1. Pravallika KV 16,860 Reputation points Microsoft External Staff Moderator
    2026-05-13T13:07:27.8766667+00:00

    Hi @Swahela Mulla ,

    Thanks for reaching out to Microsoft Q&A.

    It looks like you’re running into two related limits on Microsoft-hosted agents:

    1. Hosted-agent built-in timeouts
      • On a private project with the free tier, each job can only run for 60 minutes.
      • If you’ve purchased parallel jobs, that limit goes up to 360 minutes (6 hours).
      • Setting timeoutInMinutes: 420 in your YAML won’t help if the hosted-agent cap is still 360 minutes—or 60 minutes if you’re on the free tier.
    2. Agent “heartbeat” disconnects
      • The agent process sends a heartbeat every minute.
      • If the server doesn’t hear back for 5 consecutive minutes, it marks the job as failed with “We stopped hearing from agent…”.
      • Long-running install scripts that monopolize CPU, I/O or suppress all output for more than a few minutes can starve the agent’s heartbeats and trigger a hard disconnect—even if your job timeout is set high.

    What you can try:

    Verify your hosted-agent limits

    • Are you on a private repo using the free grant? That’ll cut you off at 60 minutes no matter what.
      • If you need longer on MS-hosted, you must purchase extra parallel jobs to unlock the full 360 minutes per job.

    Add “keep-alive” output

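    • Break your big script into smaller steps, or periodically echo progress so the job keeps producing output and the agent thread doesn’t starve.
      • For example, run the installer in the background and print a status line while it is still running: someLongInstall & pid=$!; while kill -0 "$pid" 2>/dev/null; do echo "$(date): still working"; sleep 240; done; wait "$pid".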
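
    The keep-alive idea above could look roughly like this as a pipeline step. This is a minimal sketch, not a confirmed fix: it assumes a Linux agent, and ./install-apps.sh is a hypothetical placeholder for the real installer. Progress output keeps the log active, but it may not help if the agent is being starved of CPU, memory, or disk.

    ```yaml
    steps:
    - bash: |
        # Hypothetical installer script; replace with the real install command.
        ./install-apps.sh &
        install_pid=$!

        # Print a progress line every 60 seconds while the installer is running.
        # SECONDS is a bash builtin counting seconds since the script started.
        while kill -0 "$install_pid" 2>/dev/null; do
          echo "still installing after ${SECONDS}s"
          sleep 60
        done

        # Last command: the step exits with the installer's exit code.
        wait "$install_pid"
      displayName: Install applications with progress output
    ```

    Because the loop only checks once per interval, the step can finish up to one interval after the installer exits; a shorter sleep reduces that delay.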

    Split or optimize your pipeline

    • Pre-bake your images (for example, with Packer or Azure Image Builder) outside of DevOps so the pipeline just picks up a finished VM image.
      • If you must install in-pipeline, split into multiple jobs/tasks, each staying well under your hosted-agent cap.
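
    Splitting the work into multiple jobs might be sketched as follows. The job names, script names, timeout values, and the ubuntu-latest image are all assumptions for illustration. On Microsoft-hosted agents each job starts on a fresh machine, so the first job would need to store its result somewhere the second job can reach (for example an image gallery or a pipeline artifact).

    ```yaml
    jobs:
    - job: BaseImage
      timeoutInMinutes: 120          # illustrative value, below the hosted-agent cap
      pool:
        vmImage: ubuntu-latest       # assumed hosted image
      steps:
      - script: ./build-base-image.sh   # hypothetical script
        displayName: Build base image

    - job: AppLayer
      dependsOn: BaseImage           # by default runs only if BaseImage succeeded
      timeoutInMinutes: 120
      pool:
        vmImage: ubuntu-latest
      steps:
      - script: ./install-apps.sh       # hypothetical script
        displayName: Install applications onto the base image
    ```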

    Consider self-hosted agents

    • No built-in timeouts. You decide how long a job can run.
      • You can dedicate machines with enough CPU, RAM and disk so your install steps won’t starve the agent process.
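
    Targeting a self-hosted pool could look like the sketch below. MySelfHostedPool and ./build-image.sh are hypothetical names; the pool must already exist with at least one registered agent.

    ```yaml
    jobs:
    - job: BuildImage
      pool:
        name: MySelfHostedPool       # hypothetical self-hosted agent pool
      timeoutInMinutes: 0            # 0 = use the maximum, which is unlimited on self-hosted agents
      cancelTimeoutInMinutes: 10
      steps:
      - script: ./build-image.sh     # hypothetical build script
        displayName: Build image
    ```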

    Hope this helps!




  2. AI answer

    2026-05-13T10:15:11.46+00:00

    The error indicates that Azure Pipelines “stopped hearing from the agent,” which is different from a normal job timeout. It means the agent process stopped sending heartbeats for at least five minutes, often because the machine is overloaded, the process is blocked, or there is a network issue.

    Key behaviors from the platform:

    • Each job has a timeout and a separate cancel timeout. If the job exceeds its timeout, the server cancels it and gives a grace period (cancel timeout) for cleanup. After job timeout + cancel timeout, if the agent still hasn’t reported completion, the job is marked as failed.
    • The agent sends a heartbeat every minute. If the server doesn’t receive a heartbeat for five consecutive minutes, it assumes the agent is gone and marks the job as a failure with the “We stopped hearing from agent” message.
    • For Microsoft-hosted agents, there are hard maximum job lengths (60 minutes for private projects on the free tier, 360 minutes for public projects or when parallel jobs have been purchased) that cannot be exceeded even if timeoutInMinutes is higher.

    Relevant limits and configuration:

    • For YAML jobs, timeoutInMinutes controls how long the job can run before the server cancels it. 0 means “use the maximum limit” for the agent type.
    • cancelTimeoutInMinutes controls how long the job is allowed to run during cancellation before being forcibly stopped.
    • On Microsoft-hosted agents, increasing timeoutInMinutes beyond the hosted-agent maximum has no effect.

    For long-running image builds and app installations, the following approaches are recommended:

    1. Ensure the job timeout is configured correctly
      • In YAML, set timeoutInMinutes high enough for the full image build, but within the hosted-agent limits:
             jobs:
             - job: BuildImage
               timeoutInMinutes: 360
               cancelTimeoutInMinutes: 10
        
      • If using classic pipelines, configure the job timeout on the job and/or pipeline options. Remember that for Microsoft-hosted agents, the platform still enforces the 60/360-minute limits.
    2. Use a self-hosted agent for very long or heavy builds
      • Self-hosted agents can run jobs “forever” when timeoutInMinutes: 0 is used, because they are not constrained by the Microsoft-hosted agent maximum job length.
      • This is the primary way to avoid hosted-agent timeouts for long-running image creation and large app installations.
    3. Check for resource starvation and agent responsiveness
      • The “stopped hearing from agent” message often means the agent machine is out of CPU, memory, or disk, or is otherwise unresponsive.
      • Enable verbose logs and review the “Agent environment resources” entries to see disk, memory, and CPU usage per step. If these show high utilization, increase the agent’s resources or optimize the workload.
      • Ensure the agent machine has a stable network connection and is not being shut down, suspended, or heavily throttled during the long-running installation.
    4. Consider splitting or optimizing long-running tasks
      • Break the image build into multiple steps or jobs where possible (for example, base image creation vs. app-specific layers) so that each job stays within a manageable runtime.
      • For Azure VM Image Builder scenarios, if the build is hitting the 60-minute limit on private projects with Microsoft-hosted agents, either:
        • Host a self-hosted agent, or
        • Reduce the time of the build (for example, fewer regions, asynchronous replication) as suggested in the VM Image Builder troubleshooting guidance.
    5. When using Microsoft-hosted agents and hitting timeouts
      • For private repos/projects, free Microsoft-hosted agents are limited to 60 minutes per job.
      • To increase this, either:
        • Purchase Microsoft-hosted capacity (which raises the limit to 360 minutes), or
        • Switch to a self-hosted agent.
      • Also verify that the job’s own timeoutInMinutes is not set lower than the agent limit you intend to use; otherwise, the job-level timeout cancels the job earlier.
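
    The resource check from step 3 could be sketched as below. It assumes a Linux agent with df, free, and uptime available, and ./install-apps.sh is a hypothetical installer. Setting system.debug to true turns on verbose logs; the loop additionally prints a resource snapshot every minute so the last lines before a disconnect show how loaded the machine was.

    ```yaml
    variables:
      system.debug: true             # enables verbose (debug) logging for the run

    steps:
    - bash: |
        # Hypothetical installer script, started in the background.
        ./install-apps.sh &
        install_pid=$!

        # While the installer runs, log disk, memory and load once a minute.
        while kill -0 "$install_pid" 2>/dev/null; do
          echo "--- resources after ${SECONDS}s ---"
          df -h /
          free -m
          uptime
          sleep 60
        done

        # Last command: the step exits with the installer's exit code.
        wait "$install_pid"
      displayName: Install applications and log resource usage
    ```

    If the snapshots show the disk filling up or memory running out shortly before the failure, that points to starvation rather than a timeout setting.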

    In summary, for long-running image creation and app installations:

    • Use timeoutInMinutes appropriately but recognize the hard limits on Microsoft-hosted agents.
    • Prefer self-hosted agents for very long or resource-intensive builds.
    • Monitor resource utilization and ensure the agent remains responsive (no CPU/memory starvation and no network loss).
    • Split and optimize installation steps where possible to keep individual jobs within stable runtime windows.
