Azure DevOps Pipelines jobs not starting with self-hosted agent

Lucas Snijder 60 Reputation points
2025-05-12T08:21:06.74+00:00

I'm having trouble with Azure Pipelines—jobs never start, even though everything seems correctly configured. I'm using a Managed DevOps Pool (cicd-pipeline-devops-pool2) with Ubuntu 22.04. The YAML pipeline is valid, runs on Python 3.11, installs dependencies, and uses pytest. Jobs get queued but hang indefinitely with logs like “We are provisioning an agent…” and “Pool provider seems slow…”, until they time out.

What I’ve checked:

  • YAML triggers correctly; branch and task setup (UsePythonVersion, venv, pytest) are fine.
  • No custom demands (e.g., Agent.OSVersion) are present.
  • I’ve avoided vmImage because I don’t have parallelism for Microsoft-hosted agents (request pending for 3+ weeks).

On the Managed Pool side:

  • Manual scaling window (Wed 12–16h) with 100% standby agents.
  • Azure Portal shows agents going Ready → Allocated → PendingReturn → Returned, often without jobs running.
  • Sometimes a job is allocated to an agent but never progresses.
  • No manual cancellations. No hard errors. No PoolProviderTimeout.

What I suspect:

  • Agents are provisioned, but jobs don’t bind in time or agents are retired too quickly.
  • Could still be a hidden capability mismatch, but I can’t view agent capabilities in a Managed Pool.

Looking for:

  • Any known issues with agent/job orchestration in Managed Pools?
  • Whether auto-scaling is more reliable than manual standby?
  • A way to inspect reported agent capabilities in a Managed Pool?
  • And how to escalate parallelism approval for Microsoft-hosted agents?

Any help would be greatly appreciated.

Azure DevOps

1 answer

  1. Gaurav Kumar 785 Reputation points Microsoft External Staff Moderator
    2025-05-23T14:06:41.17+00:00

    Hi @Lucas Snijder ,

    Jobs are hanging with messages like “We are provisioning an agent…” and agents enter the Allocated → PendingReturn → Returned cycle without executing jobs. In Managed DevOps Pools, this behavior typically stems from agent-job binding delays, premature agent retirement, or implicit capability mismatches, even if no custom demands are set.

    Try the following workarounds to resolve the issue:

    Prefer Auto-Scaling Over Manual Standby

    Manual standby often leads to agents aging out before binding, especially under light or bursty workloads. Switch to auto-scaling mode, which dynamically keeps agents alive long enough for job assignment to complete.

    Check Network Connectivity

    Provisioning may hang if the agent can't reach Azure DevOps servers. Ensure there are no firewall rules, NSGs, or outbound restrictions preventing the agent from accessing required Azure DevOps endpoints.

    Refer to this Microsoft documentation: Restricting outbound connectivity
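
Because the job never starts, the agent cannot report connectivity problems itself. One option is to probe the endpoints from a machine on the same virtual network or subnet that the pool uses. The sketch below is illustrative rather than an official tool: `check_endpoint` is a hypothetical helper name, and `https://dev.azure.com` is only an example URL. Take the full list of required endpoints from the documentation referenced above.

```shell
# check_endpoint URL...: report whether each URL can be reached over HTTPS.
# Hypothetical helper for illustration; pass the endpoints your pool needs.
check_endpoint() {
  rc=0
  for url in "$@"; do
    # --head sends a HEAD request. Any HTTP response counts as reachable,
    # so only DNS, TCP, TLS, or timeout failures are reported.
    if curl --silent --head --max-time 10 --output /dev/null "$url"; then
      echo "reachable: $url"
    else
      echo "unreachable: $url"
      rc=1
    fi
  done
  return "$rc"
}
```

For example, `check_endpoint https://dev.azure.com` prints one line per URL and returns a non-zero status if any probe fails.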

    Add a Diagnostic Job to Inspect Agent Environment

    Use this job to check what the agent sees at runtime:

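    jobs:
    - job: checkAgent
      timeoutInMinutes: 5
      pool:
        name: cicd-pipeline-devops-pool2
      steps:
        - script: |
            # $(Agent.Name) and $(Agent.OS) are expanded by Azure Pipelines before the script runs.
            echo "Agent name: $(Agent.Name)"
            echo "OS: $(Agent.OS)"
            # Report whether each tool the pipeline needs is on PATH.
            for tool in python3 pip3 pytest; do
              command -v "$tool" || echo "$tool: not found on PATH"
            done
            # Managed Pools don't expose capabilities, so dump the environment instead.
            echo "Environment variables:"
            env | sort
          displayName: 'Dump agent environment'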
    

    This helps verify whether key tools like python, pip, or pytest are actually visible to the agent.

    Force Binding with a Warm-Up Job

    Sometimes, hidden requirements block scheduling. Add a no-op job to test basic job binding:

    jobs:
    - job: warmup
      timeoutInMinutes: 2
      pool:
        name: cicd-pipeline-devops-pool2
      steps:
        - script: echo "Agent bound successfully"
    

    If this job doesn’t run, the issue is at the agent/job binding layer, not with the pipeline itself.

    Check Agent Lifecycle in the DevOps Portal

    From Project Settings → Agent Pools → cicd-pipeline-devops-pool2 → Agents tab, verify:

    • Agents show Ready for long enough before returning
    • There are no frequent transitions between Allocated → Returned without jobs
    • Agents are not stuck in Offline, Unhealthy, or Unknown states
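
The agent list can also be read outside the portal. This is a rough sketch, assuming the Agents - List endpoint of the Azure DevOps REST API works for this pool and that a personal access token with read access to agent pools is stored in `AZURE_DEVOPS_PAT` (a variable name chosen for this example). `agents_url` and `list_agents` are hypothetical helper names. The API reports fields such as `status` (online/offline) and `enabled`, not the pool lifecycle states shown in the Azure Portal, and it is not confirmed here that `includeCapabilities` returns anything useful for Managed DevOps Pool agents.

```shell
# agents_url ORG POOL_ID: build the Agents - List request URL.
# ORG is the organization name and POOL_ID the numeric agent pool ID.
agents_url() {
  printf 'https://dev.azure.com/%s/_apis/distributedtask/pools/%s/agents?includeCapabilities=true&includeAssignedRequest=true&api-version=7.1\n' "$1" "$2"
}

# list_agents ORG POOL_ID: fetch the agent list as JSON.
# The PAT is sent as the password of HTTP basic auth with an empty user name.
list_agents() {
  curl --silent --show-error --fail \
    --user ":${AZURE_DEVOPS_PAT}" \
    "$(agents_url "$1" "$2")"
}
```

The response is a JSON object whose `value` array holds one entry per agent, so something like `list_agents my-org 12 | jq -r '.value[] | "\(.name)\t\(.status)\t\(.enabled)"'` gives a compact view (`my-org` and `12` are placeholders, and `jq` must be installed).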

    Managed Pools don’t expose agent capabilities, so ensure your base image includes Python 3.11 and related tools:

    sudo apt update && sudo apt install -y python3.11 python3.11-venv python3-pip
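
After building the image, a quick check like the one below can confirm that the interpreter and its modules are really present. It is a minimal sketch: `check_python_tooling` is a hypothetical helper name, and it assumes the interpreter is installed as `python3.11` on PATH. It also assumes that on Ubuntu the `ensurepip` module, which `python3.11 -m venv` needs to bootstrap pip, comes from the `python3.11-venv` package.

```shell
# check_python_tooling [INTERPRETER]: verify that the interpreter exists and
# that the ensurepip and pip modules import. Defaults to python3.11.
check_python_tooling() {
  py="${1:-python3.11}"
  if ! command -v "$py" >/dev/null 2>&1; then
    echo "missing: $py"
    return 1
  fi
  rc=0
  for module in ensurepip pip; do
    # A failed import means the matching package is not in the image.
    if "$py" -c "import $module" >/dev/null 2>&1; then
      echo "ok: $module"
    else
      echo "missing module: $module"
      rc=1
    fi
  done
  return "$rc"
}
```

Running `check_python_tooling` as the last step of the image build makes a missing package fail the build instead of surfacing later as a pipeline failure.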
    

    Hope it helps!


    Please click "Accept the answer" and "Yes" wherever the information provided helps you; this can benefit other community members.

    If you have any other questions or are still running into issues, let me know in the comments and I would be happy to help.

