Microsoft-hosted agents permanently stuck — orphaned job requests from canceled builds cannot be cleared

Peyman Dinani 25 Reputation points
2026-03-02T04:59:27.51+00:00

Problem

All Microsoft-hosted pipelines in our Azure DevOps organization are completely blocked. No new pipeline runs can acquire an agent — they sit at "Acquiring an agent from the cloud" indefinitely. This has been ongoing for 3+ hours with no self-recovery.

We have 1 free Microsoft-hosted parallel job with 1800 min/month and only 41 minutes used. Billing and parallelism are not the issue.

Root Cause

We traced this to orphaned job requests stuck in the agent pool queue. These requests belong to builds that were successfully canceled (status: completed, result: canceled), but their final stages — which used condition: always() — were never terminated by the orchestrator. The job requests for those stages remain in the queue with no result and no way to clear them.

The deadlock cycle

  1. Several pipeline runs were canceled via UI and REST API
  2. Builds transitioned to completed/canceled at the build level
  3. However, final stages with condition: always() remained state: inProgress in the timeline — the orchestrator queued agent jobs for them even though the build was canceled
  4. The only agent in the pool is offline/Deallocated (normal for hosted — it provisions on demand)
  5. The dispatcher assigns orphaned requests to this offline agent with a ~45-minute lease
  6. Lease expires, dispatcher moves to the next orphaned request
  7. With multiple orphaned requests cycling, the single parallelism slot is blocked indefinitely
  8. No new agent VMs are ever provisioned
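
The cycle above can be reproduced with a pipeline shaped roughly like this (stage and job names are illustrative, not our actual definition):

```yaml
stages:
- stage: Build
  jobs:
  - job: BuildJob
    pool:
      vmImage: ubuntu-latest
    steps:
    - script: echo "building"

# A trailing stage guarded by always() still gets an agent job queued
# when the run is canceled, but no hosted agent will provision for a
# canceled build -- this is the orphaned request described above.
- stage: Notify
  dependsOn: Build
  condition: always()
  jobs:
  - job: NotifyJob
    pool:
      vmImage: ubuntu-latest
    steps:
    - script: echo "notifying"
```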

Key observations

  • resourceusage API shows usedCount: 0 — billing sees no active jobs
  • But the job dispatcher considers the slot occupied by the orphaned lease
  • One request was assigned to the agent 71 minutes after its parent build was already canceled
  • jobCancelTimeoutInMinutes: 5 on the pipeline definition is not honored for stages waiting for agent provisioning
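
For anyone who wants to confirm the same symptom, the queued requests can be listed with the `jobrequests` endpoint. This is a hedged sketch, not an official client: the organization name, pool id, and PAT are placeholders, and we assume `api-version=7.1`.

```python
# Hypothetical helper for inspecting hosted-pool job requests via the
# Azure DevOps REST API. Replace org, pool_id, and pat with real values.
import base64
import json
import urllib.request

def job_requests_url(org: str, pool_id: int, api_version: str = "7.1") -> str:
    """Build the jobrequests endpoint URL for a given agent pool."""
    return (f"https://dev.azure.com/{org}/_apis/distributedtask/"
            f"pools/{pool_id}/jobrequests?api-version={api_version}")

def auth_header(pat: str) -> dict:
    """Basic auth header built from a personal access token."""
    token = base64.b64encode(f":{pat}".encode()).decode()
    return {"Authorization": f"Basic {token}"}

def list_orphaned(org: str, pool_id: int, pat: str) -> list:
    """Return job requests that have no result yet (still queued or leased)."""
    req = urllib.request.Request(job_requests_url(org, pool_id),
                                 headers=auth_header(pat))
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return [r for r in data.get("value", []) if "result" not in r]
```

In our case every entry returned by `list_orphaned` belonged to a build that was already completed/canceled.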

What we tried (everything fails)

| Action | Result |
| --- | --- |
| Cancel builds via UI | Build shows canceled, but the stuck stage remains `inProgress` |
| Cancel builds via REST API (`PATCH status=cancelling`) | `200 OK`, but stage jobs not released |
| Force-complete builds via API (`PATCH status=completed`) | `200 OK`, but job requests persist |
| Delete builds via UI | Blocked: "has active jobs" |
| Delete builds via API | `403 Forbidden` |
| `DELETE` or `PATCH` job requests via API | `405 Method Not Allowed` |
| Delete agent pool via UI | Blocked: active jobs |
| Delete the offline agent via API | `403 Forbidden` |
| Disable/re-enable the agent | No effect on queue |
| Disable/re-enable the pool | No effect on queue |
| `PATCH` timeline records to force-complete stages | `405 Not Supported` |
| `POST` a `JobCompleted` event to the orchestration plan | Requires an agent-scoped token, not available to admins |
| Switch pipelines to a different hosted pool name | All hosted pools share the same dispatcher and parallelism slot |
| Wait for lease expiry | Lease expires, but the dispatcher just cycles to the next orphaned request |
| Set parallelism to 0 and back to 1 | API returns `405` on pool modification |
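
For reference, the REST cancellation attempt (`PATCH status=cancelling`) was roughly the following; organization, project, and build id are placeholders, and `api-version=7.1` is assumed:

```python
# Hypothetical sketch of the Builds API cancellation call we issued.
# It returned 200 OK, but the orphaned stage job requests were never released.
import json
import urllib.request

def cancel_build_request(org: str, project: str, build_id: int,
                         pat_header: dict) -> urllib.request.Request:
    """Construct the PATCH request that sets a build's status to 'cancelling'."""
    url = (f"https://dev.azure.com/{org}/{project}/_apis/build/"
           f"builds/{build_id}?api-version=7.1")
    body = json.dumps({"status": "cancelling"}).encode()
    return urllib.request.Request(url, data=body, method="PATCH",
                                  headers={**pat_header,
                                           "Content-Type": "application/json"})
```

Sending the request with `urllib.request.urlopen(cancel_build_request(...))` marks the build canceled at the build level, but, as the table shows, the stuck stage's job request survives.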

There is no administrator-accessible way to clear orphaned job requests. The entire organization's Microsoft-hosted pipelines are dead with no self-service recovery path.

Bugs identified

  1. Canceled builds should release all pending job requests immediately. When a build transitions to completed/canceled, any queued job requests should be terminated. Currently they persist indefinitely.
  2. The dispatcher should not assign jobs for canceled builds. We observed a job assigned 71 minutes after its parent build was canceled.
  3. No API exists to cancel orphaned job requests. DELETE and PATCH on _apis/distributedtask/pools/{poolId}/jobrequests/{requestId} both return 405. Organization admins have zero ability to clear stuck requests without Microsoft intervention.
  4. Stages with condition: always() create an unrecoverable deadlock when canceled. The stage waits for an agent, the agent won't provision because the build is canceled, the job request can't be released because the stage is "in progress", and jobCancelTimeoutInMinutes doesn't apply to stages waiting for provisioning.
  5. No circuit breaker exists. A single bad cancellation can permanently block all Microsoft-hosted pipelines for an entire organization with no timeout or automatic recovery.

Prevention

We have since changed our pipeline's notification stage from condition: always() to condition: not(canceled()) to prevent this from recurring. However, the current deadlock remains unresolvable without Microsoft support.
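
A minimal sketch of the changed stage (names illustrative):

```yaml
- stage: Notify
  dependsOn: Build
  # not(canceled()) still runs the stage after success or failure, but
  # skips it on cancellation, so no orphaned agent job is ever queued.
  condition: not(canceled())
  jobs:
  - job: NotifyJob
    pool:
      vmImage: ubuntu-latest
    steps:
    - script: echo "notifying"
```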

Ask

  • Immediate: Is there any way for organization admins to clear orphaned job requests from a Microsoft-hosted pool? We have exhausted every API endpoint and UI option.
  • Long-term: Please add an API or UI option to force-cancel job requests, and fix the orchestrator to properly clean up jobs when builds are canceled.

Environment

  • Azure DevOps Services (cloud)
  • Free tier, 1 Microsoft-hosted parallel job
  • Multi-stage YAML pipelines

1 answer

  1. Siddhesh Desai 4,030 Reputation points Microsoft External Staff Moderator
    2026-03-02T05:21:38.7766667+00:00

    Hi @Peyman Dinani

    Thank you for reaching out to Microsoft Q&A.

    There was an outage on the Azure DevOps side and the services are now restored; you will be able to generate Git credentials now.

    Refer: https://status.dev.azure.com/_history

    If the resolution was helpful, kindly take a moment to click Yes for "Was this answer helpful?". And if you have any further query, do let us know.

