
Azure OpenAI Global Batch — jobs stuck in validating status for 4+ days across multiple regions

Diksha Golait 20 Reputation points Microsoft Employee
2026-04-24T20:11:55.3933333+00:00

Since April 19, 2026, Azure OpenAI Global Batch jobs are getting stuck in the validating status indefinitely. They never transition to in_progress, despite the documented 24-hour completion_window. As of April 24, I have ~60 jobs in validating across two separate Azure OpenAI resources in different regions, with the oldest sitting at ~113 hours.

Environment

  • Regions affected: Sweden Central and East US 2
  • Deployments: gpt-4.1-mini-batch, gpt-5.1-batch
  • API version: 2024-10-01-preview
  • SDK: OpenAI Python client (openai), client.batches.create / .list / .retrieve
  • Input format: JSONL on Azure Blob, referenced via input_blob (BYOS global batch flow)

Both resources were healthy before April 19 and still occasionally complete a job, but throughput has dropped from ~30 completions/day to ~4/day per resource.

Can someone from the Azure OpenAI team help unblock these jobs, please?

Azure OpenAI in Foundry Models

Answer accepted by question author

  1. Karnam Venkata Rajeswari 2,795 Reputation points Microsoft External Staff Moderator
    2026-04-24T20:34:46.2733333+00:00

    Hello @Diksha Golait ,

    Welcome to Microsoft Q&A. Thank you for reaching out to us.

    The pattern you describe, with Global Batch jobs remaining in the “validating” state for extended periods across multiple regions while throughput is reduced but non-zero, is consistent with a service-side validation queue delay rather than an issue with input format, configuration, or SDK usage.

    This pattern typically indicates backend capacity constraints or processing backlog, where limited validation capacity remains available. As a result, a small number of jobs continue to complete while the majority stay queued.

    The following actions can help maintain partial throughput and assess recovery:

    1. Submitting smaller batch jobs
      1. Split large JSONL files into smaller datasets
      2. Smaller workloads are more likely to move through validation under constrained conditions
    2. Testing with a small new batch
      1. Consider submitting a lightweight batch job
      2. Then observe whether it transitions to in_progress to validate system behavior
    3. Staggering job submissions
      1. Please avoid submitting multiple batches simultaneously
      2. Introduce intervals between submissions to reduce queue contention
    4. Testing an alternate region or deployment, if available
      1. Submit a small workload in another supported region
      2. This helps identify whether impact is localized or broader
    5. Selective cancellation only for long-stuck jobs
      1. Jobs stuck for extended periods (for example, beyond 48–72 hours) are unlikely to progress
      2. If required, please cancel a limited subset and re-submit as smaller batches
      3. Please avoid bulk cancellation, as it may increase queue pressure
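As a rough illustration of steps 1 and 3 above, here is a minimal sketch that splits JSONL request lines into smaller chunks and spaces out their submission. The names `chunk_jsonl_lines`, `submit_staggered`, `max_requests` and `delay_seconds` are hypothetical and are not part of any SDK. The `submit` callable is an assumption: it stands for whatever code uploads a chunk and creates the batch in your setup (for example, your existing `client.batches.create` call), which is not shown here.

```python
import time
from typing import Callable, Iterable, Iterator, List


def chunk_jsonl_lines(lines: Iterable[str], max_requests: int) -> Iterator[List[str]]:
    """Yield lists of at most max_requests non-empty JSONL lines."""
    if max_requests < 1:
        raise ValueError("max_requests must be at least 1")
    chunk: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines in the input file
        chunk.append(line)
        if len(chunk) == max_requests:
            yield chunk
            chunk = []
    if chunk:
        yield chunk  # final, possibly shorter, chunk


def submit_staggered(
    chunks: Iterable[List[str]],
    submit: Callable[[List[str]], object],
    delay_seconds: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list:
    """Call submit(chunk) for each chunk, pausing between submissions.

    submit is a placeholder (assumption) for your own upload-and-create logic.
    """
    results = []
    for index, chunk in enumerate(chunks):
        if index > 0:
            sleep(delay_seconds)  # wait before every submission except the first
        results.append(submit(chunk))
    return results
```

The chunk size and the delay are judgment calls; nothing in this thread specifies values that are known to move through validation faster, so it is probably best to start small and adjust based on what you observe.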

    Jobs that remain in the “validating” state for multiple days typically do not progress further through client-side actions. Resolution in such cases generally requires backend intervention.

    For monitoring and visibility, the Azure service status page may be reviewed periodically; however, partial degradations may not always be reflected there.

    Diagnostic metrics such as validation duration or queue trends can provide visibility, but they do not unblock existing jobs.
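For tracking validation duration on the client side, a minimal sketch along these lines may help. It assumes each batch object exposes `id`, `status` and `created_at` (Unix seconds), as the objects returned by the OpenAI Python client do. The function names, the 72-hour threshold and the limit of 5 are illustrative assumptions, chosen only to mirror the "limited subset" guidance above.

```python
from datetime import datetime, timezone
from typing import Iterable, List, Optional, Tuple


def validating_ages_hours(
    batches: Iterable, now: Optional[datetime] = None
) -> List[Tuple[str, float]]:
    """Return (batch id, age in hours) for jobs in 'validating', oldest first."""
    now_ts = (now or datetime.now(timezone.utc)).timestamp()
    ages = [
        (b.id, (now_ts - b.created_at) / 3600.0)
        for b in batches
        if b.status == "validating"
    ]
    return sorted(ages, key=lambda pair: pair[1], reverse=True)


def cancel_candidates(
    ages: List[Tuple[str, float]], min_age_hours: float = 72.0, limit: int = 5
) -> List[str]:
    """Pick at most `limit` of the oldest jobs stuck for min_age_hours or more.

    The threshold and limit are illustrative defaults, not documented values.
    """
    return [batch_id for batch_id, age in ages if age >= min_age_hours][:limit]
```

You could feed this the result of `client.batches.list()` and, if you decide to cancel, call `client.batches.cancel(batch_id)` for each returned id. Logging the ages over time also gives a simple view of whether the queue is draining.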


    Thank you


1 additional answer

  1. Diksha Golait 20 Reputation points Microsoft Employee
    2026-04-29T23:02:13.4733333+00:00

    Thanks Karnam! Some of my batch jobs passed the validation stage after being stuck for ~5 days. I think you are right: it was because the Foundry backend gets resource-constrained at times. Thanks for all the suggested actions; I will try to use them from now on to keep this under control.
