
gpt-4o-transcribe being retired end of February

Thomas 0 Reputation points
2026-02-04T08:23:41.99+00:00

gpt-4o-transcribe is set to retire on the 28th of February. There don't seem to be any dedicated transcription models that we can upgrade to, other than Whisper, where we are getting worse transcriptions.

Will there be a new release of this model or an extension of the retirement date? If not, what are the viable choices we can migrate to?

Azure AI Speech

An Azure service that integrates speech processing into apps and services.


2 answers

  1. Sridhar M 5,335 Reputation points Microsoft External Staff Moderator
    2026-02-04T09:34:41.5666667+00:00

    Hi Thomas,

    You're concerned about the retirement of the gpt-4o-transcribe model on February 28th and the lack of dedicated transcription models to upgrade to. Unfortunately, the current documentation gives me no information about a new release of this model or an extension of the retirement date.

    However, here are a couple of alternatives you can consider for migration:

    1. Video Indexer: This service converts speech to text in multiple languages and can automatically identify the dominant spoken language. It's a robust option that allows you to integrate multi-language transcription into your projects. More about it can be found in the Video Indexer Documentation.
    2. Cognitive Speech Services: This is part of Azure's unified speech service that combines speech-to-text, text-to-speech, and speech translation. You can also check out the Cognitive Speech Documentation for more details.
    3. Conversation Transcription: If your application requires real-time transcription with speaker identification, this could be an option; see the conversation transcription section of the Speech service documentation for details.

    Viable migration option 1 (Recommended): Azure AI Speech‑to‑Text. For production and enterprise workloads, Azure AI Speech – Speech‑to‑Text (v3/v4) is currently the closest and most reliable replacement. It generally outperforms Whisper on real‑world enterprise audio; it supports speaker diarization, custom phrase lists, and domain adaptation; and it has mature SLAs, broad regional availability (including India), and compliance alignment. While it is not an OpenAI model, it is the Microsoft‑supported speech service that closes most of the quality gap left by gpt-4o-transcribe.
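    As a rough sketch of what a migration to the Speech SDK could look like (this assumes the azure-cognitiveservices-speech package and a Speech resource key and region; the function name and language setting here are illustrative, not an official sample):

```python
def transcribe_file(speech_key: str, region: str, wav_path: str) -> str:
    """Transcribe a single WAV file with Azure AI Speech-to-Text.

    Sketch only: requires the azure-cognitiveservices-speech package
    and valid Speech resource credentials.
    """
    import azure.cognitiveservices.speech as speechsdk  # non-stdlib dependency

    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=region)
    speech_config.speech_recognition_language = "en-US"  # illustrative locale
    audio_config = speechsdk.audio.AudioConfig(filename=wav_path)
    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config
    )
    result = recognizer.recognize_once()  # single-utterance recognition
    if result.reason == speechsdk.ResultReason.RecognizedSpeech:
        return result.text
    raise RuntimeError(f"Recognition failed: {result.reason}")
```

    For long audio, the batch transcription REST API or continuous recognition is the better fit than `recognize_once`, which stops at the first pause.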

    Viable migration option 2: GPT multimodal (audio → text):

    Newer GPT multimodal models can accept audio and return text, and they often provide better semantic understanding than Whisper. However, these models are not designed to be strict transcription engines. You may see paraphrasing, normalization, or missed filler words. Latency and cost are also higher. This option is best suited for semantic interpretation, not for applications that require verbatim or legally accurate transcripts.
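    If verbatim accuracy matters, one way to compare candidate models is to score their output against a small set of reference transcripts using word error rate (WER). A minimal, dependency-free sketch (the sample strings below are invented):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance, normalized by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edits needed to turn the first i reference words
    # into the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# A model that paraphrases by dropping filler words still scores nonzero WER:
verbatim = "um I think we should uh ship it"
cleaned = "I think we should ship it"
print(word_error_rate(verbatim, cleaned))  # 0.25 (2 deletions / 8 words)
```

    Running each candidate over the same reference set makes the "paraphrasing vs. verbatim" trade-off measurable rather than anecdotal.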

    Viable migration option 3: Whisper with compensating techniques

    If you must remain within OpenAI ASR today, Whisper can be improved but will not reach GPT‑4o‑Transcribe parity. Practical mitigations include using larger Whisper variants, applying audio pre‑processing (noise suppression, channel balancing), intelligent chunking, and post‑processing with an LLM to clean terminology and grammar. This approach reduces pain but should be treated as a temporary mitigation, not a long‑term equivalent replacement.
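    The intelligent-chunking step above can be sketched as a pure function; the chunk and overlap lengths are illustrative defaults, not Whisper requirements:

```python
def chunk_spans(total_seconds: float, chunk_seconds: float = 30.0,
                overlap_seconds: float = 2.0) -> list:
    """Split an audio duration into overlapping (start, end) spans.

    Overlapping chunks reduce the chance of cutting a word at a
    boundary; duplicated text in the overlap region can be removed
    afterwards (e.g. by the LLM post-processing pass).
    """
    spans, start = [], 0.0
    step = chunk_seconds - overlap_seconds
    while start < total_seconds:
        end = min(start + chunk_seconds, total_seconds)
        spans.append((start, end))
        if end >= total_seconds:
            break
        start += step
    return spans

print(chunk_spans(70))  # [(0.0, 30.0), (28.0, 58.0), (56.0, 70.0)]
```

    Each span is then cut from the source audio, transcribed independently, and the pieces stitched back together in order.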

    I hope this helps. Do let me know if you have any further queries.


    If this answers your query, please click "Accept Answer" and "Yes" for "Was this answer helpful".

    Thank you!


  2. Q&A Assist
    2026-02-04T08:23:51.74+00:00
    AI generated content: this answer was created with AI from Q&A Assist.

    The gpt-4o-transcribe model is set to retire on February 28, 2026, and there is no mention of an extension for its retirement date or a new release of this specific model. As for alternatives, the whisper model is currently available, but you mentioned experiencing worse transcription results with it.

    Unfortunately, the context does not provide any additional dedicated transcription models to upgrade to beyond whisper. Therefore, your viable choices for migration appear to be limited to the existing models, including whisper and possibly other general audio transcription options like gpt-audio or gpt-realtime, which are generally available but may not specifically meet your needs for transcription quality.

    If you are looking for better transcription performance, it may be beneficial to explore the configurations and parameters available in these models to optimize their performance for your specific use case.
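    For reference, these are the kinds of request parameters the Whisper transcription endpoint exposes; the keys below are documented parameters of the audio transcriptions API, but the values are only illustrative:

```python
# Illustrative parameter choices for an /audio/transcriptions request.
# The keys are documented Whisper API parameters; the values are examples only.
request_params = {
    "language": "en",                      # skip language auto-detection
    "prompt": "Contoso, Kubernetes, SKU",  # bias decoding toward domain terms
    "temperature": 0,                      # deterministic, less drift
    "response_format": "verbose_json",     # include segment timestamps
}
print(sorted(request_params))
```

    The `prompt` parameter in particular can recover domain vocabulary that Whisper otherwise mis-hears.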


