How much data is needed to train a custom Whisper model?

Viktoriya-6922 21 Reputation points
2024-03-27T11:43:07.0566667+00:00

I want to train a custom Whisper model in Azure. For Bulgarian, I see two base models available: the 20231026 Whisper Preview and the 20240228 Whisper Large V2.

For both models, audio + transcript is the only training data option. What I can't find is detailed information about the training audio dataset.

How much audio data do I need to train a custom Whisper model? Is it 1 to 20 hours plus a .txt transcription, as it is for the Microsoft baseline models?

Azure AI Speech
An Azure service that integrates speech processing into apps and services.

1 answer

  1. YutongTie-MSFT 46,646 Reputation points
    2024-03-28T20:43:59.1033333+00:00

    @Viktoriya-6922 Thanks for reaching out to us. When you mention training a custom Whisper model, do you mean fine-tuning? Fine-tuning retrains the base model with your data, so the amount of data you need depends on how much you want to change the base model's behavior. Therefore, there is no minimum input size for fine-tuning.

    When we talk about fine-tuning, we mean supervised fine-tuning, not continuous pre-training or Reinforcement Learning from Human Feedback (RLHF). Supervised fine-tuning refers to the process of retraining pre-trained models on specific datasets, typically to improve model performance on specific tasks or to introduce information that wasn't well represented when the base model was originally trained.

    At this moment, Azure OpenAI has no fine-tuning option for the Whisper model. For more information, please check this page: https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models#fine-tuning-models

    Regards,

    Yutong


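Since the question is about hours of audio, it may help to measure how much audio a candidate dataset actually holds before deciding how much of it to use. The following is a minimal sketch, assuming the recordings are uncompressed WAV files sitting in one local folder. The function names and the folder layout are illustrative assumptions and are not part of any Azure tooling.

```python
import wave
from pathlib import Path


def wav_duration_seconds(path: Path) -> float:
    """Return the duration of one WAV file in seconds."""
    with wave.open(str(path), "rb") as wav_file:
        # Duration = number of audio frames / frames per second.
        return wav_file.getnframes() / wav_file.getframerate()


def total_audio_hours(folder: Path) -> float:
    """Sum the duration of every .wav file directly inside `folder`, in hours."""
    total_seconds = sum(
        wav_duration_seconds(wav_path) for wav_path in sorted(folder.glob("*.wav"))
    )
    return total_seconds / 3600
```

For example, `total_audio_hours(Path("bulgarian_audio"))` (a hypothetical folder name) would return the combined length of the recordings in hours, which can then be compared against whatever range the service documentation gives for the chosen base model.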