How much data is needed to train a custom Whisper model?

Question

Anonymous

I want to train a custom model with Whisper in Azure. For Bulgarian, I see that the 20231026 Whisper Preview and the 20240228 Whisper Large V2 models are available.

For both models, audio + transcript is the only option. What I can’t find is detailed information about the training audio dataset.

How much audio data do I need in order to train a custom Whisper model? Is it 1 to 20 hours plus a .txt transcription, as it is for the Microsoft baseline models?
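
Before uploading, it can help to measure how much audio a candidate dataset actually contains. The sketch below is a minimal, non-authoritative example using only the Python standard library: it sums the duration of the .wav files in a folder and lists any that have no transcript line. The function names, the trans.txt file name, and the tab-separated "file name, tab, transcript" layout are assumptions made for illustration; the 1 to 20 hours figure is the range asked about here, not a confirmed requirement for Whisper models.

```python
import wave
from pathlib import Path


def total_audio_hours(folder: str) -> float:
    """Sum the duration of every .wav file in `folder`, in hours."""
    total_seconds = 0.0
    for path in sorted(Path(folder).glob("*.wav")):
        with wave.open(str(path), "rb") as wav:
            # duration = number of frames / frames per second
            total_seconds += wav.getnframes() / wav.getframerate()
    return total_seconds / 3600.0


def files_missing_transcript(folder: str, transcript_name: str = "trans.txt") -> list[str]:
    """Return the .wav file names that have no non-empty transcript line.

    Assumed (hypothetical) layout: one line per file, `name.wav<TAB>text`.
    """
    transcript = Path(folder) / transcript_name
    transcribed = set()
    if transcript.exists():
        # utf-8-sig reads files with or without a byte order mark
        for line in transcript.read_text(encoding="utf-8-sig").splitlines():
            name, sep, text = line.partition("\t")
            if sep and text.strip():
                transcribed.add(name.strip())
    return [p.name for p in sorted(Path(folder).glob("*.wav")) if p.name not in transcribed]
```

Calling total_audio_hours("my_dataset") and files_missing_transcript("my_dataset") on a local folder (the folder name is a placeholder) gives a quick check of dataset size and transcript coverage before a training run.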

Anonymous

2024-03-29T07:41:25.1766667+00:00

Hello,

Thank you for the answer. Maybe I did not express myself clearly. In Azure Speech Studio, in the Custom Speech project, I see the option "Train custom models", and its drop-down menu lists two Whisper models, as you can see in the screenshot:

As a test, I selected the 20231026 model with a small dataset, and the training succeeded. This is why I thought it was possible to customize it.

Best,
Anonymous

2024-03-29T07:58:39.94+00:00

Thanks for the answer, YutongTie-MSFT!

YutongTie-MSFT 53,971 Reputation points Moderator

2024-04-02T01:14:10.13+00:00

@Viktoriya-6922

Thanks for your response! Custom training in Azure Speech works the same way: a custom model augments the base model to include domain-specific vocabulary shared across all areas of the custom domain.

I hope my answer helps! Please let me know if you have other questions, or feel free to open a new thread.

Regards,

Yutong

Anonymous

2024-04-02T05:57:31.4966667+00:00

Can you tell me why my "training" with the 20231026 model succeeded?

Thank you Yutong!


Answer 1

@Viktoriya-6922 Thanks for reaching out to us. When you say you want to train a custom Whisper model, do you mean fine-tuning? Fine-tuning retrains the base model on your data, so the amount of data you need depends on how much you want to change the base model's behavior. There is therefore no minimum input size for fine-tuning.

When we talk about fine-tuning, we mean supervised fine-tuning, not continuous pre-training or Reinforcement Learning from Human Feedback (RLHF). Supervised fine-tuning refers to the process of retraining pre-trained models on specific datasets, typically to improve model performance on specific tasks or to introduce information that wasn't well represented when the base model was originally trained.

At the moment, Azure OpenAI has no fine-tuning option for the Whisper model. For more information, see https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models#fine-tuning-models

Regards,

Yutong

Please accept the answer if you find it helpful, to support the community. Thanks a lot.
