I am happy with the results in "Speech Studio" for a sample wav file. How do I scale this up to longer files?

Question


John Woolley 0

I have run a 1-minute wav file through the Speech Studio sample process and am pleased with the result. I can't figure out how to move forward in the system to process larger speech files. One branch seems to take me into a training setting where I learn how to write--perhaps more plausibly to copy--code for several steps. Another branch seems to take me to some place where I am supposed to specify and name "resources." But the form does not actually accept any of the names I make up for resources.

I'm not a coder.

How can it be that there is no intermediate option, like the Speech Studio, that lets me generate text from longer sound files??? This is not about $$$; I have a pay-as-you-go account established. The "assistant" answer is not really helpful either, because it sends me through an apparently endless number of screens about resources, network security, Azure remote desktop software, etc. Jeez, I'm just trying to get text.

YutongTie-MSFT 53,976 Reputation points Moderator

2024-04-22T21:32:09.1933333+00:00

@John Woolley Thanks for reaching out to us. May I ask how long your input audio is expected to be? Are you looking only for solutions within Azure Speech Studio, and not the SDK?

John Woolley 0 Reputation points

2024-04-23T18:40:32.4466667+00:00

I'm dealing with events that last for 90-120 minutes, but I can chunk them into smaller bits. I don't know anything about SDK. It looks to me like those packages (several of which certainly look interesting) assume a lot more coding skill than I have (i.e., ~zero). But they also don't seem to include speech-to-text.

2 answers


Answer 1

To process longer speech files in Speech Studio, you can use the batch transcription feature. This feature allows you to transcribe multiple audio files at once, up to a total of 10 hours of audio per batch. To use this feature, you will need to upload your audio files to Azure Blob Storage and provide the storage account information in Speech Studio. You can then select the batch transcription option and configure the settings for your transcription job. Once the job is complete, you can download the transcriptions in a variety of formats.

Here are the steps to use batch transcription in Speech Studio:

1. Upload your audio files to Azure Blob Storage.
2. In Speech Studio, select the "Batch transcription" option from the left-hand menu.
3. Provide the storage account information for your audio files.
4. Configure the settings for your transcription job, including language, model, and output format.
5. Start the transcription job.
6. Once the job is complete, download the transcriptions in the desired format.
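
For readers who do end up scripting this, the steps above correspond roughly to a single call to the batch transcription REST API. The following is a minimal sketch, not an official sample: it assumes the v3.1 endpoint shape, a placeholder region and key, and a hypothetical Blob Storage URL for the uploaded audio. `build_transcription_request` is an illustrative helper name, not part of any SDK. It only builds the request; the lines that would send it are commented out.

```python
import json
import urllib.request


def build_transcription_request(region, key, content_urls, locale="en-US",
                                display_name="Long recording"):
    """Build (but do not send) a request that creates a batch transcription job."""
    # Endpoint shape assumed from the Speech to text REST API v3.1.
    url = f"https://{region}.api.cognitive.microsoft.com/speechtotext/v3.1/transcriptions"
    body = {
        "contentUrls": list(content_urls),  # URLs of audio files already in Blob Storage
        "locale": locale,                   # language of the recording
        "displayName": display_name,        # label for the job
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_transcription_request(
        region="eastus",        # placeholder region
        key="YOUR_SPEECH_KEY",  # placeholder key
        # Hypothetical audio URL; a real one usually needs a SAS token.
        content_urls=["https://example.blob.core.windows.net/audio/event.wav"],
    )
    print(req.get_method(), req.full_url)
    # To actually start the job (needs a real key and a reachable audio URL):
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp))  # job description, including a URL to poll
```

Keeping the request construction separate from the network call makes it possible to inspect the URL and body before spending any quota.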

Note that batch transcription is a paid feature and requires a Speech service subscription. For more information on batch transcription and pricing, see the official documentation.

References:

Batch transcription in Speech Studio

Answer 2

@John Woolley Thanks for your response. I understand you're looking for an easy way to transcribe larger audio files using Azure Speech Studio without having to write code or configure additional resources.

While Speech Studio provides a user-friendly interface for testing and transcribing smaller audio files, larger files need the Azure Speech service directly, which does require some coding and configuration.

One option is the batch transcription API, which is supported in Speech to Text. It also works with the smaller chunks you mentioned you could split the recordings into. Please refer to the documentation here:

https://learn.microsoft.com/en-us/azure/ai-services/speech-service/batch-transcription
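
If you do decide to chunk a recording before transcribing it, the splitting itself can be done with Python's standard library alone. This is a rough sketch under some assumptions: the input is an uncompressed PCM WAV file, the 10-minute default chunk length is arbitrary, and `split_wav` is an illustrative helper name. It cuts at fixed intervals, so a word may be split across two chunks.

```python
import wave


def split_wav(path, chunk_seconds=600, prefix="chunk"):
    """Split a PCM WAV file into consecutive chunks of at most chunk_seconds.

    Returns the list of chunk file names; prefix may include a directory.
    """
    written = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = src.getframerate() * chunk_seconds
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:  # end of the source file
                break
            name = f"{prefix}_{index:03d}.wav"
            with wave.open(name, "wb") as dst:
                dst.setparams(params)    # same channels, sample width, and rate
                dst.writeframes(frames)  # frame count in the header is corrected here
            written.append(name)
            index += 1
    return written


if __name__ == "__main__":
    import sys
    # Usage: python split_wav.py event.wav
    for name in split_wav(sys.argv[1]):
        print(name)
```

A 120-minute recording would produce twelve 10-minute files with the default setting; each one could then be uploaded and transcribed separately.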

For a low-code or no-code solution, you can use the Batch Speech to text Connector in Power Platform applications such as Power Automate, Power Apps, and Logic Apps. See the Power Automate batch transcription guide to get started.

Please take a look and see how it works for your case. I hope this helps!

Regards,

Yutong

-Please kindly accept the answer if you find it helpful, to support the community. Thanks a lot.
