Speech Studio Custom Model - Large Plain Text Dataset

evisnetnim-2 1 Reputation point
2021-08-18T09:33:27.123+00:00

Hi,

I have a fairly large dataset of plain text names (300,000+) that I'd like to train a custom model on. I don't have pronunciations or audio for the names, only plain text.
My previous attempts to train a custom model with a sample of these names did not give better results than the base model.

What would be the best way to go about this, assuming that creating the audio version of all the names is time and cost prohibitive?

Thank you

Azure AI Speech
An Azure service that integrates speech processing into apps and services.

1 answer

  1. romungi-MSFT 42,781 Reputation points Microsoft Employee
    2021-08-18T13:35:45.97+00:00

    @evisnetnim-2 I believe you tried to create a custom voice model using some transcripts and their corresponding audio, but the trained model was not satisfactory.

    If the base models or neural voices work for you, you can use the Audio Content Creation tool in Speech Studio, even with a large dataset. Since your data is names in plain text, you can count the total number of characters across all your files and then paste the names into the Audio Content Creation tool as text, where the limit is 20,000 characters per file.

    If we assume each name has around 10 characters, each file in the tool would hold about 2,000 names (20,000 / 10), so for 300,000 names the process has to be repeated about 150 times. Once the names are pasted into a file, click the play button to generate audio for all the text in that file. When this completes, you can export the files.
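
The batching described above can be sketched in Python. This is a minimal sketch, not an official tool: `chunk_names` and `MAX_CHARS` are hypothetical names, and it assumes the names are pasted one per line and that each line break counts toward the 20,000-character limit.

```python
from typing import Iterable, List

MAX_CHARS = 20_000  # per-file limit quoted in the answer; assumed to include line breaks


def chunk_names(names: Iterable[str], max_chars: int = MAX_CHARS) -> List[List[str]]:
    """Group names into batches whose newline-joined text fits within max_chars."""
    batches: List[List[str]] = []
    current: List[str] = []
    current_len = 0
    for raw in names:
        name = raw.strip()
        if not name:
            continue  # skip blank lines
        if len(name) > max_chars:
            raise ValueError(f"name longer than {max_chars} characters: {name[:30]!r}")
        # One extra character for the newline that separates this name from the previous one.
        added = len(name) + (1 if current else 0)
        if current and current_len + added > max_chars:
            # The current batch is full: close it and start a new one with this name.
            batches.append(current)
            current, current_len = [], 0
            added = len(name)
        current.append(name)
        current_len += added
    if current:
        batches.append(current)
    return batches
```

Each batch can then be written out with `"\n".join(batch)` and pasted into one file in the tool. Under the line-break assumption, 10-character names fit 1,818 per file rather than 2,000, so 300,000 names need 166 files rather than 150.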

    124200-image.png

    During the export there is an option to export to local disk; choose the following options and download.

    124288-image.png

    This should download a zip file with the audio files, plain text, SSML and summary.
    124297-image.png

    The audio files should be in the order in which you pasted the names, and each file contains the audio for one name.
    124313-image.png
    You can now rename these files to match the names you pasted into the Audio Content Creation tool. You can write a script to rename the files in the audio folder. I think this SO post could help you create one, as the question is similar to this scenario.
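
A rename script along these lines might look like the sketch below. It is only an illustration under stated assumptions: the exported files are assumed to be `.wav` files whose names sort into the pasted order (for example by a numeric suffix), and `rename_audio`, `natural_key` and `safe_stem` are hypothetical helper names, not part of any Speech Studio tooling.

```python
import re
from pathlib import Path
from typing import Dict, List


def natural_key(path: Path) -> list:
    """Sort key that orders 'audio2' before 'audio10'."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", path.name)]


def safe_stem(name: str) -> str:
    """Replace whitespace and characters that are unsafe in file names with underscores."""
    return re.sub(r'[\\/:*?"<>|\s]+', "_", name.strip())


def rename_audio(audio_dir: Path, names: List[str], ext: str = ".wav") -> List[Path]:
    """Rename the i-th exported audio file after the i-th pasted name."""
    files = sorted((p for p in audio_dir.iterdir() if p.suffix.lower() == ext),
                   key=natural_key)
    if len(files) != len(names):
        raise ValueError(f"{len(files)} audio files but {len(names)} names")
    seen: Dict[str, int] = {}
    renamed: List[Path] = []
    for src, name in zip(files, names):
        stem = safe_stem(name)
        seen[stem] = seen.get(stem, 0) + 1
        if seen[stem] > 1:
            stem = f"{stem}_{seen[stem]}"  # keep repeated names apart
        dst = src.with_name(stem + ext)
        if dst.exists():
            raise FileExistsError(f"refusing to overwrite {dst}")
        src.rename(dst)
        renamed.append(dst)
    return renamed
```

The length check is there so that a mismatch between the export and the name list fails loudly instead of silently mislabelling audio. It is worth running the script on a copy of the exported folder first.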

    You can now repeat this process for the next set of names. It is still somewhat time consuming, but once it works for the first batch, the remaining batches should go quickly.

    Note that the pricing tier of your Speech service may limit you if you exceed the number of characters included in your plan, and some neural voices require approval from the Speech service team before they are available for your subscription.

    I hope this helps.