How to transcribe pounds as currency

Question

Hi all,

I'm creating an app using speech recognition from Azure's Speech Studio. I've successfully uploaded bespoke training data and created a couple of endpoints, but I can't figure out how to force the API to transcribe, say, "£2,316" instead of "2316 pounds". I'm creating financial apps for a UK client, so 99.9% of the time, "pounds" will mean currency.

Can anyone help at all please?

Thanks so much

Answer

Hello George Kent

As far as I know, there is currently no built-in way to force the Speech Studio API to transcribe numbers in a specific format. However, there are a few workarounds that you can use.

One workaround is to use a custom pronunciation dictionary. A custom pronunciation dictionary is a text file that contains a list of words and their pronunciations. You can create a custom pronunciation dictionary to teach the Speech Studio API how to pronounce numbers in the desired way.

To create a custom pronunciation dictionary, you will need to create a text file with the following format:

"word" = "pronunciation"

For example, to teach the Speech Studio API how to pronounce the number £2,316 as "two thousand three hundred and sixteen pounds", you would add the following line to your custom pronunciation dictionary:

"£2,316" = "two thousand three hundred and sixteen pounds"

Once you have created your custom pronunciation dictionary, you can upload it to the Speech Studio portal. To do this, go to the Customization page in the Speech Studio portal and click on the Upload pronunciation dictionary button.

Once your custom pronunciation dictionary has been uploaded, you can select it when you create a new speech recognition endpoint. This will tell the Speech Studio API to use your custom pronunciation dictionary when transcribing audio.

Another workaround is to use a post-processing step to correct the transcription. After the Speech Studio API has transcribed the audio, you can use a post-processing step to search for numbers and replace them with the desired format.

For example, you could use a regular expression to search for numbers in the transcription and replace them with the following format:

<number> pounds

This would rewrite each number in the transcription so that it is followed by the word "pounds".
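For the case in the question, the same post-processing idea can run in the other direction, turning "2316 pounds" into "£2,316". The following is only a minimal sketch: the function name format_pounds and the pattern are illustrative assumptions, not part of the Speech SDK, and it assumes the recognizer returns the amount as digits followed by "pound" or "pounds".

```python
import re

# Assumed input shape: digits (optionally with thousands separators and
# decimals) followed by the word "pound" or "pounds".
POUNDS_PATTERN = re.compile(
    r"\b(\d{1,3}(?:,\d{3})+|\d+)(\.\d+)?\s+pounds?\b",
    re.IGNORECASE,
)


def format_pounds(text: str) -> str:
    """Rewrite '<number> pounds' as '£<number>' with thousands separators."""

    def _replace(match: re.Match) -> str:
        whole = int(match.group(1).replace(",", ""))
        fraction = match.group(2) or ""
        return f"£{whole:,}{fraction}"

    return POUNDS_PATTERN.sub(_replace, text)


print(format_pounds("The invoice total is 2316 pounds."))
```

Note that this rewrites every match, so a phrase such as "weighs 10 pounds" would also be converted. That may be acceptable when "pounds" almost always means currency, as described in the question.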

I hope this helps! Let me know if you have any other questions.

Answer

Hi @George Kent ,

Thanks for using Microsoft Q&A platform.

Certainly, I understand your concerns about adding every possible number to the pronunciation dictionary and about using post-processing only as a last resort. Here are some alternative solutions that you can consider for your application. Since it is a financial app, I assume that any number of currency mappings may need to be handled in future:

Dynamic data structure: Instead of adding every possible number to the pronunciation dictionary, you can make your data structures dynamic. In this approach, your app could generate pronunciations on the fly based on the user's input. For example, when a user enters a number in a specific format, your app could convert it to words and then use those words for pronunciation. This way, you don't need to predefine all possible numbers, and the pronunciation can be tailored to the user's input.

User Configuration: Allow users to configure how they want numbers to be transcribed. Provide an option in your app's settings where users can select their preferred number transcription format. This way, users can choose between different formats (e.g., "two thousand three hundred and sixteen pounds" vs. "£2,316") according to their preferences.

Feedback Mechanism: Implement a feedback mechanism in your app where users can report transcription issues. If a user encounters an incorrect transcription of a number, they can provide feedback, and you can use this feedback to improve the transcription for future users. Over time, your app's transcription accuracy will improve based on user input.

These alternatives aim to provide a more user-centric approach to handling number transcriptions. They reduce the need for extensive predefined dictionaries, for code and file maintenance, for change requests to other teams whenever the code or config files need modifying, and for post-processing. They also give users control over how numbers are transcribed in your app.

A dynamic pronunciation data structure is indeed a more flexible and scalable approach in such cases.

To implement a dynamic data structure without specifying every possible number in advance, you can use a Python library such as inflect to convert numbers to their spoken-word form on the fly. The data structure itself can be populated programmatically in Python: you can add, modify, or remove key-value pairs at runtime based on your application's logic.
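To illustrate the on-the-fly conversion without adding a dependency, here is a rough standard-library sketch. It is not the inflect API: the function names number_to_words and below_thousand are made up for this example, it only handles whole numbers from 0 to 999,999, and it assumes British-style wording with "and".

```python
ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def below_thousand(n: int) -> str:
    """Spell out 1..999, e.g. 316 -> 'three hundred and sixteen'."""
    parts = []
    hundreds, rest = divmod(n, 100)
    if hundreds:
        parts.append(f"{ONES[hundreds]} hundred")
    if rest:
        if rest < 20:
            parts.append(ONES[rest])
        else:
            tens, ones = divmod(rest, 10)
            parts.append(TENS[tens] + (f"-{ONES[ones]}" if ones else ""))
    return " and ".join(parts)


def number_to_words(n: int) -> str:
    """Spell out a whole number in the range 0..999,999."""
    if not 0 <= n < 1_000_000:
        raise ValueError("only 0..999,999 is supported in this sketch")
    if n == 0:
        return "zero"
    thousands, rest = divmod(n, 1000)
    parts = []
    if thousands:
        parts.append(f"{below_thousand(thousands)} thousand")
    if rest:
        # British usage inserts "and" before a final part below one hundred.
        joiner = "and " if thousands and rest < 100 else ""
        parts.append(joiner + below_thousand(rest))
    return " ".join(parts)


print(number_to_words(2316) + " pounds")
```

A maintained library such as inflect covers far more cases (larger numbers, ordinals, decimals), so a sketch like this is only useful for seeing what the conversion involves.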

Here's how you can achieve this:

1. Initialize an empty data structure (currency_data_structure).

2. Programmatically add, modify, or remove key-value pairs as needed.

3. Access values using keys, check for key existence, and iterate through the data structure.

You can apply similar principles within your application to dynamically populate a dictionary with currency mappings based on your requirements. This allows you to handle multiple currencies without the need for code edits or external configuration files.
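The three steps above can be sketched with a plain Python dict. The name currency_data_structure comes from step 1. The helper functions and the choice to map a spoken currency word to its symbol are assumptions made for illustration only.

```python
from typing import Optional

# Step 1: start with an empty mapping of spoken currency word -> symbol.
currency_data_structure: dict = {}


def add_currency(spoken: str, symbol: str) -> None:
    """Step 2: add a mapping, or overwrite an existing one."""
    currency_data_structure[spoken.lower()] = symbol


def remove_currency(spoken: str) -> None:
    """Step 2: remove a mapping; do nothing if it is not present."""
    currency_data_structure.pop(spoken.lower(), None)


def symbol_for(spoken: str) -> Optional[str]:
    """Step 3: look up a symbol, returning None for unknown words."""
    return currency_data_structure.get(spoken.lower())


add_currency("pounds", "£")
add_currency("euros", "€")
add_currency("dollars", "$")
remove_currency("dollars")

# Step 3: check for key existence and iterate over the mappings.
if "pounds" in currency_data_structure:
    for spoken, symbol in currency_data_structure.items():
        print(f"{spoken} -> {symbol}")
```

In a real app the mappings would more likely be loaded from user settings or a database at runtime than hard-coded as they are here.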

Regards,
Srinivas.

-Please kindly accept the answer and vote 'yes' if you feel helpful to support the community, Thanks.
