How to configure disfluency removal using the REST API

Question

I am using the Speech-to-Text REST API from Python to research fillers, pauses, and backtracking in Japanese (ja-JP).

Can I configure disfluency removal when using the Speech-to-Text service? I need the verbatim text, with all the fillers kept in the transcription.

The Azure STT service currently removes filler words automatically, but I need to keep fillers such as "eto" and "ano" in the STT output.

Are there any options to allow me to include those filler words when displaying text?

Thank you in advance.

Answer

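The short-audio REST API has no documented parameter that turns disfluency removal on or off, and the diarization and profanity settings do not affect filler words. What you can do is request the detailed output format ("format=detailed") and read the lexical form of the top hypothesis. The lexical form is the text as recognized, before inverse text normalization, punctuation, and display formatting are applied, so it is the most likely place for fillers to survive. Setting "profanity=raw" additionally ensures that no words are masked or removed as profanity.

Here's an example of how to send the request with these parameters and print both the lexical and display forms: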

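import requests

# Fill in the region of your Speech resource and its subscription key.
region = ''
subscription_key = ''

# Short-audio recognition endpoint; language and output options go in the query string
url = f'https://{region}.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1'

# Request headers: 16 kHz mono PCM WAV in, JSON out
headers = {
    'Content-Type': 'audio/wav; codecs=audio/pcm; samplerate=16000',
    'Accept': 'application/json',
    'Ocp-Apim-Subscription-Key': subscription_key,
}

# Query parameters:
#   format=detailed -> return an NBest list with Lexical/ITN/MaskedITN/Display forms
#   profanity=raw   -> do not mask or remove profanity (fillers are not affected either way)
params = {
    'language': 'ja-JP',
    'format': 'detailed',
    'profanity': 'raw',
}

# Read the audio file as binary data
with open('audio_file.wav', 'rb') as file:
    data = file.read()

# Send the REST API request and stop on HTTP errors
response = requests.post(url, headers=headers, params=params, data=data)
response.raise_for_status()
result = response.json()

# Use the top hypothesis; the lexical form is the closest to a verbatim transcript
if result.get('RecognitionStatus') == 'Success':
    best = result['NBest'][0]
    print('Lexical:', best['Lexical'])
    print('Display:', best['Display'])
else:
    print('Recognition failed:', result.get('RecognitionStatus'))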

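In the above example, "format=detailed" makes the service return the "NBest" list, where each hypothesis carries several forms of the text: "Lexical" (the words as recognized), "ITN" (inverse text normalization applied, for example numbers written as digits), "MaskedITN" (the ITN form with profanity masking applied if requested), and "Display" (the ITN form with punctuation and formatting added). The lexical form is the least post-processed, so it is where to look for fillers. Neither speaker diarization nor the profanity setting controls filler removal: diarization only labels which speaker said each segment (and is configured through other APIs such as batch transcription, not this short-audio endpoint), and the profanity setting only controls how profane words are handled.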
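When the request uses "format=detailed", comparing the text forms of each hypothesis side by side shows what post-processing changed. The sketch below assumes the detailed-format JSON shape (a "RecognitionStatus" field plus an "NBest" list whose entries have "Lexical", "ITN", "MaskedITN" and "Display"); the helper name extract_forms is hypothetical, and the sample response is written by hand for illustration, not captured from the service.

```python
from typing import Any


def extract_forms(result: dict[str, Any]) -> list[dict[str, str]]:
    """Return one dict per NBest hypothesis with its four text forms.

    Hypothetical helper. Assumes the detailed-format JSON shape and returns
    an empty list when recognition did not succeed.
    """
    if result.get("RecognitionStatus") != "Success":
        return []
    forms = []
    for hypothesis in result.get("NBest", []):
        # Missing fields become empty strings instead of raising KeyError
        forms.append({
            key: hypothesis.get(key, "")
            for key in ("Lexical", "ITN", "MaskedITN", "Display")
        })
    return forms


# Hand-written sample response (illustrative only, not real service output)
sample = {
    "RecognitionStatus": "Success",
    "NBest": [
        {
            "Confidence": 0.9,
            "Lexical": "えーと あの 明日 行きます",
            "ITN": "えーと あの 明日 行きます",
            "MaskedITN": "えーと あの 明日 行きます",
            "Display": "えーと、あの、明日行きます。",
        },
    ],
}

for i, form in enumerate(extract_forms(sample)):
    for key, text in form.items():
        print(f"[{i}] {key}: {text}")
```

In practice you would pass response.json() from the request instead of the hand-written sample.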

Note that even the lexical form is not guaranteed to contain every filler. Whether words such as "eto" or "ano" are transcribed at all depends on the recognition model, and this endpoint documents no parameter that disables disfluency removal. If the fillers are central to your research, check the output against a few recordings whose fillers you have transcribed by hand.
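One rough way to check how many fillers survive in a given form is to count known filler words in the returned text. This is a sketch under stated assumptions: the filler list and the function name count_fillers are illustrative, not anything the service defines, and the matching is plain substring search.

```python
import re
from collections import Counter

# Illustrative filler list (an assumption for this sketch; adapt it to your study)
FILLERS = ("えーと", "えっと", "えー", "あの", "その", "まあ")


def count_fillers(text: str, fillers: tuple[str, ...] = FILLERS) -> Counter[str]:
    """Count filler occurrences in text, preferring the longest match.

    Alternatives are sorted longest first, so "えーと" is counted once as
    "えーと" rather than as "えー". Because this is substring matching, it
    also counts non-filler uses, such as the demonstrative "あの" in "あの人".
    """
    pattern = "|".join(re.escape(f) for f in sorted(fillers, key=len, reverse=True))
    return Counter(re.findall(pattern, text))


# Hand-written example strings (not real service output)
lexical = "えーと あの 明日 えー 行きます"
display = "えーと、あの、明日、行きます。"
print("lexical:", count_fillers(lexical))
print("display:", count_fillers(display))
```

Running the same count over the "Lexical" and "Display" forms of real results gives a quick view of which fillers each form keeps.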
