As far as I know, the short-audio REST API of the Azure Speech-to-Text (STT) service has no dedicated parameter for switching disfluency removal on or off. "DiarizationEnabled" and "ProfanityFilterMode" will not help here: they are batch transcription properties that control speaker separation and profanity handling, and neither has anything to do with filler words. What you can do is add "format=detailed" to the request and read the "Lexical" field of the "NBest" results. That field holds the words as they were recognized, before display formatting, so it may keep fillers that the display text drops.
Here's an example of the REST API request:
import requests
# Set the endpoint URL
url = 'https://<your-service-region>.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1?language=ja-JP'
# Set the request headers
headers = {
    'Content-Type': 'audio/wav',
    'Ocp-Apim-Subscription-Key': '<your-subscription-key>',
    'Accept': 'application/json',
}
# Set the query parameters: "detailed" adds an NBest list with
# Lexical, ITN, MaskedITN and Display forms of each candidate
params = {
    'format': 'detailed',
}
# Read the audio file as binary data
with open('audio_file.wav', 'rb') as file:
    data = file.read()
# Send the REST API request
response = requests.post(url, headers=headers, params=params, data=data)
response.raise_for_status()
# Parse the JSON response
result = response.json()
# Take the first recognition candidate
best = result['NBest'][0]
# Lexical form: the recognized words before display formatting (may keep fillers)
print(best['Lexical'])
# Display form: punctuated, formatted text (fillers may be dropped here)
print(best['Display'])
Requesting "format=detailed" makes the service return an "NBest" list in which each candidate carries "Lexical", "ITN", "MaskedITN" and "Display" forms, so you can compare the lexical form with the display form to see whether fillers were dropped. Speaker diarization only labels which speaker said what, and the profanity setting only affects profane words, so neither of them preserves fillers.
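To make reading a detailed response a little more defensive, a helper along these lines could be used. This is a minimal sketch: the function name `pick_lexical_and_display` and the sample payload are hypothetical and hand-written for illustration. It assumes the response carries a "RecognitionStatus" field and an "NBest" list whose entries have "Confidence", "Lexical" and "Display" keys.

```python
from typing import Optional, Tuple


def pick_lexical_and_display(result: dict) -> Optional[Tuple[str, str]]:
    """Return (Lexical, Display) of the most confident NBest entry, or None."""
    # Anything other than "Success" means there is no usable transcript.
    if result.get("RecognitionStatus") != "Success":
        return None
    candidates = result.get("NBest") or []
    if not candidates:
        return None
    # Pick the candidate with the highest confidence score.
    best = max(candidates, key=lambda c: c.get("Confidence", 0.0))
    return best.get("Lexical", ""), best.get("Display", "")


# Hypothetical payload written by hand, NOT real service output.
sample = {
    "RecognitionStatus": "Success",
    "NBest": [
        {"Confidence": 0.40, "Lexical": "今日 は 晴れ です", "Display": "今日は晴れです。"},
        {"Confidence": 0.91, "Lexical": "えーと 今日 は 晴れ です", "Display": "今日は晴れです。"},
    ],
}

print(pick_lexical_and_display(sample))
```

Returning None instead of raising keeps the caller in charge of what to do with silence or failed recognition.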
Note that filler handling in the Azure STT service is not fully under your control with this endpoint, and the lexical form may still be missing some fillers. If they are absent there too, I'm not aware of a documented REST parameter that restores them, so check the current Speech service documentation for your locale.
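One way to check how many fillers are actually lost is to compare the lexical and display strings against a list of fillers you care about. The sketch below is only an illustration: `dropped_fillers` and the filler list are hypothetical, the example strings are hand-written, and plain substring matching is a simplification that will misfire if one filler is contained in another word.

```python
from typing import List


def dropped_fillers(lexical: str, display: str, fillers: List[str]) -> List[str]:
    """Return the fillers that occur in the lexical text but not in the display text."""
    # Substring matching is a simplification; real text may need tokenization.
    return [f for f in fillers if f in lexical and f not in display]


# Hypothetical filler list and hand-written example strings (not service output).
FILLERS = ["えーと", "あのー", "まあ"]
lexical_text = "えーと 今日 は 晴れ です"
display_text = "今日は晴れです。"

print(dropped_fillers(lexical_text, display_text, FILLERS))
```

Running this over a few of your own recordings should show whether reading the lexical form is enough for your use case.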