By default, the Azure Speech SDK ends a transcription session when:
- It detects prolonged silence.
- It receives an end-of-stream signal from the audio input.
This behavior is built-in for efficiency but can be overridden.
How to prevent auto-ending due to silence
You can adjust or disable silence detection using SDK properties:
- Increase the silence timeout:
  `speech_config.set_property(speechsdk.PropertyId.SpeechServiceConnection_InitialSilenceTimeoutMs, "60000")  # e.g., 60 seconds`
  Reference: https://learn.microsoft.com/en-us/dotnet/api/microsoft.cognitiveservices.speech.propertyid?view=azure-dotnet
- Disable end-of-stream auto-stop: there is currently no direct "disable" flag, but you can keep the audio stream open and avoid calling `push_stream.close()` until you want the session to stop (see the sketch after this list).
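
For the end-of-stream case, here is a minimal sketch (not from the original post) of feeding audio through a push stream and closing it only when you want the session to end; the input file name and chunk size are placeholders:

```python
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="<YOUR_SUBSCRIPTION_KEY>", region="<YOUR_REGION>")

# The default PushAudioInputStream format is 16 kHz, 16-bit, mono PCM.
push_stream = speechsdk.audio.PushAudioInputStream()
audio_config = speechsdk.audio.AudioConfig(stream=push_stream)

transcriber = speechsdk.transcription.ConversationTranscriber(
    speech_config=speech_config, audio_config=audio_config
)
transcriber.start_transcribing_async().get()

# Feed PCM chunks as they arrive; where the bytes come from is up to your application.
with open("audio.pcm", "rb") as audio_file:  # placeholder input file
    while chunk := audio_file.read(3200):
        push_stream.write(chunk)

# The session keeps running until you close the stream (or stop manually).
push_stream.close()
transcriber.stop_transcribing_async().get()
```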
Manual session control
Use these methods:
- Start transcription: `conversation_transcriber.start_transcribing_async().get()`
- Stop transcription manually: `conversation_transcriber.stop_transcribing_async().get()`

Note that the `*_async()` methods return SDK futures; calling `.get()` waits for the operation to complete.
Attach event handlers (session_stopped, canceled) to monitor state.
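
For example, here is a rough, minimal sketch (not from the original post) that uses a `threading.Event` to block until the service reports that the session has stopped or was canceled:

```python
import threading

import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="<YOUR_SUBSCRIPTION_KEY>", region="<YOUR_REGION>")
audio_config = speechsdk.audio.AudioConfig(use_default_microphone=True)
transcriber = speechsdk.transcription.ConversationTranscriber(
    speech_config=speech_config, audio_config=audio_config
)

stopped = threading.Event()

def on_session_stopped(evt: speechsdk.SessionEventArgs):
    print(f"Session stopped: {evt.session_id}")
    stopped.set()

def on_canceled(evt):
    print(f"Canceled: {evt}")
    stopped.set()

transcriber.session_stopped.connect(on_session_stopped)
transcriber.canceled.connect(on_canceled)

transcriber.start_transcribing_async().get()
stopped.wait()                               # block until the session stops or is canceled
transcriber.stop_transcribing_async().get()  # then shut down cleanly
```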
Reconnection / Reuse
- Best practice: create a new `ConversationTranscriber` instance for each session. Reusing the same object after `stop_transcribing_async()` is not recommended because internal state may not reset cleanly.
- You can reuse the same `SpeechConfig` and `AudioConfig` objects across sessions.
Session management pattern
For multiple sessions in one process (a short sketch follows this list):
- Keep a single `SpeechConfig` (with subscription key and region).
- For each new session:
  - Create a new `ConversationTranscriber`.
  - Call `start_transcribing_async()` and `stop_transcribing_async()` as needed.
- For better accuracy and speaker diarization:
  - Use uncompressed PCM audio (16 kHz, 16-bit mono).
  - Configure `speech_config.set_property_by_name("maxSpeakerCount", "8")` if needed.
  - Consider custom models for speaker separation.
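
Putting the pattern together, here is a short sketch (assumptions: three back-to-back sessions and a fixed 30-second wait stand in for your real session logic; the `maxSpeakerCount` property name is taken from the bullet above):

```python
import time

import azure.cognitiveservices.speech as speechsdk

# One SpeechConfig / AudioConfig, reused across sessions
speech_config = speechsdk.SpeechConfig(subscription="<YOUR_SUBSCRIPTION_KEY>", region="<YOUR_REGION>")
speech_config.set_property_by_name("maxSpeakerCount", "8")  # optional, for diarization
audio_config = speechsdk.audio.AudioConfig(use_default_microphone=True)

for session_number in range(3):  # placeholder: run three sessions in one process
    # A fresh ConversationTranscriber per session
    transcriber = speechsdk.transcription.ConversationTranscriber(
        speech_config=speech_config, audio_config=audio_config
    )
    transcriber.transcribed.connect(lambda evt: print(f"TRANSCRIBED: {evt.result.text}"))

    transcriber.start_transcribing_async().get()
    time.sleep(30)  # placeholder: replace with your own per-session logic
    transcriber.stop_transcribing_async().get()
```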
I faced some issues while testing the sample code, so please review it and adapt it to your scenario:
```python
import time

import azure.cognitiveservices.speech as speechsdk

# 1. Configure Speech Service
speech_key = "<YOUR_SUBSCRIPTION_KEY>"
service_region = "<YOUR_REGION>"
speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)

# Configure silence timeout (e.g., 60 seconds)
speech_config.set_property(
    speechsdk.PropertyId.SpeechServiceConnection_InitialSilenceTimeoutMs,
    "60000"  # 60,000 ms = 60 seconds
)

# Optional: Configure max speaker count for diarization
speech_config.set_property_by_name("maxSpeakerCount", "8")

# 2. Create AudioConfig (microphone or custom stream)
audio_config = speechsdk.audio.AudioConfig(use_default_microphone=True)

# Function to create a new ConversationTranscriber
def create_transcriber():
    return speechsdk.transcription.ConversationTranscriber(
        speech_config=speech_config,
        audio_config=audio_config
    )

# 3. Start and Stop Transcription
# Note: the *_async() methods return SDK futures (not asyncio awaitables);
# call .get() on them to wait for completion.
def run_transcription():
    # Create a new transcriber instance
    transcriber = create_transcriber()

    # Attach event handlers
    transcriber.transcribed.connect(lambda evt: print(f"TRANSCRIBED: {evt.result.text}"))
    transcriber.session_stopped.connect(lambda evt: print("Session stopped"))
    transcriber.canceled.connect(lambda evt: print(f"Canceled: {evt}"))

    # Start transcription
    print("Starting transcription...")
    transcriber.start_transcribing_async().get()

    # Simulate some work (e.g., wait for audio input)
    time.sleep(30)  # Replace with your logic

    # Stop transcription manually
    print("Stopping transcription...")
    transcriber.stop_transcribing_async().get()

    # Recreate transcriber for the next session
    print("Reconnecting...")
    transcriber = create_transcriber()
    transcriber.start_transcribing_async().get()
    # Repeat as needed (stop this transcriber the same way when you are done)

# Run the function
run_transcription()
```
Please refer to this sample code and modify it as needed.
Hope it helps.
Thank you