What's new in Azure AI Speech?

Article
10/09/2024

Azure AI Speech is updated on an ongoing basis. To stay up-to-date with recent developments, this article provides you with information about new releases and features.

Recent highlights

Azure AI Speech Toolkit extension is now available for Visual Studio Code users. It contains a list of speech quick-starts and scenario samples that can be easily built and run with simple clicks. For more information, see Azure AI Speech Toolkit in Visual Studio Code Marketplace.
Azure AI Speech high definition (HD) voices are available in public preview. The HD voices can understand the content, automatically detect emotions in the input text, and adjust the speaking tone in real time to match the sentiment. For more information, see What are Azure AI Speech high definition (HD) voices?.
Fast transcription is now available in public preview. It can transcribe audio in much less time than the audio's actual duration. For more information, see the fast transcription API guide.
Video translation is now available in the Azure AI Speech service. For more information, see What is video translation?.
The Azure AI Speech service supports OpenAI text to speech voices. For more information, see What are OpenAI text to speech voices?.
The custom voice API is available for creating and managing professional and personal custom neural voice models.

Release notes

2024-November release

Azure AI Speech Toolkit extension is now available for Visual Studio Code users. It contains a list of speech quick-starts and scenario samples that can be easily built and run with simple clicks. For more information, see Azure AI Speech Toolkit in Visual Studio Code Marketplace.

Speech SDK 1.41.1: 2024-October release

New Features

Added support for Amazon Linux 2023 and Azure Linux 3.0.
Added the public property ID SpeechServiceConnection_ProxyHostBypass to specify hosts for which the proxy isn't used.
Added properties to control new phrase segmentation strategies.

Bug Fixes

Fixed incomplete support for keyword recognition Advanced models produced after August 2024.
- https://github.com/Azure-Samples/cognitive-services-speech-sdk/issues/2564
- https://github.com/Azure-Samples/cognitive-services-speech-sdk/issues/2571
- https://github.com/Azure-Samples/cognitive-services-speech-sdk/issues/2590
- Note that with Swift on iOS your project must use either MicrosoftCognitiveServicesSpeech-EmbeddedXCFramework-1.41.1.zip (from https://aka.ms/csspeech/iosbinaryembedded) or the MicrosoftCognitiveServicesSpeechEmbedded-iOS pod, both of which include the Advanced model support.
Fixed a memory leak in C# related to string usage.
Fixed not being able to get SPXAutoDetectSourceLanguageResult from SPXConversationTranscriptionResult in Objective-C and Swift.
Fixed an occasional crash when using the Microsoft Audio Stack in recognition.
Fixed type hints in Python. https://github.com/Azure-Samples/cognitive-services-speech-sdk/issues/2539
Fixed not being able to fetch the list of TTS voices when using a custom endpoint.
Fixed embedded TTS re-initializing for every speak request when the voice is specified by a short name.
Fixed the API reference documentation for the max duration of RecognizeOnce audio.
Fixed an error in handling arbitrary sampling rates in JavaScript.
- Thanks to rseanhall for this contribution.
Fixed an error in calculating the audio offset in JavaScript.
- Thanks to motamed for this contribution.

Breaking Changes

Keyword recognition support on Windows ARM 32-bit has been removed because the required ONNX runtime isn't available for this platform.

Speech SDK 1.40: 2024-August release

Note

Speech SDK version 1.39.0 was an internal release and isn't missing.

New features

Added support for streaming of G.722 compressed audio in speech recognition.
Added support for pitch, rate, and volume setting in input text streaming in speech synthesis.
Added support for personal voice input text streaming by introducing PersonalVoiceSynthesisRequest in speech synthesis. This API is in preview and subject to change in future versions.
Added support for diarization of intermediate results when ConversationTranscriber is used.
Removed CentOS/RHEL 7 support due to CentOS 7 EOL and the end of RHEL 7 Maintenance Support 2.
Use of embedded speech models now requires a model license instead of a model key. If you're an existing embedded speech customer and want to upgrade, please contact your support person at Microsoft for details on model updates.

Bug fixes

Built Speech SDK binaries for Windows with the _DISABLE_CONSTEXPR_MUTEX_CONSTRUCTOR flag as mitigation for the Visual C++ runtime issue "Access violation with std::mutex::lock after upgrading to VS 2022 version 17.10.0" (Developer Community). Windows C++ applications using the Speech SDK might need to apply the same build configuration flag if their code uses std::mutex (see details in the linked issue).
Fixed OpenSSL 3.x detection not working on Linux arm64 (https://github.com/Azure-Samples/cognitive-services-speech-sdk/issues/2420).
Fixed an issue where the libraries and model from the MAS NuGet package weren't copied to the deployment location when deploying a UWP app.
Fixed a content provider conflict in Android packages (https://github.com/Azure-Samples/cognitive-services-speech-sdk/issues/2463).
Fixed postprocessing options not applying to intermediate speech recognition results.
Fixed .NET 8 warning about distribution specific runtime identifiers (https://github.com/Azure-Samples/cognitive-services-speech-sdk/issues/2244).

Samples

Updated embedded speech samples to use a model license instead of a key.

Speech SDK 1.38.0: 2024-June release

New features

Upgrade Speech SDK Linux platform requirements:
- The new minimum baseline is Ubuntu 20.04 LTS or compatible with glibc 2.31 or newer.
- Binaries for Linux x86 are removed in accordance with Ubuntu 20.04 platform support.
- Note that RHEL/CentOS 7 remain supported until June 30 (the end of CentOS 7 and the end of RHEL 7 Maintenance Support 2). Binaries for them will be removed in the Speech SDK 1.39.0 release.
Add support for OpenSSL 3 on Linux.
Add support for g722-16khz-64kbps audio output format with speech synthesizer.
Add support for sending messages through a connection object with speech synthesizer.
Add Start/StopKeywordRecognition APIs in Objective-C and Swift.
Add API for selecting a custom translation model category.
Update GStreamer usage with speech synthesizer.
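
As a rough pre-flight check for the Linux baseline listed above, a script could compare the host's C library version against glibc 2.31. The following is a minimal sketch that uses only the Python standard library. The helper names are hypothetical and aren't part of the Speech SDK, and `platform.libc_ver()` can return empty strings on systems where it can't detect glibc, which this sketch treats as not meeting the baseline.

```python
import platform

# Baseline stated in the 1.38.0 notes: glibc 2.31 or newer.
MIN_GLIBC = (2, 31)


def parse_version(text):
    """Turn a string such as '2.31' into (2, 31); non-numeric characters are ignored."""
    parts = []
    for piece in text.split("."):
        digits = "".join(ch for ch in piece if ch.isdigit())
        if digits:
            parts.append(int(digits))
    return tuple(parts)


def meets_glibc_baseline(lib, version, minimum=MIN_GLIBC):
    """Return True only when the C library is glibc at or above the minimum version."""
    return lib == "glibc" and parse_version(version) >= minimum


if __name__ == "__main__":
    # libc_ver() returns a (library name, version string) pair for the running interpreter.
    lib, version = platform.libc_ver()
    print(lib or "unknown", version or "unknown", meets_glibc_baseline(lib, version))
```

A check like this only covers the glibc requirement. It says nothing about OpenSSL or other dependencies.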

Bug fixes

Fix "Websocket message size can't exceed 65,536 bytes" error during Start/StopKeywordRecognition.
Fix a Python segmentation fault during speech synthesis.

Samples

Update C# samples to use .NET 6.0 by default.

Speech SDK 1.37.0: 2024-April release

New features

Add support for input text streaming in speech synthesis.
Change the default speech synthesis voice to en-US-AvaMultilingualNeural.
Update Android builds to use OpenSSL 3.x.

Bug fixes

Fix occasional JVM crashes during SpeechRecognizer dispose when using MAS. (https://github.com/Azure-Samples/cognitive-services-speech-sdk/issues/2125)
Improve detection of default audio devices on Linux. (https://github.com/Azure-Samples/cognitive-services-speech-sdk/issues/2292)

Samples

Updated for new features.

Speech SDK 1.36.0: 2024-March release

New features

Add support for language identification in multi-lingual translation on v2 endpoints using AutoDetectSourceLanguageConfig::FromOpenRange().

Bug fixes

Fix SynthesisCanceled event not fired if stop is called during SynthesisStarted event.
Fix a noise issue in embedded speech synthesis.
Fix a crash in embedded speech recognition when running multiple recognizers in parallel.
Fix the phrase detection mode setting on v1/v2 endpoints.
Fixes to various issues with Microsoft Audio Stack.

Samples

Updates for new features.

Speech SDK 1.35.0: February 2024 release

New features

Change the default text to speech voice from en-US-JennyMultilingualNeural to en-US-AvaNeural.
Support word-level detail in embedded speech translation results using the detailed output format.

Bug fixes

Fix the AudioDataStream position getter API in Python.
Fix speech translation using v2 endpoints without language detection.
Fix a random crash and duplicate word boundary events in embedded text to speech.
Return a correct cancellation error code for an internal server error on WebSocket connections.
Fix the failure to load FPIEProcessor.dll library when MAS is used with C#.

Samples

Minor formatting updates for Embedded recognition samples.

Speech SDK 1.34.1: January 2024 release

Breaking changes

Bug fixes only

New features

Bug fixes only

Bug fixes

Fix a regression introduced in 1.34.0 where the service endpoint URL was constructed with incorrect locale information for users in several China regions.

Speech SDK 1.34.0: November 2023 release

Breaking changes

SpeechRecognizer is updated to use a new endpoint by default (that is, when not explicitly specifying a URL) which no longer supports query string parameters for most of the properties. Instead of setting query string parameters directly with ServicePropertyChannel.UriQueryParameter, please use the corresponding API functions.

New features

Compatibility with .NET 8 (Fix for https://github.com/Azure-Samples/cognitive-services-speech-sdk/issues/2170 except for warning about centos7-x64)
Support for embedded speech performance metrics which can be used to evaluate the capability of a device to run embedded speech.
Support for source language identification in embedded multi-lingual translation.
Support for embedded speech-to-text, text to speech and translation for iOS and Swift/Objective-C released in preview.
Embedded support is provided in MicrosoftCognitiveServicesSpeechEmbedded-iOS Cocoapod.

Bug fixes

Fix for "iOS SDK x2 times binary size growth" (GitHub issue #2113).
Fix for "Unable to get word level time stamps from Azure speech to text API" (GitHub issue #2156).
Fix for DialogServiceConnector destruction phase to disconnect events correctly. This was causing crashes occasionally.
Fix for exception during creation of a recognizer when MAS is used.
Fix for FPIEProcessor.dll from the Microsoft.CognitiveServices.Speech.Extension.MAS NuGet package for Windows UWP x64 and Arm64, which had a dependency on VC runtime libraries for native C++. The dependency now points to the correct VC runtime libraries (for UWP).
Fix for "[MAS] Recurrent calls to recognizeOnceAsync lead to SPXERR_ALREADY_INITIALIZED when using MAS" (GitHub issue #2124).
Fix for embedded speech recognition crash when phrase lists are used.

Samples

Embedded iOS samples for speech-to-text, text to speech and translation.

Speech CLI 1.34.0: November 2023 release

New features

Support word boundary events output when synthesizing speech.

Bug fixes

Updated the JMESPath dependency to the latest release, which improves string evaluations.

Speech SDK 1.33.0: October 2023 release

Breaking change notice

The new NuGet package added for Microsoft Audio Stack (MAS) is now required to be included by applications that are using MAS in their package configuration files.

New features

Added the new NuGet package Microsoft.CognitiveServices.Speech.Extension.MAS.nupkg, which provides improved echo cancellation performance when using Microsoft Audio Stack
Pronunciation Assessment: added support for prosody and content evaluation, which can assess the spoken speech in terms of prosody, vocabulary, grammar, and topic.

Bug fixes

Fixed keyword recognition result offsets so that they correctly match the input audio stream since the beginning. The fix applies to both stand-alone keyword recognition and keyword-triggered speech recognition.
Fixed the SPXSpeechSynthesizer stopSpeaking() method not returning immediately on iOS 17 (GitHub issue #2081).
Fixed a Mac Catalyst import issue in the Swift module (GitHub issue #1948, "Support for mac catalyst with apple silicon").
JS: AudioWorkletNode module loads now uses a trusted URL, with fallback for CDN browser includes.
JS: Packed lib files now target ES6 JS, with support for ES5 JS removed.
JS: Intermediate events for translation scenarios targeting v2 endpoints are now handled correctly.
JS: The language property for TranslationRecognitionEventArgs is now set for translation.hypothesis events.
Speech Synthesis: The SynthesisCompleted event is guaranteed to be emitted after all metadata events, so it can be used to indicate the end of events (GitHub issue #2093, "How to detect when visemes are received completely?").

Samples

Added a sample to demonstrate MULAW streaming using Python.
Fix for speech-to-text NAudio sample

Speech CLI 1.33.0: October 2023 release

New features

Support word boundary events output when synthesizing speech.

Bug fixes

none

Speech SDK 1.32.1: September 2023 release

Bug fixes

Android packages updated with the latest security fixes from OpenSSL 1.1.1v.
JS – WebWorkerLoadType property added to allow bypass of data URL load for timeout worker
JS – Fix Conversation Translation disconnect after 10 minutes
JS – Conversation Translation auth token from Conversation now propagates to Translation service connection

Samples

Conversation transcription with Swift APIs

Speech SDK 1.31.0: August 2023 release

New Features

Support for real-time diarization is available in public preview with the Speech SDK 1.31.0. This feature is available in the following SDKs: C#, C++, Java, JavaScript, Python, and Objective-C/Swift.
Synchronized speech synthesis word boundary and viseme events with audio playback

Breaking changes

The former "conversation transcription" scenario is renamed to "meeting transcription". For example, use MeetingTranscriber instead of ConversationTranscriber, and use CreateMeetingAsync instead of CreateConversationAsync. Although the names of SDK objects and methods have changed, the renaming doesn't change the feature itself. Use meeting transcription objects for transcription of meetings with user profiles and voice signatures. See Meeting transcription for more information. The "conversation translation" objects and methods aren't affected by these changes. You can still use the ConversationTranslator object and its methods for meeting translation scenarios.

For real-time diarization, a new ConversationTranscriber object is introduced. The new "conversation transcription" object model and call patterns are similar to continuous recognition with the SpeechRecognizer object. A key difference is that the ConversationTranscriber object is designed to be used in a conversation scenario where you want to differentiate multiple speakers (diarization). User profiles and voice signatures aren't applicable. See the real-time diarization quickstart for more information.

This table shows the previous and new object names for real-time diarization and meeting transcription. The scenario name is in the first column, the previous object names are in the second column, and the new object names are in the third column.

Scenario name	Previous object names	New object names
Real-time diarization	N/A	`ConversationTranscriber`
Meeting transcription	`ConversationTranscriber`, `ConversationTranscriptionEventArgs`, `ConversationTranscriptionCanceledEventArgs`, `ConversationTranscriptionResult`, `RemoteConversationTranscriptionResult`, `RemoteConversationTranscriptionClient`, `Participant`¹, `ParticipantChangedReason`¹, `User`¹	`MeetingTranscriber`, `MeetingTranscriptionEventArgs`, `MeetingTranscriptionCanceledEventArgs`, `MeetingTranscriptionResult`, `RemoteMeetingTranscriptionResult`, `RemoteMeetingTranscriptionClient`, `Participant`, `ParticipantChangedReason`, `User`, `Meeting`²

¹ The Participant, ParticipantChangedReason, and User objects are applicable to both meeting transcription and meeting translation scenarios.

² The Meeting object is new and is used with the MeetingTranscriber object.
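
For code written against the former conversation transcription scenario, the renames in the table and the preceding paragraph can be captured as a simple mapping. The following is a minimal sketch that rewrites those identifiers in a source string. The helper name `rename_meeting_identifiers` is hypothetical and isn't an SDK tool. It's only meant for code that used the old scenario with user profiles and voice signatures, because `ConversationTranscriber` is now also the name of the new real-time diarization object.

```python
import re

# Old name -> new name, taken from the table and the breaking-change paragraph above.
RENAMES = {
    "ConversationTranscriber": "MeetingTranscriber",
    "ConversationTranscriptionEventArgs": "MeetingTranscriptionEventArgs",
    "ConversationTranscriptionCanceledEventArgs": "MeetingTranscriptionCanceledEventArgs",
    "ConversationTranscriptionResult": "MeetingTranscriptionResult",
    "RemoteConversationTranscriptionResult": "RemoteMeetingTranscriptionResult",
    "RemoteConversationTranscriptionClient": "RemoteMeetingTranscriptionClient",
    "CreateConversationAsync": "CreateMeetingAsync",
}

# Match whole identifiers only, so that a longer name is never partially rewritten.
_PATTERN = re.compile(
    r"\b(" + "|".join(sorted(map(re.escape, RENAMES), key=len, reverse=True)) + r")\b"
)


def rename_meeting_identifiers(source):
    """Return the source text with the pre-1.31.0 names replaced by the new ones."""
    return _PATTERN.sub(lambda match: RENAMES[match.group(1)], source)
```

Anything not listed in the mapping, such as `ConversationTranslator`, is left untouched, so the result still needs a manual review.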

Bug fixes

Fixed macOS minimum supported version https://github.com/Azure-Samples/cognitive-services-speech-sdk/issues/2017
Fixed Pronunciation Assessment bug:
- Addressed phoneme accuracy scores issue, ensuring they now accurately reflect only the specific mispronounced phoneme. https://github.com/Azure-Samples/cognitive-services-speech-sdk/issues/1917
- Resolved an issue where the Pronunciation Assessment feature was inaccurately identifying entirely correct pronunciations as erroneous, particularly in situations where words could have multiple valid pronunciations. https://github.com/Azure-Samples/cognitive-services-speech-sdk/issues/1530

Samples

Speech SDK 1.30.0: July 2023 release

New Features

C++, C#, Java - Added support for DisplayWords in Embedded Speech Recognition's detailed result.
Objective-C/Swift - Added support for ConnectionMessageReceived event in Objective-C/Swift.
Objective-C/Swift - Improved keyword-spotting models for iOS. This change has increased the size of certain packages, which contain iOS binaries (like NuGet, XCFramework). We're working to reduce the size for future releases.

Bug fixes

Fixed a memory leak when using speech recognizer with PhraseListGrammar, as reported by a customer (GitHub issue).
Fixed a deadlock in text to speech open connection API.

More notes

Java - Some internally used, public Java API methods were changed to package internal, protected or private. This change shouldn't have an effect on developers, as we don't expect applications to be using those. Noted here for transparency.

Samples

New Pronunciation Assessment samples on how to specify a learning language in your own application
- C#: See sample code.
- C++: See sample code.
- JavaScript: See sample code.
- Objective-C: See sample code.
- Python: See sample code.
- Swift: See sample code.

Speech SDK 1.29.0: June 2023 release

New Features

C++, C#, Java - Preview of Embedded Speech Translation APIs. Now you can do speech translation without cloud connection!
JavaScript - Continuous Language Identification (LID) now enabled for speech translation.
JavaScript - Community contribution for adding LocaleName property to VoiceInfo class. Thank you GitHub user shivsarthak for the pull request.
C++, C#, Java - Added support for resampling Embedded text to speech output from 16 kHz to 48 kHz sample rate.
Added support for hi-IN locale in Intent Recognizer with Simple Pattern Matching.

Bug fixes

Fixed a crash caused by a race condition in Speech Recognizer during object destruction, as seen in some of our Android tests
Fixed possible deadlocks in Intent Recognizer with Simple Pattern Matcher

Samples

New Embedded Speech Translation samples

Speech SDK 1.28.0: May 2023 release

Breaking change

JavaScript SDK: Online Certificate Status Protocol (OCSP) was removed. This allows clients to better conform to browser and Node standards for certificate handling. Version 1.28 and onward will no longer include our custom OCSP module.

New Features

Embedded Speech Recognition now returns NoMatchReason::EndSilenceTimeout when a silence timeout occurs at the end of an utterance. This matches the behavior when doing recognition using the real-time speech service.
JavaScript SDK: Set properties on SpeechTranslationConfig using PropertyId enum values.

Bug fixes

C# on Windows - Fix potential race condition/deadlock in Windows audio extension. In scenarios that both dispose of the audio renderer quickly and also use the Synthesizer method to stop speaking, the underlying event wasn't reset by stop, and could cause the renderer object to never be disposed, all while it could be holding a global lock for disposal, freezing the dotnet GC thread.

Samples

Added an embedded speech sample for MAUI.
Updated the embedded speech sample for Android Java to include text to speech.

Speech SDK 1.27.0: April 2023 release

Notification about upcoming changes

We plan to remove Online Certificate Status Protocol (OCSP) in the next JavaScript SDK release. This allows clients to better conform to browser and Node standards for certificate handling. Version 1.27 is the last release that includes our custom OCSP module.

New Features

JavaScript – Added support for microphone input from the browser with Speaker Identification and Verification.
Embedded Speech Recognition - Update support for PropertyId::Speech_SegmentationSilenceTimeoutMs setting.

Bug fixes

General - Reliability updates in service reconnection logic (all programming languages except JavaScript).
General - Fix string conversions leaking memory on Windows (all relevant programming languages except JavaScript).
Embedded Speech Recognition - Fix crash in French Speech Recognition when using certain grammar list entries.
Source code documentation - Corrections to SDK reference documentation comments related to audio logging on the service.
Intent recognition - Fix Pattern Matcher priorities related to list entities.

Samples

Properly handle authentication failure in C# Conversation Transcription (CTS) sample.
Added example of streaming pronunciation assessment for Python, JavaScript, Objective-C and Swift.

Speech SDK 1.26.0: March 2023 release

Breaking changes

Bitcode has been disabled in all iOS targets in the following packages: Cocoapod with xcframework, NuGet (for Xamarin and MAUI) and Unity. The change is due to Apple's deprecation of bitcode support from Xcode 14 and onwards. This change also means that if you're using Xcode 13 or have explicitly enabled bitcode in your application that uses the Speech SDK, you might encounter an error saying "framework doesn't contain bitcode and you must rebuild it". To resolve this issue, make sure your targets have bitcode disabled.
Minimum iOS deployment target is upgraded to 11.0 in this release, which means armv7 hardware is no longer supported.

New features

Embedded (on-device) Speech Recognition now supports both 8 and 16-kHz sampling rate input audio (16-bit per sample, mono PCM).
Speech Synthesis now reports connection, network, and service latencies in the result to help end-to-end latency optimization.
New tie-breaking rules for Intent Recognition with simple pattern matching. Pattern matches with more matched character bytes win over pattern matches with a lower character byte count. For example, the pattern "Select {something} in the top right" wins over "Select {something}".
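
To illustrate the tie-breaking rule above, the following minimal sketch ranks patterns by the number of UTF-8 bytes in their literal text. It isn't the SDK's implementation. The function names are hypothetical, and counting only the text outside `{entity}` placeholders is an assumed interpretation of "matched character bytes".

```python
import re


def literal_byte_count(pattern):
    """Count the UTF-8 bytes of the pattern text outside {entity} placeholders."""
    literal = re.sub(r"\{[^{}]*\}", "", pattern)
    return len(literal.encode("utf-8"))


def pick_winner(matching_patterns):
    """Among patterns that all matched an utterance, prefer the one with the most literal bytes."""
    return max(matching_patterns, key=literal_byte_count)


if __name__ == "__main__":
    candidates = ["Select {something}", "Select {something} in the top right"]
    print(pick_winner(candidates))
```

With the two patterns from the example, the longer pattern is selected because it contributes more literal bytes than "Select {something}".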

Bug fixes

Speech Synthesis: fix a bug where the emoji isn't correct in word boundary events.
Intent Recognition with Conversational Language Understanding (CLU):
- Intents from the CLU Orchestrator Workflow now appear correctly.
- The JSON result is now available via the property ID LanguageUnderstandingServiceResponse_JsonResult.
Speech recognition with keyword activation: Fix for missing ~150 ms audio after a keyword recognition.
Fix for Speech SDK NuGet iOS MAUI Release build, reported by customer (GitHub issue)

Samples

Fix for Swift iOS sample, reported by customer (GitHub issue)

Speech SDK 1.25.0: January 2023 release

Breaking changes

Language Identification (preview) APIs have been simplified. If you update to Speech SDK 1.25 and see a build break, please visit the Language Identification page to learn about the new property SpeechServiceConnection_LanguageIdMode. This single property replaces the two previous ones SpeechServiceConnection_SingleLanguageIdPriority and SpeechServiceConnection_ContinuousLanguageIdPriority. Prioritizing between low latency and high accuracy is no longer necessary following recent model improvements. Now, you only need to select whether to run at-start or continuous Language Identification when doing continuous speech recognition or translation.

New features

C#/C++/Java: Embedded Speech SDK is now released under gated public preview. See Embedded Speech (preview) documentation. You can now do on-device speech to text and text to speech when cloud connectivity is intermittent or unavailable. Supported on Android, Linux, macOS, and Windows platforms
C# MAUI: Support added for iOS and Mac Catalyst targets in Speech SDK NuGet (Customer issue)
Unity: Android x86_64 architecture added to Unity package (Customer issue)
Go:
- ALAW/MULAW direct streaming support added for speech recognition (Customer issue)
- Added support for PhraseListGrammar. Thank you GitHub user czkoko for the community contribution!
C#/C++: Intent Recognizer now supports Conversational Language Understanding models in C++ and C# with orchestration on the Microsoft service

Bug fixes

Fix an occasional hang in KeywordRecognizer when trying to stop it
Python:
- Fix for getting Pronunciation Assessment results when PronunciationAssessmentGranularity.FullText is set (Customer issue)
- Fix for gender property for Male voices not being retrieved, when getting speech synthesis voices
JavaScript:
- Fix for parsing some WAV files that were recorded on iOS devices (Customer issue)
- JS SDK now builds without using npm-force-resolutions (Customer issue)
- Conversation Translator now correctly sets service endpoint when using a speechConfig instance created using SpeechConfig.fromEndpoint()

Samples

Added samples showing how to use Embedded Speech
Added Speech to text sample for MAUI

See Speech SDK samples repository.

Speech SDK 1.24.2: November 2022 release

New features

No new features, just an embedded engine fix to support new model files.

Bug fixes

All programming languages
- Fixed an issue with encryption of embedded speech recognition models.

Speech SDK 1.24.1: November 2022 release

New features

Published packages for the Embedded Speech preview. See https://aka.ms/embedded-speech for more information.

Bug fixes

All programming languages
- Fix embedded TTS crash when voice font isn't supported
- Fix stopSpeaking() can't stop playback on Linux (#1686)
JavaScript SDK
- Fixed regression in how conversation transcriber gated audio.
Java
- Temporarily published updated POM and Javadocs files to Maven Central to enable the docs pipeline to update online reference docs.
Python
- Fix regression where Python speak_text(ssml) returns void.

Speech SDK 1.24.0: October 2022 release

New features

All programming languages: AMR-WB (16 kHz) added to the supported list of text to speech audio output formats.
Python: Package added for Linux Arm64 for supported Linux distributions.
C#/C++/Java/Python: Support added for ALAW & MULAW direct streaming to the speech service (in addition to existing PCM stream) using AudioStreamWaveFormat.
C# MAUI: NuGet package updated to support Android targets for .NET MAUI developers (Customer issue)
Mac: Added separate XCframework for Mac, which doesn't contain any iOS binaries. This offers an option for developers who need only Mac binaries using a smaller XCframework package.
Microsoft Audio Stack (MAS):
- When beam-forming angles are specified, sound originating outside the specified range is suppressed more effectively.
- Approximately 70% reduction in the size of libMicrosoft.CognitiveServices.Speech.extension.mas.so for Linux ARM32 and Linux Arm64.
Intent Recognition using pattern matching:
- Add orthography support for the languages fr, de, es, jp
- Added prebuilt integer support for language es.

Bug fixes

iOS: fix speech synthesis error on iOS 16 caused by compressed audio decoding failure (Customer Issue).
JavaScript:
- Fix authentication token not working when getting speech synthesis voice list (Customer issue).
- Use data URL for worker loading (Customer issue).
- Create audio processor worklet only when AudioWorklet is supported in browser (Customer issue). This was a community contribution by William Wong. Thank you William!
- Fix recognized callback when LUIS response connectionMessage is empty (Customer issue).
- Properly set speech segmentation timeout.
Intent Recognition using pattern matching:
- Non-JSON characters inside models now load properly.
- Fix hanging issue when recognizeOnceAsync(text) was called during continuous recognition.

Speech SDK 1.23.0: July 2022 release

New features

C#, C++, Java: Added support for languages zh-cn and zh-hk in Intent Recognition with Pattern Matching.
C#: Added support for AnyCPU .NET Framework builds

Bug fixes

Android: Fixed OpenSSL vulnerability CVE-2022-2068 by updating OpenSSL to 1.1.1q
Python: Fix crash when using PushAudioInputStream
iOS: Fix "EXC_BAD_ACCESS: Attempted to dereference null pointer" as reported on iOS (GitHub issue)

Speech SDK 1.22.0: June 2022 release

New features

Java: IntentRecognitionResult API for getEntities(), applyLanguageModels(), and recognizeOnceAsync(text) added to support the "simple pattern matching" engine.
Unity: Added support for Mac M1 (Apple Silicon) for Unity package (GitHub issue)
C#: Added support for x86_64 for Xamarin Android (GitHub issue)
C#: .NET framework minimum version updated to v4.6.2 for SDK C# package as v4.6.1 has retired (see Microsoft .NET Framework Component Lifecycle Policy)
Linux: Added support for Debian 11 and Ubuntu 22.04 LTS. Ubuntu 22.04 LTS requires manual installation of libssl1.1, either as a binary package (for example, libssl1.1_1.1.1l-1ubuntu1.3_amd64.deb or newer for x64) or by compiling from sources.

Bug fixes

UWP: OpenSSL dependency removed from UWP libraries and replaced with WinRT websocket and HTTP APIs to meet security compliance requirements and reduce the binary footprint.
Mac: Fixed "MicrosoftCognitiveServicesSpeech Module Not Found" issue when using Swift projects targeting macOS platform
Windows, Mac: Fixed a platform-specific issue where audio sources that were configured via properties to stream at a real-time rate sometimes fell behind and eventually exceeded capacity

Samples (GitHub)

C#: .NET framework samples updated to use v4.6.2
Unity: Virtual-assistant sample fixed for Android and UWP
Unity: Unity samples updated for Unity 2020 LTS version

Speech SDK 1.21.0: April 2022 release

New features

Java & JavaScript: Added support for Continuous Language Identification when using the SpeechRecognizer object
JavaScript: Added Diagnostics APIs to enable console logging level and (Node only) file logging, to help Microsoft troubleshoot customer-reported issues
Python: Added support for Conversation Transcription
Go: Added support for Speaker Recognition
C++ & C#: Added support for a required group of words in the Intent Recognizer (simple pattern matching). For example: "(set|start|begin) a timer" where either "set", "start" or "begin" must be present for the intent to be recognized.
All programming languages, Speech Synthesis: Added duration property in word boundary events. Added support for punctuation boundary and sentence boundary
Objective-C/Swift/Java: Added word-level results on the Pronunciation Assessment result object (similar to C#). The application no longer needs to parse a JSON result string to get word-level information (GitHub issue)
iOS platform: Added experimental support for ARMv7 architecture
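
The required-group behavior described in the list above, using the example "(set|start|begin) a timer", can be mimicked with a regular expression. The following is a minimal sketch of the semantics only. The Intent Recognizer's pattern syntax isn't regular expressions, the function name is hypothetical, and lowercasing the utterance before matching is an assumption made here for illustration.

```python
import re

# One of "set", "start", or "begin" must be present, followed by " a timer".
REQUIRED_GROUP = re.compile(r"(set|start|begin) a timer")


def matches_timer_intent(utterance):
    """Return True when the whole utterance fits the required-group pattern."""
    normalized = utterance.strip().lower()
    return REQUIRED_GROUP.fullmatch(normalized) is not None
```

An utterance without any of the three words, such as "a timer", doesn't match in this sketch.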

Bug fixes

iOS platform: Fix to allow building for the target "Any iOS Device", when using CocoaPod (GitHub issue)
Android platform: OpenSSL version has been updated to 1.1.1n to fix security vulnerability CVE-2022-0778
JavaScript: Fix issue where wav header wasn't updated with file size (GitHub issue)
JavaScript: Fix request ID desync issue breaking translation scenarios (GitHub issue)
JavaScript: Fix issue when instantiating SpeakerAudioDestination with no stream (GitHub issue)
C++: Fix C++ headers to remove a warning when compiling for C++17 or newer

Samples (GitHub)

New Java samples for Speech Recognition with Language Identification
New Python and Java samples for Conversation Transcription
New Go sample for Speaker Recognition
New C++ and C# tool for Windows that enumerates all audio capture and render devices, for finding their Device ID. This ID is needed by the Speech SDK if you plan to capture audio from, or render audio to, a nondefault device.

Speech SDK 1.20.0: 2022-January release

New features

Objective-C, Swift, and Python: Added support for DialogServiceConnector, used for Voice-Assistant scenarios.
Python: Support for Python 3.10 was added. Support for Python 3.6 was removed, per Python's end-of-life for 3.6.
Unity: Speech SDK is now supported for Unity applications on Linux.
C++, C#: IntentRecognizer using pattern matching is now supported in C#. In addition, scenarios with custom entities, optional groups, and entity roles are now supported in C++ and C#.
C++, C#: Improved diagnostics trace logging using new classes FileLogger, MemoryLogger, and EventLogger. SDK logs are an important tool for Microsoft to diagnose customer-reported issues. These new classes make it easier for customers to integrate Speech SDK logs into their own logging system.
All programming languages: PronunciationAssessmentConfig now has properties to set the desired phoneme alphabet (IPA or SAPI) and N-Best Phoneme Count (avoiding the need to author a configuration JSON as per GitHub issue 1284). Also, syllable level output is now supported.
Android, iOS, and macOS (all programming languages): GStreamer is no longer needed to support limited-bandwidth networks. SpeechSynthesizer now uses the operating system's audio decoding capabilities to decode compressed audio streamed from the text to speech service.
All programming languages: SpeechSynthesizer now supports three new raw output Opus formats (without container), which are widely used in live streaming scenarios.
JavaScript: Added getVoicesAsync() API to SpeechSynthesizer to retrieve the list of supported synthesis voices (GitHub issue 1350)
JavaScript: Added getWaveFormat() API to AudioStreamFormat to support non-PCM wave formats (GitHub issue 452)
JavaScript: Added volume getter/setter and mute()/unmute() APIs to SpeakerAudioDestination (GitHub issue 463)

Bug fixes

C++, C#, Java, JavaScript, Objective-C, and Swift: Fix to remove a 10-second delay while stopping a speech recognizer that uses a PushAudioInputStream. This is for the case where no new audio is pushed in after StopContinuousRecognition is called (GitHub issues 1318, 331)
Unity on Android and UWP: Unity meta files were fixed for UWP, Android Arm64, and Windows Subsystem for Android (WSA) Arm64 (GitHub issue 1360)
iOS: Compiling your Speech SDK application on any iOS Device when using CocoaPods is now fixed (GitHub issue 1320)
iOS: When SpeechSynthesizer is configured to output audio directly to a speaker, playback stopped at the beginning in rare conditions. This was fixed.
JavaScript: Use script processor fallback for microphone input if no audio worklet is found (GitHub issue 455)
JavaScript: Add protocol to agent to mitigate bug found with Sentry integration (GitHub issue 465)

Samples GitHub

C++, C#, Python, and Java samples showing how to get detailed recognition results. The details include alternative recognition results, confidence score, Lexical form, Normalized form, Masked Normalized form, with word-level timing for each.
iOS sample added using AVFoundation as external audio source.
Java sample added to show how to get SRT (SubRip Text) format using WordBoundary event.
Android samples for Pronunciation Assessment.
C++ and C# samples showing usage of the new Diagnostics Logging classes.
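
An SRT (SubRip Text) file is a sequence of numbered cues, each with a start and end time in HH:MM:SS,mmm form. As a rough sketch of what the Java sample above has to do with word boundary data, the Python below formats (offset, duration, text) triples into SRT cues. The tuple layout and the millisecond units are assumptions for this example; the SDK's own event objects expose these values through their own properties.

```python
def srt_timestamp(ms: int) -> str:
    """Format a millisecond offset as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_srt(words) -> str:
    """Turn (offset_ms, duration_ms, text) triples into SRT cues, one per word."""
    blocks = []
    for index, (offset_ms, duration_ms, text) in enumerate(words, start=1):
        start = srt_timestamp(offset_ms)
        end = srt_timestamp(offset_ms + duration_ms)
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    # Cues are separated by a blank line.
    return "\n".join(blocks)


# Hypothetical word boundary data: (offset in ms, duration in ms, word).
srt_text = to_srt([(0, 500, "Hello"), (500, 400, "world")])
```

A real subtitle file would usually group several words per cue; one cue per word keeps the sketch short.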

Speech SDK 1.19.0: 2021-November release

Highlights

Speaker Recognition service is generally available (GA) now. Speech SDK APIs are available on C++, C#, Java, and JavaScript. With Speaker Recognition, you can accurately verify and identify speakers by their unique voice characteristics. For more information about this topic, see the documentation.
We've dropped support for Ubuntu 16.04 in conjunction with Azure DevOps and GitHub. Ubuntu 16.04 reached end of life back in April of 2021. Migrate your Ubuntu 16.04 workflows to Ubuntu 18.04 or newer.
OpenSSL linking in Linux binaries changed to dynamic. Linux binary size has been reduced by about 50%.
Mac M1 ARM-based silicon support added.

New features

C++/C#/Java: New APIs added to enable audio processing support for speech input with Microsoft Audio Stack. Documentation here.
C++: New APIs for intent recognition to facilitate more advanced pattern matching. This includes List and Prebuilt Integer entities as well as support for grouping intents and entities as models (Documentation, updates, and samples are under development and will be published in the near future).
Mac: Support for Arm64 (M1) based silicon for CocoaPod, Python, Java, and NuGet packages (related to GitHub issue 1244).
iOS/Mac: iOS and macOS binaries are now packaged into xcframework (related to GitHub issue 919).
iOS/Mac: Support for Mac Catalyst (related to GitHub issue 1171).
Linux: New tar package added for CentOS 7. See About the Speech SDK. The Linux .tar package now contains specific libraries for RHEL/CentOS 7 in lib/centos7-x64. Speech SDK libraries in lib/x64 are still applicable for all the other supported Linux x64 distributions (including RHEL/CentOS 8), but they won't work on RHEL/CentOS 7.
JavaScript: VoiceProfile & SpeakerRecognizer APIs made async/awaitable.
JavaScript: Support added for US government Azure regions.
Windows: Support added for playback on Universal Windows Platform (UWP).

Bug fixes

Android: OpenSSL security update (updated to version 1.1.1l) for Android packages.
Python: Resolved bug where selecting speaker device on Python fails.
Core: Automatically reconnect when a connection attempt fails.
iOS: Audio compression disabled on iOS packages due to instability and bitcode build problems when using GStreamer. Details are available via GitHub issue 1209.

Samples GitHub

Mac/iOS: Updated samples and quickstarts to use xcframework package.
.NET: Samples updated to use .NET Core 3.1.
JavaScript: Added sample for Voice Assistants.

Speech SDK 1.18.0: 2021-July release

Note: Get started with the Speech SDK here.

Highlights summary

Ubuntu 16.04 reached end of life in April of 2021. With Azure DevOps and GitHub, we'll drop support for 16.04 in September 2021. Migrate ubuntu-16.04 workflows to ubuntu-18.04 or newer before then.

New features

C++: Simple Language Pattern matching with the Intent Recognizer now makes it easier to implement simple intent recognition scenarios.
C++/C#/Java: We added a new API, GetActivationPhrasesAsync() to the VoiceProfileClient class for receiving a list of valid activation phrases in Speaker Recognition enrollment phase for independent recognition scenarios.
- Important: The Speaker Recognition feature is in Preview. All voice profiles created in Preview will be discontinued 90 days after the Speaker Recognition feature is moved out of Preview into General Availability. At that point the Preview voice profiles will stop functioning.
Python: Added support for continuous Language Identification (LID) on the existing SpeechRecognizer and TranslationRecognizer objects.
Python: Added a new Python object named SourceLanguageRecognizer to do one-time or continuous LID (without recognition or translation).
JavaScript: getActivationPhrasesAsync API added to VoiceProfileClient class for receiving a list of valid activation phrases in Speaker Recognition enrollment phase for independent recognition scenarios.
JavaScript: VoiceProfileClient's enrollProfileAsync API is now async awaitable. See this independent identification code for example usage.

Improvements

Java: AutoCloseable support added to many Java objects. Now the try-with-resources model is supported to release resources. See this sample that uses try-with-resources. Also see the Oracle Java documentation tutorial for The try-with-resources Statement to learn about this pattern.
Disk footprint has been significantly reduced for many platforms and architectures. Examples for the Microsoft.CognitiveServices.Speech.core binary: x64 Linux is 475KB smaller (8.0% reduction); Arm64 Windows UWP is 464KB smaller (11.5% reduction); x86 Windows is 343KB smaller (17.5% reduction); and x64 Windows is 451KB smaller (19.4% reduction).

Bug fixes

Java: Fixed synthesis error when the synthesis text contains surrogate characters. Details here.
JavaScript: Browser microphone audio processing now uses AudioWorkletNode instead of deprecated ScriptProcessorNode. Details here.
JavaScript: Correctly keep conversations alive during long running conversation translation scenarios. Details here.
JavaScript: Fixed issue with recognizer reconnecting to a mediastream in continuous recognition. Details here.
JavaScript: Fixed issue with recognizer reconnecting to a pushStream in continuous recognition. Details here.
JavaScript: Corrected word level offset calculation in detailed recognition results. Details here.

Samples

Java quickstart samples updated here.
JavaScript Speaker Recognition samples updated to show new usage of enrollProfileAsync(). See samples here.

Speech SDK 1.17.0: 2021-May release

Note

Get started with the Speech SDK here.

Highlights summary

Smaller footprint - we continue to decrease the memory and disk footprint of the Speech SDK and its components.
A new stand-alone Language Identification API allows you to recognize what language is being spoken.
Develop speech enabled mixed reality and gaming applications using Unity on macOS.
You can now use Text to speech in addition to speech recognition from the Go programming language.
Several Bug fixes to address issues YOU, our valued customers, have flagged on GitHub! THANK YOU! Keep the feedback coming!

New features

C++/C#: New stand-alone At-Start and Continuous Language Detection via the SourceLanguageRecognizer API. If you only want to detect the language(s) spoken in audio content, this is the API to do that. See details for C++ and C#.
C++/C#: Speech Recognition and Translation Recognition now support both at-start and continuous Language Identification so you can programmatically determine which language(s) are being spoken before they're transcribed or translated. See documentation here for Speech Recognition and here for Speech Translation.
C#: Added Unity support for macOS (x64). This unlocks speech recognition and speech synthesis use cases in mixed reality and gaming!
Go: We added support for speech synthesis (text to speech) to the Go programming language to make speech synthesis available in even more use cases. See our quickstart or our reference documentation.
C++/C#/Java/Python/Objective-C/Go: The speech synthesizer now supports the connection object. This helps you manage and monitor the connection to the Speech service, and is especially helpful to pre-connect to reduce latency. See documentation here.
C++/C#/Java/Python/Objective-C/Go: We now expose the latency and underrun time in SpeechSynthesisResult to help you monitor and diagnose speech synthesis latency issues. See details for C++, C#, Java, Python, Objective-C and Go.
C++/C#/Java/Python/Objective-C: Text to speech now uses neural voices by default when you don't specify a voice to be used. This gives you higher fidelity output by default, but also increases the default price. You can specify any of our over 70 standard voices or over 130 neural voices to change the default.
C++/C#/Java/Python/Objective-C/Go: We added a Gender property to the synthesis voice info to make it easier to select voices based on gender. This addresses GitHub issue #1055.
C++, C#, Java, JavaScript: We now support retrieveEnrollmentResultAsync, getAuthorizationPhrasesAsync, and getAllProfilesAsync() in Speaker Recognition to ease user management of all voice profiles for a given account. See documentation for C++, C#, Java, JavaScript. This addresses GitHub issue #338.
JavaScript: We added retry for connection failures that will make your JavaScript-based speech applications more robust.

Improvements

Linux and Android Speech SDK binaries have been updated to use the latest version of OpenSSL (1.1.1k)
Code Size improvements:
- Language Understanding is now split into a separate "lu" library.
- Windows x64 core binary size decreased by 14.4%.
- Android Arm64 core binary size decreased by 13.7%.
- other components also decreased in size.

Bug fixes

All: Fixed GitHub issue #842 for ServiceTimeout. You can now transcribe long audio files using the Speech SDK without the connection to the service terminating with this error. However, we still recommend you use batch transcription for long files.
C#: Fixed GitHub issue #947 where no speech input could leave your app in a bad state.
Java: Fixed GitHub Issue #997 where the Speech SDK for Java 1.16 crashes when using DialogServiceConnector without a network connection or an invalid subscription key.
Fixed a crash when abruptly stopping speech recognition (for example, using CTRL+C on console app).
Java: Added a fix to delete temporary files on Windows when using Speech SDK for Java.
Java: Fixed GitHub issue #994 where calling DialogServiceConnector.stopListeningAsync could result in an error.
Java: Fixed a customer issue in the virtual assistant quickstart.
JavaScript: Fixed GitHub issue #366 where ConversationTranslator threw an error 'this.cancelSpeech is not a function'.
JavaScript: Fixed GitHub issue #298 where 'Get result as an in-memory stream' sample played sound out loud.
JavaScript: Fixed GitHub issue #350 where calling AudioConfig could result in a 'ReferenceError: MediaStream is not defined'.
JavaScript: Fixed an UnhandledPromiseRejection warning in Node.js for long-running sessions.

Samples

Updated Unity samples documentation for macOS here.
A React Native sample for the Azure AI Speech recognition service is now available here.

Speech SDK 1.16.0: 2021-March release

Note

The Speech SDK on Windows depends on the shared Microsoft Visual C++ Redistributable for Visual Studio 2015, 2017 and 2019. Download it here.

New features

C++/C#/Java/Python: Moved to the latest version of GStreamer (1.18.3) to add support for transcribing any media format on Windows, Linux, and Android. See documentation here.
C++/C#/Java/Objective-C/Python: Added support for decoding compressed TTS/synthesized audio to the SDK. If you set output audio format to PCM and GStreamer is available on your system, the SDK will automatically request compressed audio from the service to save bandwidth and decode the audio on the client. You can set SpeechServiceConnection_SynthEnableCompressedAudioTransmission to false to disable this feature. Details for C++, C#, Java, Objective-C, Python.
JavaScript: Node.js users can now use the AudioConfig.fromWavFileInput API. This addresses GitHub issue #252.
C++/C#/Java/Objective-C/Python: Added GetVoicesAsync() method for TTS to return all available synthesis voices. Details for C++, C#, Java, Objective-C, and Python.
C++/C#/Java/JavaScript/Objective-C/Python: Added VisemeReceived event for TTS/speech synthesis to return synchronous viseme animation. See documentation here.
C++/C#/Java/JavaScript/Objective-C/Python: Added BookmarkReached event for TTS. You can set bookmarks in the input SSML and get the audio offsets for each bookmark. See documentation here.
Java: Added support for Speaker Recognition APIs. Details here.
C++/C#/Java/JavaScript/Objective-C/Python: Added two new output audio formats with WebM container for TTS (Webm16Khz16BitMonoOpus and Webm24Khz16BitMonoOpus). These are better formats for streaming audio with the Opus codec. Details for C++, C#, Java, JavaScript, Objective-C, Python.
C++/C#/Java: Added support for retrieving voice profile for Speaker Recognition scenario. Details for C++, C#, and Java.
C++/C#/Java/Objective-C/Python: Added support for separate shared library for audio microphone and speaker control. This allows the developer to use the SDK in environments that don't have required audio library dependencies.
Objective-C/Swift: Added support for module framework with umbrella header. This allows the developer to import Speech SDK as a module in iOS/Mac Objective-C/Swift apps. This addresses GitHub issue #452.
Python: Added support for Python 3.9 and dropped support for Python 3.5 per Python's end-of-life for 3.5.
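
To make the BookmarkReached item above more concrete, here is a small sketch of SSML input that carries bookmarks, plus a helper that lists the bookmark names in document order. It uses only Python's standard XML parser and doesn't call the Speech service. The voice name, the bookmark names, and the helper bookmark_names are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# W3C SSML namespace used by the <speak> element.
SSML_NS = "http://www.w3.org/2001/10/synthesis"

# Example SSML with two bookmarks; the voice and mark names are placeholders.
ssml = (
    f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang="en-US">'
    '<voice name="en-US-JennyNeural">'
    '<bookmark mark="intro"/>Welcome to the demo. '
    '<bookmark mark="details"/>Here are the details.'
    "</voice></speak>"
)


def bookmark_names(ssml_text: str) -> list:
    """Return the mark attribute of every <bookmark> element, in document order."""
    root = ET.fromstring(ssml_text)
    return [el.get("mark") for el in root.iter(f"{{{SSML_NS}}}bookmark")]


names = bookmark_names(ssml)
```

When such SSML is synthesized, each bookmark is expected to surface as one event carrying the bookmark text and its audio offset, so an application can line up actions with the names collected here.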

Known issues

C++/C#/Java: DialogServiceConnector can't use a CustomCommandsConfig to access a Custom Commands application and will instead encounter a connection error. This can be worked around by manually adding your application ID to the request with config.SetServiceProperty("X-CommandsAppId", "your-application-id", ServicePropertyChannel.UriQueryParameter). The expected behavior of CustomCommandsConfig will be restored in the next release.
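
The workaround above attaches the application ID to the request as a URI query parameter. As a rough illustration of what that means at the URL level, the Python sketch below appends a name/value pair to an existing query string. It doesn't use the Speech SDK; the endpoint URL and the helper add_query_parameter are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_query_parameter(url: str, name: str, value: str) -> str:
    """Return url with name=value appended to its query string (percent-encoded)."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((name, value))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Placeholder endpoint; a real service URL is chosen by the SDK.
patched_url = add_query_parameter(
    "wss://example.invalid/api/v1?language=en-US",
    "X-CommandsAppId",
    "your-application-id",
)
```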

Improvements

As part of our multi-release effort to reduce the Speech SDK's memory usage and disk footprint, Android binaries are now 3% to 5% smaller.
Improved accuracy, readability, and see-also sections of our C# reference documentation here.

Bug fixes

JavaScript: Large WAV file headers are now parsed correctly (increases header slice to 512 bytes). This addresses GitHub issue #962.
JavaScript: Corrected microphone timing issue if mic stream ends before stop recognition, addressing an issue with Speech Recognition not working in Firefox.
JavaScript: We now correctly handle initialization promise when the browser forces mic off before turnOn completes.
JavaScript: We replaced URL dependency with url-parse. This addresses GitHub issue #264.
Android: Fixed callbacks not working when minifyEnabled is set to true.
C++/C#/Java/Objective-C/Python: TCP_NODELAY is now correctly set on the underlying socket I/O for TTS to reduce latency.
C++/C#/Java/Python/Objective-C/Go: Fixed an occasional crash when the recognizer was destroyed just after starting a recognition.
C++/C#/Java: Fixed an occasional crash in the destruction of speaker recognizer.
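
One plausible reason a WAV parser needs to read more than the first 44 bytes, as in the large-header fix above: the data chunk isn't guaranteed to follow the fmt chunk directly, because other chunks (for example LIST metadata) can sit between them. The sketch below walks the RIFF chunks in a header slice until it finds the data chunk. It is written in Python for illustration and isn't the JavaScript SDK's parser; find_data_chunk is a hypothetical helper.

```python
import struct
from typing import Tuple


def find_data_chunk(header: bytes) -> Tuple[int, int]:
    """Return (offset of the audio data, declared data size) from a WAV header slice."""
    if header[:4] != b"RIFF" or header[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE header")
    pos = 12  # first chunk starts after "RIFF", the size field, and "WAVE"
    while pos + 8 <= len(header):
        chunk_id = header[pos:pos + 4]
        (size,) = struct.unpack_from("<I", header, pos + 4)
        if chunk_id == b"data":
            return pos + 8, size
        # Skip this chunk; chunks are padded to an even number of bytes.
        pos += 8 + size + (size & 1)
    raise ValueError("data chunk not found in the bytes provided")


# A header with a 200-byte LIST chunk between fmt and data.
fmt_chunk = b"fmt " + struct.pack("<IHHIIHH", 16, 1, 1, 16000, 32000, 2, 16)
list_chunk = b"LIST" + struct.pack("<I", 200) + bytes(200)
sample = (
    b"RIFF" + bytes(4) + b"WAVE" + fmt_chunk + list_chunk
    + b"data" + struct.pack("<I", 4) + bytes(4)
)
offset, size = find_data_chunk(sample[:512])
```

With this sample, a 44-byte slice ends before the data chunk is reached and the helper raises ValueError, while a 512-byte slice is enough to locate it.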

Samples

JavaScript: Browser samples no longer require separate JavaScript library file download.

Speech SDK 1.15.0: 2021-January release

Note

The Speech SDK on Windows depends on the shared Microsoft Visual C++ Redistributable for Visual Studio 2015, 2017 and 2019. Download it here.

Highlights summary

Smaller memory and disk footprint making the SDK more efficient.
Higher fidelity output formats available for custom-neural voice private preview.
Intent Recognizer can now return more than the top intent, giving you the ability to make a separate assessment about your customer's intent.
Voice assistants and bots are now easier to set up, and you can make them stop listening immediately and exercise greater control over how they respond to errors.
Improved on-device performance through making compression optional.
Use the Speech SDK on Windows ARM/Arm64.
Improved low-level debugging.
Pronunciation Assessment feature is now more widely available.
Several Bug fixes to address issues YOU, our valued customers, have flagged on GitHub! THANK YOU! Keep the feedback coming!

Improvements

The Speech SDK is now more efficient and lightweight. We've started a multi-release effort to reduce the Speech SDK's memory usage and disk footprint. As a first step we made significant file size reductions in shared libraries on most platforms. Compared to the 1.14 release:
- 64-bit UWP-compatible Windows libraries are about 30% smaller.
- 32-bit Windows libraries aren't yet seeing a size improvement.
- Linux libraries are 20-25% smaller.
- Android libraries are 3-5% smaller.

New features

All: New 48 KHz output formats available for the private preview of custom-neural voice through the TTS speech synthesis API: Audio48Khz192KBitRateMonoMp3, audio-48khz-192kbitrate-mono-mp3, Audio48Khz96KBitRateMonoMp3, audio-48khz-96kbitrate-mono-mp3, Raw48Khz16BitMonoPcm, raw-48khz-16bit-mono-pcm, Riff48Khz16BitMonoPcm, riff-48khz-16bit-mono-pcm.
All: Custom voice is also easier to use. Added support for setting custom voice via EndpointId (C++, C#, Java, JavaScript, Objective-C, Python). Before this change, custom voice users needed to set the endpoint URL via the FromEndpoint method. Now customers can use the FromSubscription method just like prebuilt voices, and then provide the deployment ID by setting EndpointId. This simplifies setting up custom voices.
C++/C#/Java/Objective-C/Python: Get more than the top intent from IntentRecognizer. The JSON result can now contain all intents, not only the top-scoring intent, when you pass the verbose=true URI parameter to the LanguageUnderstandingModel FromEndpoint method. This addresses GitHub issue #880. See updated documentation here.
C++/C#/Java: Make your voice assistant or bot stop listening immediately. DialogServiceConnector (C++, C#, Java) now has a StopListeningAsync() method to accompany ListenOnceAsync(). This will immediately stop audio capture and gracefully wait for a result, making it perfect for use with "stop now" button-press scenarios.
C++/C#/Java/JavaScript: Make your voice assistant or bot react better to underlying system errors. DialogServiceConnector (C++, C#, Java, JavaScript) now has a new TurnStatusReceived event handler. These optional events correspond to every ITurnContext resolution on the Bot and will report turn execution failures when they happen, for example, as a result of an unhandled exception, timeout, or network drop between Direct Line Speech and the bot. TurnStatusReceived makes it easier to respond to failure conditions. For example, if a bot takes too long on a backend database query (for example, looking up a product), TurnStatusReceived allows the client to know to reprompt with "sorry, I didn't quite get that, could you please try again" or something similar.
C++/C#: Use the Speech SDK on more platforms. The Speech SDK NuGet package now supports Windows ARM/Arm64 desktop native binaries (UWP was already supported) to make the Speech SDK more useful on more machine types.
Java: DialogServiceConnector now has a setSpeechActivityTemplate() method that was unintentionally excluded from the language previously. This is equivalent to setting the Conversation_Speech_Activity_Template property and will request that all future Bot Framework activities originated by the Direct Line Speech service merge the provided content into their JSON payloads.
Java: Improved low-level debugging. The Connection class now has a MessageReceived event, similar to other programming languages (C++, C#). This event provides low-level access to incoming data from the service and can be useful for diagnostics and debugging.
JavaScript: Easier setup for Voice Assistants and bots through BotFrameworkConfig, which now has fromHost() and fromEndpoint() factory methods that simplify the use of custom service locations versus manually setting properties. We also standardized optional specification of botId to use a non-default bot across the configuration factories.
JavaScript: Improved on-device performance through an added string control property for websocket compression. For performance reasons, we disabled websocket compression by default. This can be reenabled for low-bandwidth scenarios. More details here. This addresses GitHub issue #242.
JavaScript: Added support for Pronunciation Assessment to enable evaluation of speech pronunciation. See the quickstart here.

Bug fixes

All (except JavaScript): Fixed a regression in version 1.14, in which too much memory was allocated by the recognizer.
C++: Fixed a garbage collection issue with DialogServiceConnector, addressing GitHub issue #794.
C#: Fixed an issue with thread shutdown that caused objects to block for about a second when disposed.
C++/C#/Java: Fixed an exception preventing an application from setting speech authorization token or activity template more than once on a DialogServiceConnector.
C++/C#/Java: Fixed a recognizer crash due to a race condition in teardown.
JavaScript: DialogServiceConnector didn't previously honor the optional botId parameter specified in BotFrameworkConfig's factories. This made it necessary to set the botId query string parameter manually to use a non-default bot. The bug has been corrected and botId values provided to BotFrameworkConfig's factories will be honored and used, including the new fromHost() and fromEndpoint() additions. This also applies to the applicationId parameter for CustomCommandsConfig.
JavaScript: Fixed GitHub issue #881, allowing recognizer object reuse.
JavaScript: Fixed an issue where the SDK was sending speech.config multiple times in one TTS session, wasting bandwidth.
JavaScript: Simplified error handling on microphone authorization, allowing more descriptive message to bubble up when user hasn't allowed microphone input on their browser.
JavaScript: Fixed GitHub issue #249 where type errors in ConversationTranslator and ConversationTranscriber caused a compilation error for TypeScript users.
Objective-C: Fixed an issue where GStreamer build failed for iOS on Xcode 11.4, addressing GitHub issue #911.
Python: Fixed GitHub issue #870, removing "DeprecationWarning: the imp module is deprecated in favor of importlib".

Samples

From-file sample for JavaScript browser now uses files for speech recognition. This addresses GitHub issue #884.

Speech SDK 1.14.0: 2020-October release

Note

The Speech SDK on Windows depends on the shared Microsoft Visual C++ Redistributable for Visual Studio 2015, 2017 and 2019. Download it here.

New features

Linux: Added support for Debian 10 and Ubuntu 20.04 LTS.
Python/Objective-C: Added support for the KeywordRecognizer API. Documentation will be here.
C++/Java/C#: Added support to set any HttpHeader key/value via ServicePropertyChannel::HttpHeader.
JavaScript: Added support for the ConversationTranscriber API. Read documentation here.
C++/C#: Added new AudioDataStream FromWavFileInput method (to read .WAV files) here (C++) and here (C#).
C++/C#/Java/Python/Objective-C/Swift: Added a stopSpeakingAsync() method to stop text to speech synthesis. Read the Reference documentation here (C++), here (C#), here (Java), here (Python), and here (Objective-C/Swift).
C#, C++, Java: Added a FromDialogServiceConnector() function to the Connection class that can be used to monitor connection and disconnection events for DialogServiceConnector. Read the Reference documentation here (C#), here (C++), and here (Java).
C++/C#/Java/Python/Objective-C/Swift: Added support for Pronunciation Assessment, which evaluates speech pronunciation and gives speakers feedback on the accuracy and fluency of spoken audio. Read the documentation here.

Breaking change

JavaScript: The return type of PullAudioOutputStream.read() changed from an internal Promise to a native JavaScript Promise.

Bug fixes

All: Fixed 1.13 regression in SetServiceProperty where values with certain special characters were ignored.
C#: Fixed Windows console samples on Visual Studio 2019 failing to find native DLLs.
C#: Fixed crash with memory management if stream is used as KeywordRecognizer input.
Objective-C/Swift: Fixed crash with memory management if stream is used as recognizer input.
Windows: Fixed coexistence issue with BT HFP/A2DP on UWP.
JavaScript: Fixed mapping of session IDs to improve logging and aid in internal debug/service correlations.
JavaScript: Added fix for DialogServiceConnector disabling ListenOnce calls after the first call is made.
JavaScript: Fixed issue where result output would only ever be "simple".
JavaScript: Fixed continuous recognition issue in Safari on macOS.
JavaScript: CPU load mitigation for high request throughput scenario.
JavaScript: Allow access to details of Voice Profile Enrollment result.
JavaScript: Added fix for continuous recognition in IntentRecognizer.
C++/C#/Java/Python/Swift/Objective-C: Fixed incorrect URL for australiaeast and brazilsouth in IntentRecognizer.
C++/C#: Added VoiceProfileType as an argument when creating a VoiceProfile object.
C++/C#/Java/Python/Swift/Objective-C: Fixed potential SPX_INVALID_ARG when trying to read AudioDataStream from a given position.
iOS: Fixed crash with speech recognition on Unity.

Samples

Objective-C: Added sample for keyword recognition here.
C#/JavaScript: Added quickstart for conversation transcription here (C#) and here (JavaScript).
C++/C#/Java/Python/Swift/Objective-C: Added sample for Pronunciation Assessment here.

Known Issue

DigiCert Global Root G2 certificate isn't supported by default in HoloLens 2 and Android 4.4 (KitKat) and needs to be added to the system to make the Speech SDK functional. The certificate will be added to HoloLens 2 OS images in the near future. Android 4.4 customers need to add the updated certificate to the system.

COVID-19 abridged testing

Due to working remotely over the last few weeks, we couldn't do as much manual verification testing as we normally do. We haven't made any changes we think could have broken anything, and our automated tests all passed. In the unlikely event that we missed something, please let us know on GitHub.
Stay healthy!

Speech SDK 1.13.0: 2020-July release

Note

The Speech SDK on Windows depends on the shared Microsoft Visual C++ Redistributable for Visual Studio 2015, 2017 and 2019. Download and install it from here.

New features

C#: Added support for asynchronous conversation transcription. See documentation here.
JavaScript: Added Speaker Recognition support for both browser and Node.js.
JavaScript: Added support for Language Identification/language ID. See documentation here.
Objective-C: Added support for multi-device conversation and conversation transcription.
Python: Added compressed audio support for Python on Windows and Linux. See documentation here.

Bug fixes

All: Fixed an issue that caused the KeywordRecognizer to not move forward the streams after a recognition.
All: Fixed an issue that caused the stream obtained from a KeywordRecognitionResult to not contain the keyword.
All: Fixed an issue where SendMessageAsync hadn't actually sent the message over the wire by the time the caller finished waiting for it.
All: Fixed a crash in Speaker Recognition APIs when users called the VoiceProfileClient::SpeakerRecEnrollProfileAsync method multiple times and didn't wait for the calls to finish.
All: Fixed enabling file logging in the VoiceProfileClient and SpeakerRecognizer classes.
JavaScript: Fixed an issue with throttling when browser is minimized.
JavaScript: Fixed an issue with a memory leak on streams.
JavaScript: Added caching for OCSP responses from NodeJS.
Java: Fixed an issue that was causing BigInteger fields to always return 0.
iOS: Fixed an issue with publishing Speech SDK-based apps in the iOS App Store.

Samples

C++: Added sample code for Speaker Recognition here.

Speech SDK 1.12.1: 2020-June release

New features

C#, C++: Speaker Recognition Preview: This feature enables speaker identification (who is speaking?) and speaker verification (is the speaker who they claim to be?). See the overview documentation.

Bug fixes

C#, C++: Fixed an issue where microphone recording wasn't working in 1.12 for Speaker Recognition.
JavaScript: Fixes for Text to speech in Firefox, and Safari on macOS and iOS.
Fix for Windows application verifier access violation crash on conversation transcription when using eight-channel stream.
Fix for Windows application verifier access violation crash on multi-device conversation translation.

Samples

C#: Code sample for Speaker Recognition.
C++: Code sample for Speaker Recognition.
Java: Code sample for intent recognition on Android.

Speech SDK 1.12.0: 2020-May release

New features

Go: New Go language support for Speech Recognition and custom voice assistant. Set up your dev environment here. For sample code, see the Samples section below.
JavaScript: Added Browser support for text to speech. See documentation here.
C++, C#, Java: New KeywordRecognizer object and APIs supported on Windows, Android, Linux & iOS platforms. Read the documentation here. For sample code, see the Samples section below.
Java: Added multi-device conversation with translation support. See the reference doc here.

Improvements & Optimizations

JavaScript: Optimized browser microphone implementation improving speech recognition accuracy.
Java: Refactored bindings using direct JNI implementation without SWIG. This change reduces the bindings size by 10x for all Java packages used for Windows, Android, Linux, and Mac, and eases further development of the Speech SDK Java implementation.
Linux: Updated support documentation with the latest RHEL 7 specific notes.
Improved connection logic to attempt connecting multiple times when service and network errors occur.
Updated the portal.azure.com Speech Quickstart page to help developers take the next step in the Azure AI Speech journey.

Bug fixes

C#, Java: Fixed an issue with loading SDK libraries on Linux ARM (both 32 bit and 64 bit).
C#: Fixed explicit disposal of native handles for TranslationRecognizer, IntentRecognizer, and Connection objects.
C#: Fixed audio input lifetime management for ConversationTranscriber object.
Fixed an issue where IntentRecognizer result reason wasn't set properly when recognizing intents from simple phrases.
Fixed an issue where SpeechRecognitionEventArgs result offset wasn't set correctly.
Fixed a race condition where the SDK was trying to send a network message before opening the websocket connection. This was reproducible for TranslationRecognizer while adding participants.
Fixed memory leaks in the keyword recognizer engine.

Samples

Go: Added quickstarts for speech recognition and custom voice assistant. Find sample code here.
JavaScript: Added quickstarts for Text to speech, Translation, and Intent Recognition.
Keyword recognition samples for C# and Java (Android).

COVID-19 abridged testing

Due to working remotely over the last few weeks, we couldn't do as much manual verification testing as we normally do. We haven't made any changes we think could have broken anything, and our automated tests all passed. If we missed something, please let us know on GitHub.
Stay healthy!

Speech SDK 1.11.0: 2020-March release

New features

Linux: Added support for Red Hat Enterprise Linux (RHEL)/CentOS 7 x64.
Linux: Added support for .NET Core C# on Linux ARM32 and Arm64. Read more here.
C#, C++: Added UtteranceId in ConversationTranscriptionResult, a consistent ID across all the intermediates and final speech recognition result. Details for C#, C++.
Python: Added support for Language ID. See speech_sample.py in GitHub repo.
Windows: Added compressed audio input format support on Windows platform for all the win32 console applications. Details here.
JavaScript: Support speech synthesis (text to speech) in NodeJS. Learn more here.
JavaScript: Add new APIs to enable inspection of all send and received messages. Learn more here.

Bug fixes

C#, C++: Fixed an issue so SendMessageAsync now sends binary message as binary type. Details for C#, C++.
C#, C++: Fixed an issue where using Connection MessageReceived event may cause crash if Recognizer is disposed before Connection object. Details for C#, C++.
Android: Audio buffer size from microphone decreased from 800 ms to 100 ms to improve latency.
Android: Fixed an issue with x86 Android emulator in Android Studio.
JavaScript: Added support for Regions in China with the fromSubscription API. Details here.
JavaScript: Add more error information for connection failures from NodeJS.
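
To put the Android buffer change above in concrete terms, the following sketch computes how many bytes of audio each buffer duration holds. It assumes 16 kHz, 16-bit, mono PCM input; that format and the helper name `buffer_size_bytes` are illustrative assumptions, not details stated in these release notes.

```python
def buffer_size_bytes(duration_ms, sample_rate=16000, bits_per_sample=16, channels=1):
    """Return the number of PCM bytes needed to hold duration_ms of audio."""
    # Assumed format: 16 kHz, 16-bit, mono PCM (not stated in the release notes).
    bytes_per_second = sample_rate * (bits_per_sample // 8) * channels
    return bytes_per_second * duration_ms // 1000


old_size = buffer_size_bytes(800)  # buffer duration before the change
new_size = buffer_size_bytes(100)  # buffer duration after the change
print(old_size, new_size)
```

Under these assumptions, a smaller buffer means the first audio chunk can be handed to the recognizer after 100 ms of capture instead of 800 ms.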

Samples

Unity: Fixed the intent recognition public sample, where the LUIS JSON import was failing. Details here.
Python: Sample added for Language ID. Details here.

COVID-19 abridged testing: Due to working remotely over the last few weeks, we couldn't do as much manual device verification testing as we normally do. For example, we couldn't test microphone input and speaker output on Linux, iOS, and macOS. We haven't made any changes we think could have broken anything on these platforms, and our automated tests all passed. In the unlikely event that we missed something, let us know on GitHub.
Thank you for your continued support. As always, please post questions or feedback on GitHub or Stack Overflow.
Stay healthy!

Speech SDK 1.10.0: 2020-February release

New features

Added Python packages to support the new 3.8 release of Python.
Red Hat Enterprise Linux (RHEL)/CentOS 8 x64 support (C++, C#, Java, Python).

Note

Customers must configure OpenSSL according to these instructions.
Linux ARM32 support for Debian and Ubuntu.
DialogServiceConnector now supports an optional "bot ID" parameter on BotFrameworkConfig. This parameter allows the use of multiple Direct Line Speech bots with a single Speech resource. Without the parameter specified, the default bot (as determined by the Direct Line Speech channel configuration page) will be used.
DialogServiceConnector now has a SpeechActivityTemplate property. The contents of this JSON string will be used by Direct Line Speech to prepopulate a wide variety of supported fields in all activities that reach a Direct Line Speech bot, including activities automatically generated in response to events like speech recognition.
TTS now uses subscription key for authentication, reducing the first byte latency of the first synthesis result after creating a synthesizer.
Updated speech recognition models for 19 locales for an average word error rate reduction of 18.6% (es-ES, es-MX, fr-CA, fr-FR, it-IT, ja-JP, ko-KR, pt-BR, zh-CN, zh-HK, nb-NO, fi-FI, ru-RU, pl-PL, ca-ES, zh-TW, th-TH, pt-PT, tr-TR). The new models bring significant improvements across multiple domains including Dictation, Call-Center Transcription, and Video Indexing scenarios.

Bug fixes

Fixed a bug where Conversation Transcriber didn't await properly in Java APIs
Android x86 emulator fix for Xamarin GitHub issue
Add missing (Get|Set)Property methods to AudioConfig
Fix a TTS bug where the audioDataStream couldn't be stopped when connection fails
Fixed an issue where using an endpoint without a region would cause USP failures for conversation translator
ID generation in Universal Windows Applications now uses an appropriately unique GUID algorithm; it previously and unintentionally defaulted to a stubbed implementation that often produced collisions over large sets of interactions.

Samples

Unity sample for using Speech SDK with Unity microphone and push mode streaming

Other changes

OpenSSL configuration documentation updated for Linux

Speech SDK 1.9.0: 2020-January release

New features

Multi-device conversation: connect multiple devices to the same speech or text-based conversation, and optionally translate messages sent between them. Learn more in this article.
Keyword recognition support added for Android .aar package and added support for x86 and x64 flavors.
Objective-C: SendMessage and SetMessageProperty methods added to Connection object. See documentation here.
TTS C++ API now supports std::wstring as synthesis text input, removing the need to convert a wstring to string before passing it to the SDK. See details here.
C#: Language ID and source language config are now available.
JavaScript: Added a feature to Connection object to pass through custom messages from the Speech service as callback receivedServiceMessage.
JavaScript: Added support for FromHost API to ease use with on-premises containers and sovereign clouds. See documentation here.
JavaScript: We now honor NODE_TLS_REJECT_UNAUTHORIZED thanks to a contribution from orgads. See details here.

Breaking changes

OpenSSL has been updated to version 1.1.1b and is statically linked to the Speech SDK core library for Linux. This may cause a break if your inbox OpenSSL hasn't been installed to the /usr/lib/ssl directory in the system. Check our documentation under Speech SDK docs to work around the issue.
We've changed the data type returned for C# WordLevelTimingResult.Offset from int to long to allow for access to WordLevelTimingResults when speech data is longer than 2 minutes.
PushAudioInputStream and PullAudioInputStream now send wav header information to the Speech service based on AudioStreamFormat, optionally specified when they were created. Customers must now use the supported audio input format. Any other formats will get suboptimal recognition results or may cause other issues.
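
As an illustration of the wav header information mentioned in the last item, the sketch below uses Python's standard `wave` module to build the header for an empty PCM stream and read back the format fields. It assumes 16 kHz, 16-bit, mono PCM; the helper name `wav_header` is hypothetical, and this is not the SDK's own code, only a view of what a WAV header carries.

```python
import io
import struct
import wave


def wav_header(sample_rate=16000, bits_per_sample=16, channels=1):
    """Return the RIFF/WAVE header for an empty PCM stream with the given format."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as writer:
        writer.setnchannels(channels)
        writer.setsampwidth(bits_per_sample // 8)
        writer.setframerate(sample_rate)
        writer.writeframes(b"")  # no audio data, so only the header is written
    return buffer.getvalue()


header = wav_header()
# Bytes 22-23 hold the channel count, bytes 24-27 the sample rate,
# and bytes 34-35 the bits per sample (all little-endian).
channels, sample_rate = struct.unpack_from("<HI", header, 22)
bits_per_sample = struct.unpack_from("<H", header, 34)[0]
print(len(header), channels, sample_rate, bits_per_sample)  # 44 1 16000 16
```

If the audio pushed into a stream doesn't match the format described by such a header, the service interprets the samples incorrectly, which is why the note above asks customers to use a supported input format.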

Bug fixes

See the OpenSSL update under Breaking changes above. We fixed both an intermittent crash and a performance issue (lock contention under high load) in Linux and Java.
Java: Made improvements to object closure in high concurrency scenarios.
Restructured our NuGet package. We removed the three copies of Microsoft.CognitiveServices.Speech.core.dll and Microsoft.CognitiveServices.Speech.extension.kws.dll under lib folders, making the NuGet package smaller and faster to download, and we added headers needed to compile some C++ native apps.
Fixed quickstart samples here. These were exiting without displaying a "microphone not found" exception on Linux, macOS, and Windows.
Fixed SDK crash with long speech recognition results on certain code paths like this sample.
Fixed SDK deployment error in Azure Web App environment to address this customer issue.
Fixed a TTS error while using multi <voice> tag or <audio> tag to address this customer issue.
Fixed a TTS 401 error that occurred when the SDK recovered from a suspended state.
JavaScript: Fixed a circular import of audio data thanks to a contribution from euirim.
JavaScript: Added support for setting service properties, as added in 1.7.
JavaScript: Fixed an issue where a connection error could result in continuous, unsuccessful websocket reconnect attempts.

Samples

Added keyword recognition sample for Android here.
Added TTS sample for the server scenario here.
Added Multi-device conversation quickstarts for C# and C++ here.

Other changes

Optimized SDK core library size on Android.
SDK in 1.9.0 and onwards supports both int and string types in the voice signature version field for Conversation Transcriber.

Speech SDK 1.8.0: 2019-November release

New features

Added a FromHost() API, to ease use with on-premises containers and sovereign clouds.
Added Source Language Identification for Speech Recognition (in Java and C++)
Added SourceLanguageConfig object for Speech Recognition, used to specify expected source languages (in Java and C++)
Added KeywordRecognizer support on Windows (UWP), Android and iOS through the NuGet and Unity packages
Added Remote Conversation Java API to do Conversation Transcription in asynchronous batches.

Breaking changes

Conversation Transcriber functionalities moved under namespace Microsoft.CognitiveServices.Speech.Transcription.
Parts of the Conversation Transcriber methods are moved to new Conversation class.
Dropped support for 32-bit (ARMv7 and x86) iOS

Bug fixes

Fix for crash if local KeywordRecognizer is used without a valid Speech service subscription key

Samples

Xamarin sample for KeywordRecognizer
Unity sample for KeywordRecognizer
C++ and Java samples for Automatic Source Language Identification.

Speech SDK 1.7.0: 2019-September release

New features

Added beta support for Xamarin on Universal Windows Platform (UWP), Android, and iOS
Added iOS support for Unity
Added Compressed input support for ALaw, Mulaw, FLAC, on Android, iOS, and Linux
Added SendMessageAsync in Connection class for sending a message to service
Added SetMessageProperty in Connection class for setting property of a message
TTS added bindings for Java (JRE and Android), Python, Swift, and Objective-C
TTS added playback support for macOS, iOS, and Android.
Added "word boundary" information for TTS.

Bug fixes

Fixed IL2CPP build issue on Unity 2019 for Android
Fixed issue with malformed headers in wav file input being processed incorrectly
Fixed issue with UUIDs not being unique in some connection properties
Fixed a few warnings about nullability specifiers in the Swift bindings (might require small code changes)
Fixed a bug that caused websocket connections to be closed ungracefully under network load
Fixed an issue on Android that sometimes results in duplicate impression IDs used by DialogServiceConnector
Improvements to the stability of connections across multi-turn interactions and the reporting of failures (via Canceled events) when they occur with DialogServiceConnector
DialogServiceConnector session starts will now properly provide events, including when calling ListenOnceAsync() during an active StartKeywordRecognitionAsync()
Addressed a crash associated with DialogServiceConnector activities being received

Samples

Quickstart for Xamarin
Updated CPP Quickstart with Linux Arm64 information
Updated Unity quickstart with iOS information

Speech SDK 1.6.0: 2019-June release

Samples

Quickstart samples for Text To Speech on UWP and Unity
Quickstart sample for Swift on iOS
Unity samples for Speech & Intent Recognition and Translation
Updated quickstart samples for DialogServiceConnector

Improvements / Changes

Dialog namespace:
- SpeechBotConnector has been renamed to DialogServiceConnector
- BotConfig has been renamed to DialogServiceConfig
- BotConfig::FromChannelSecret() has been remapped to DialogServiceConfig::FromBotSecret()
- All existing Direct Line Speech clients continue to be supported after the rename
Update TTS REST adapter to support proxy, persistent connection
Improve error message when an invalid region is passed
Swift/Objective-C:
- Improved error reporting: Methods that can result in an error are now present in two versions: One that exposes an NSError object for error handling, and one that raises an exception. The former are exposed to Swift. This change requires adaptations to existing Swift code.
- Improved event handling

Bug fixes

Fix for TTS where the SpeakTextAsync future returned without waiting until audio had completed rendering
Fix for marshaling strings in C# to enable full language support
Fix for .NET core app problem to load core library with net461 target framework in samples
Fix for occasional issues to deploy native libraries to the output folder in samples
Fix for web socket closing reliably
Fix for possible crash while opening a connection under heavy load on Linux
Fix for missing metadata in the framework bundle for macOS
Fix for problems with pip install --user on Windows

Speech SDK 1.5.1

This is a bug fix release that affects only the native/managed SDK. It doesn't affect the JavaScript version of the SDK.

Bug fixes

Fix FromSubscription when used with Conversation Transcription.
Fix bug in keyword spotting for Voice Assistants.

Speech SDK 1.5.0: 2019-May release

New features

Keyword spotting (KWS) is now available for Windows and Linux. KWS functionality might work with any microphone type; official KWS support, however, is currently limited to the microphone arrays found in the Azure Kinect DK hardware or the Speech Devices SDK.
Phrase hint functionality is available through the SDK. For more information, see here.
Conversation transcription functionality is available through the SDK.
Add support for Voice Assistants using the Direct Line Speech channel.

Samples

Added samples for new features or new services supported by the SDK.

Improvements / Changes

Added various recognizer properties to adjust service behavior or service results (like masking profanity and others).
You can now configure the recognizer through the standard configuration properties, even if you created the recognizer FromEndpoint.
Objective-C: OutputFormat property was added to SPXSpeechConfiguration.
The SDK now supports Debian 9 as a Linux distribution.

Bug fixes

Fixed a problem where the speaker resource was destructed too early in text to speech.

Speech SDK 1.4.2

This is a bug fix release that affects only the native/managed SDK. It doesn't affect the JavaScript version of the SDK.

Speech SDK 1.4.1

This is a JavaScript-only release. No features have been added. The following fixes were made:

Prevent webpack from loading https-proxy-agent.

Speech SDK 1.4.0: 2019-April release

New features

The SDK now supports the Text to speech service as a beta version. It's supported on Windows and Linux Desktop from C++ and C#. For more information, check the Text to speech overview.
The SDK now supports MP3 and Opus/OGG audio files as stream input files. This feature is available only on Linux from C++ and C# and is currently in beta (more details here).
The Speech SDKs for Java, .NET Core, C++, and Objective-C have gained macOS support. The Objective-C support for macOS is currently in beta.
iOS: The Speech SDK for iOS (Objective-C) is now also published as a CocoaPod.
JavaScript: Support for non-default microphone as an input device.
JavaScript: Proxy support for Node.js.

Samples

Samples for using the Speech SDK with C++ and with Objective-C on macOS have been added.
Samples demonstrating the usage of the Text to speech service have been added.

Improvements / Changes

Python: Additional properties of recognition results are now exposed via the properties property.
For additional development and debug support, you can redirect SDK logging and diagnostics information into a log file (more details here).
JavaScript: Improve audio processing performance.

Bug fixes

Mac/iOS: A bug that led to a long wait when a connection to the Speech service couldn't be established was fixed.
Python: improve error handling for arguments in Python callbacks.
JavaScript: Fixed wrong state reporting for speech ended on RequestSession.

Speech SDK 1.3.1: 2019-February refresh

This is a bug fix release that affects only the native/managed SDK. It doesn't affect the JavaScript version of the SDK.

Bug fix

Fixed a memory leak when using microphone input. Stream based or file input isn't affected.

Speech SDK 1.3.0: 2019-February release

New features

The Speech SDK supports selection of the input microphone through the AudioConfig class. This allows you to stream audio data to the Speech service from a non-default microphone. For more information, see the documentation describing audio input device selection. This feature isn't yet available from JavaScript.
The Speech SDK now supports Unity in a beta version. Provide feedback through the issue section in the GitHub sample repository. This release supports Unity on Windows x86 and x64 (desktop or Universal Windows Platform applications), and Android (ARM32/64, x86). More information is available in our Unity quickstart.
The file Microsoft.CognitiveServices.Speech.csharp.bindings.dll (shipped in previous releases) isn't needed anymore. The functionality is now integrated into the core SDK.

Samples

The following new content is available in our sample repository:

Additional samples for AudioConfig.FromMicrophoneInput.
Additional Python samples for intent recognition and translation.
Additional samples for using the Connection object in iOS.
Additional Java samples for translation with audio output.
New sample for use of the Batch Transcription REST API.

Improvements / Changes

Python
- Improved parameter verification and error messages in SpeechConfig.
- Add support for the Connection object.
- Support for 32-bit Python (x86) on Windows.
- The Speech SDK for Python is out of beta.
iOS
- The SDK is now built against the iOS SDK version 12.1.
- The SDK now supports iOS versions 9.2 and later.
- Improve reference documentation and fix several property names.
JavaScript
- Add support for the Connection object.
- Add type definition files for bundled JavaScript
- Initial support and implementation for phrase hints.
- Return properties collection with service JSON for recognition
Windows DLLs now contain a version resource.
If you create a recognizer FromEndpoint, you can add parameters directly to the endpoint URL. Using FromEndpoint you can't configure the recognizer through the standard configuration properties.

Bug fixes

Empty proxy username and proxy password weren't handled correctly. With this release, if you set proxy username and proxy password to an empty string, they won't be submitted when connecting to the proxy.
SessionIds created by the SDK weren't always truly random for some languages / environments. Added random generator initialization to fix this issue.
Improved handling of the authorization token. If you want to use an authorization token, specify it in the SpeechConfig and leave the subscription key empty. Then create the recognizer as usual.
In some cases, the Connection object wasn't released correctly. This issue has been fixed.
The JavaScript sample was fixed to support audio output for translation synthesis also on Safari.

Speech SDK 1.2.1

This is a JavaScript-only release. No features have been added. The following fixes were made:

Fire end of stream at turn.end, not at speech.end.
Fix bug in audio pump that didn't schedule next send if the current send failed.
Fix continuous recognition with auth token.
Bug fix for different recognizer / endpoints.
Documentation improvements.

Speech SDK 1.2.0: 2018-December release

New features

Python
- The beta version of Python support (3.5 and above) is available with this release. For more information, see the Python quickstart.
JavaScript
- The Speech SDK for JavaScript has been open-sourced. The source code is available on GitHub.
- We now support Node.js, more info can be found here.
- The length restriction for audio sessions has been removed; reconnection happens automatically under the covers.
Connection object
- From the Recognizer, you can access a Connection object. This object allows you to explicitly initiate the service connection and subscribe to connect and disconnect events. (This feature isn't yet available from JavaScript and Python.)
Support for Ubuntu 18.04.
Android
- Enabled ProGuard support during APK generation.

Improvements

Improvements in the internal thread usage, reducing the number of threads, locks, mutexes.
Improved error reporting / information. In several cases, error messages weren't propagated all the way out.
Updated development dependencies in JavaScript to use up-to-date modules.

Bug fixes

Fixed memory leaks due to a type mismatch in RecognizeAsync.
Fixed cases in which exceptions were being leaked.
Fixed a memory leak in translation event arguments.
Fixed a locking issue on reconnect in long running sessions.
Fixed an issue that could lead to missing final result for failed translations.
C#: If an async operation wasn't awaited in the main thread, it was possible the recognizer could be disposed before the async task was completed.
Java: Fixed a problem resulting in a crash of the Java VM.
Objective-C: Fixed enum mapping; RecognizedIntent was returned instead of RecognizingIntent.
JavaScript: Set default output format to 'simple' in SpeechConfig.
JavaScript: Removing inconsistency between properties on the config object in JavaScript and other languages.

Samples

Updated and fixed several samples (for example output voices for translation, etc.).
Added Node.js samples in the sample repository.

Speech SDK 1.1.0

New features

Support for Android x86/x64.
Proxy Support: In the SpeechConfig object, you can now call a function to set the proxy information (hostname, port, username, and password). This feature isn't yet available on iOS.
Improved error code and messages. If a recognition returned an error, Reason (in the canceled event) or CancellationDetails (in the recognition result) was already set to Error. The canceled event now contains two additional members, ErrorCode and ErrorDetails. If the server returned additional error information with the reported error, it's now available in the new members.

Improvements

Added additional verification in the recognizer configuration, and added additional error message.
Improved handling of long-time silence in middle of an audio file.
NuGet package: for .NET Framework projects, it prevents building with AnyCPU configuration.

Bug fixes

Fixed several exceptions found in recognizers. In addition, exceptions are caught and converted into Canceled event.
Fix a memory leak in property management.
Fixed bug in which an audio input file could crash the recognizer.
Fixed a bug where events could be received after a session stop event.
Fixed some race conditions in threading.
Fixed an iOS compatibility issue that could result in a crash.
Stability improvements for Android microphone support.
Fixed a bug where a recognizer in JavaScript would ignore the recognition language.
Fixed a bug preventing setting the EndpointId (in some cases) in JavaScript.
Changed parameter order in AddIntent in JavaScript, and added missing AddIntent JavaScript signature.

Samples

Added C++ and C# samples for pull and push stream usage in the sample repository.

Speech SDK 1.0.1

Reliability improvements and bug fixes:

Fixed potential fatal error due to race condition in disposing recognizer
Fixed potential fatal error when unset properties occur.
Added additional error and parameter checking.
Objective-C: Fixed possible fatal error caused by name overriding in NSString.
Objective-C: Adjusted visibility of API
JavaScript: Fixes regarding events and their payloads.
Documentation improvements.

In our sample repository, a new sample for JavaScript was added.

Azure AI Speech SDK 1.0.0: 2018-September release

New features

Support for Objective-C on iOS. Check out our Objective-C quickstart for iOS.
Support for JavaScript in browser. Check out our JavaScript quickstart.

Breaking changes

With this release, a number of breaking changes are introduced. Check this page for details.

Azure AI Speech SDK 0.6.0: 2018-August release

New features

UWP apps built with the Speech SDK now can pass the Windows App Certification Kit (WACK). Check out the UWP quickstart.
Support for .NET Standard 2.0 on Linux (Ubuntu 16.04 x64).
Experimental: Support Java 8 on Windows (64-bit) and Linux (Ubuntu 16.04 x64). Check out the Java Runtime Environment quickstart.

Functional change

Expose additional error detail information on connection errors.

Breaking changes

On Java (Android), the SpeechFactory.configureNativePlatformBindingWithDefaultCertificate function no longer requires a path parameter. Now the path is automatically detected on all supported platforms.
The get-accessor of the property EndpointUrl in Java and C# was removed.

Bug fixes

In Java, the audio synthesis result on the translation recognizer is implemented now.
Fixed a bug that could cause inactive threads and an increased number of open and unused sockets.
Fixed a problem, where a long-running recognition could terminate in the middle of the transmission.
Fixed a race condition in recognizer shutdown.

Azure AI Speech SDK 0.5.0: 2018-July release

New features

Support Android platform (API 23: Android 6.0 Marshmallow or higher). Check out the Android quickstart.
Support .NET Standard 2.0 on Windows. Check out the .NET Core quickstart.
Experimental: Support UWP on Windows (version 1709 or later).
- Check out the UWP quickstart.
- Note that UWP apps built with the Speech SDK don't yet pass the Windows App Certification Kit (WACK).
Support long-running recognition with automatic reconnection.

Functional changes

StartContinuousRecognitionAsync() supports long-running recognition.
The recognition result contains more fields: the offset from the beginning of the audio and the duration of the recognized text (both in ticks), and additional values that represent recognition status, for example, InitialSilenceTimeout and InitialBabbleTimeout.
Support AuthorizationToken for creating factory instances.
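
The offset and duration fields mentioned above are reported in ticks. As a rough sketch, assuming one tick is 100 nanoseconds (the usual .NET convention; these release notes don't define the unit), the values can be converted to seconds as follows. The helper names are hypothetical.

```python
TICKS_PER_SECOND = 10_000_000  # assumption: one tick is 100 nanoseconds


def ticks_to_seconds(ticks):
    """Convert a tick count to seconds."""
    return ticks / TICKS_PER_SECOND


def recognized_span(offset_ticks, duration_ticks):
    """Return (start, end) in seconds for a result's offset and duration."""
    start = ticks_to_seconds(offset_ticks)
    end = ticks_to_seconds(offset_ticks + duration_ticks)
    return start, end


# Example: text that starts 1.5 seconds into the audio and lasts 2 seconds.
print(recognized_span(15_000_000, 20_000_000))
```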

Breaking changes

Recognition events: NoMatch event type was merged into the Error event.
SpeechOutputFormat in C# was renamed to OutputFormat to stay aligned with C++.
The return type of some methods of the AudioInputStream interface changed slightly:
- In Java, the read method now returns long instead of int.
- In C#, the Read method now returns uint instead of int.
- In C++, the Read and GetFormat methods now return size_t instead of int.
C++: Instances of audio input streams now can be passed only as a shared_ptr.

Bug fixes

Fixed incorrect return values in the result when RecognizeAsync() times out.
The dependency on media foundation libraries on Windows was removed. The SDK now uses Core Audio APIs.
Documentation fix: Added a regions page to describe the supported regions.

Known Issue

The Speech SDK for Android doesn't report speech synthesis results for translation. This issue will be fixed in the next release.

Azure AI Speech SDK 0.4.0: 2018-June release

Functional changes

AudioInputStream: A recognizer now can consume a stream as the audio source. For more information, see the related how-to guide.
Detailed output format: When you create a SpeechRecognizer, you can request Detailed or Simple output format. The DetailedSpeechRecognitionResult contains a confidence score, recognized text, raw lexical form, normalized form, and normalized form with masked profanity.

Breaking change

In C#, SpeechRecognitionResult.RecognizedText was renamed to SpeechRecognitionResult.Text.

Bug fixes

Fixed a possible callback issue in the USP layer during shutdown.
Fixed an issue where a recognizer that consumed an audio input file held on to the file handle longer than necessary.
Removed several deadlocks between the message pump and the recognizer.
Fire a NoMatch result when the response from service is timed out.
The media foundation libraries on Windows are delay loaded. This library is required for microphone input only.
The upload speed for audio data is limited to about twice the original audio speed.
On Windows, C# .NET assemblies now are strong named.
Documentation fix: Region is required information to create a recognizer.

More samples have been added and are constantly being updated. For the latest set of samples, see the Speech SDK samples GitHub repository.

Azure AI Speech SDK 0.2.12733: 2018-May release

This release is the first public preview release of the Azure AI Speech SDK.

Speech CLI 1.40.0: August 2024 release

Updated to use Speech SDK 1.40.0

New features

none

Bug fixes

none

Speech CLI 1.38.0: June 2024 release

Updated to use Speech SDK 1.38.0

New features

none

Bug fixes

none

Speech CLI 1.37.0: April 2024 release

Updated to use Speech SDK 1.37.0

New features

none

Bug fixes

none

Speech CLI 1.36.0: March 2024 release

Updated to use Speech SDK 1.36.0

New features

none

Bug fixes

none

Speech CLI 1.35.0: February 2024 release

Updated to use Speech SDK 1.35.0

New features

none

Bug fixes

Update JMESPath dependency to latest

Speech CLI 1.34.0: November 2023 release

Updated to use Speech SDK 1.34.0

Speech CLI 1.33.0: October 2023 release

Updated to use Speech SDK 1.33.0

Speech CLI 1.31.0: August 2023 release

Updated to use Speech SDK 1.31.0

Speech CLI 1.30.0: July 2023 release

Updated to use Speech SDK 1.30.0

Speech CLI 1.29.0: June 2023 release

Updated to use Speech SDK 1.29.0

Speech CLI 1.28.0: May 2023 release

Updated to use Speech SDK 1.28.0

Speech CLI 1.27.0: April 2023 release

Updates

Updated to use Speech SDK 1.27.0
Update default endpoint to use v3.1 REST APIs for custom speech Recognition and Batch Speech Recognition.

Bug fixes

Fixes related to how query parameters are parsed/configured.

Speech CLI 1.26.0: March 2023 release

Updated to use Speech SDK 1.26.0.

Speech CLI 1.25.0: January 2023 release

Updated to use Speech SDK 1.25.0.

Speech CLI 1.24.0: October 2022 release

Uses Speech SDK 1.24.0.

New features

Expanded "spx check" to support JMESPath queries against all spx events

Bug fixes

Various improvements to robustness against JMESPath query evaluations
Fix for truncations to file writes that may occur on resource-constrained machines

Speech CLI 1.23.0: July 2022 release

Uses Speech SDK 1.23.0.

New features

Better caption (--output vtt and --output srt) large result splitting (37 char max, 3 lines)
Documented spx synthesize --format options (see spx help synthesize format)
Documented most of spx csr commands/options (see spx help csr)
Added spx csr model copy command (see spx help csr model copy)
Added --check result option using JMES queries (see spx help check result)
Improved error messages when specifying invalid command options
Moved from .NET Core 3.1 to .NET 6.0. In order to run Speech CLI, you'll need to install the .NET 6.0 Runtime (or above).
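
The caption splitting limits listed above (37 characters, 3 lines) can be illustrated with a short sketch. It assumes the limits mean at most 37 characters per line and at most 3 lines per caption; this is not the Speech CLI's actual algorithm, and `split_caption` is a hypothetical helper built on Python's standard `textwrap` module.

```python
import textwrap

MAX_CHARS_PER_LINE = 37     # assumed to be a per-line limit
MAX_LINES_PER_CAPTION = 3   # assumed to be a per-caption limit


def split_caption(text):
    """Split a long recognition result into captions of wrapped lines."""
    lines = textwrap.wrap(text, width=MAX_CHARS_PER_LINE)
    return [
        lines[i:i + MAX_LINES_PER_CAPTION]
        for i in range(0, len(lines), MAX_LINES_PER_CAPTION)
    ]


long_result = "word " * 60
captions = split_caption(long_result)
for caption in captions:
    print("\n".join(caption))
    print()
```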

Bug fixes

Updated all URLs to remove language (for example, "en-US")
Fixed version info to report properly in all cases (previously it sometimes showed a blank)

Speech CLI 1.22.0: June 2022 release

Uses Speech SDK 1.22.0.

New features

Added spx init command to guide users through the Speech resource key creation without going to Azure Web Portal.
Speech docker containers now have Azure CLI included, so the spx init command works out of the box.
Added timestamp as an event output option, to make SPX more useful when calculating latencies.

Speech CLI 1.21.0: April 2022 release

Uses Speech SDK 1.21.0.

New features

WEBVTT Caption generation
- Added --output vtt support to spx translate
- Supports --output vtt file FILENAME to override default VTT FILENAME
- Supports --output vtt file - to write to standard output
- Individual VTT files are created for each target language (for example --target en;de;fr)
SRT Caption generation
- Added --output srt support to spx recognize, spx intent, and spx translate
- Supports --output srt file FILENAME to override default SRT FILENAME
- Supports --output srt file - to write to standard output
- For spx translate, individual SRT files are created for each target language (for example --target en;de;fr)

Bug fixes

Corrected WEBVTT timespan output to properly use hh:mm:ss.fff format
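
As an illustration of the hh:mm:ss.fff timespan format named in this fix, the following sketch formats a millisecond offset as a WebVTT timestamp. The function names are hypothetical and the code is not taken from the Speech CLI.

```python
def format_vtt_timestamp(milliseconds: int) -> str:
    """Format a non-negative millisecond offset as hh:mm:ss.fff."""
    if milliseconds < 0:
        raise ValueError("offset must be non-negative")
    hours, remainder = divmod(milliseconds, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    seconds, millis = divmod(remainder, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def format_vtt_cue(start_ms: int, end_ms: int, text: str) -> str:
    """Build one WebVTT cue: a timing line followed by the caption text."""
    timing = f"{format_vtt_timestamp(start_ms)} --> {format_vtt_timestamp(end_ms)}"
    return f"{timing}\n{text}"


if __name__ == "__main__":
    print("WEBVTT\n")
    print(format_vtt_cue(0, 2_500, "Hello, world."))
```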

Speech CLI 1.20.0: January 2022 release

New features

Speaker recognition
- spx profile enroll and spx speaker [identify/verify] now support microphone input
Intent recognition (spx intent)
- --keyword FILE.table
- --pattern and --patterns
- --output all/each intentid
- --output all/each entity json
- --output all/each ENTITY entity
- --once, --once+, --continuous (continuous now default)
- --output all/each connection EVENT
- --output all/each connection message (for example, text, path)
CLI console output expectation checking/authoring:
- --expect PATTERN and --not expect PATTERN support on all commands
- --auto expect to assist authoring expected patterns
SDK logging output expectation checking/authoring
- --log expect PATTERN and --not log expect PATTERN support on all commands
- --log auto expect [FILTER] support on all commands
- --log FILE support on spx profile and spx speaker
Audio file input
- --format ANY support on all commands
- --file - support (reading from standard input, enabling pipe scenarios)
Audio file output
- --audio output - Writing to standard output, enabling pipe scenarios
Output files
- --output all/each file - Write to standard output
- --output batch file - Write to standard output
- --output vtt file - Write to standard output
- --output json file - Write to standard output, for spx csr and spx batch commands
Output properties
- --output […] result XXX property (PropertyId or string)
- --output […] connection message received XXX property (PropertyId or string)
- --output […] recognizer XXX property (PropertyId or string)
Azure WebJob integration
- spx webjob now follows sub-command pattern
- Updated WebJob help to reflect the sub-command pattern (see spx help webjob)

Bug fixes

Fixed bug when both --output vtt FILE and --output batch FILE are used at the same time
spx [...] --zip ZIPFILENAME now includes all binaries required for all scenarios (if present)
spx profile and spx speaker commands now return detailed error information on cancellation

2021-May release

New features

Added support for Profile, Speaker ID, and Speaker verification - Try spx profile and spx speaker from the command-line.
We also added Dialog support - Try spx dialog from the command-line.
Improved spx help. Please give us feedback about how this works for you by opening a GitHub issue.
We've decreased the size of the .NET tool install.

COVID-19 abridged testing

As the ongoing pandemic continues to require our engineers to work from home, pre-pandemic manual verification scripts have been significantly reduced. We test on fewer devices with fewer configurations, and the likelihood of environment-specific bugs slipping through may be increased. We still rigorously validate with a large set of automation. In the unlikely event that we missed something, please let us know on GitHub.
Stay healthy!

2021-March release

New features

Added spx intent command for intent recognition, replacing spx recognize intent.
Recognize and intent can now use Azure functions to calculate word error rate using spx recognize --wer url <URL>.
Recognize can now output results as VTT files using spx recognize --output vtt file <FILENAME>.
Sensitive key info now obscured in debug/verbose output.
Added URL checking and error message for content field in batch transcription create.
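
For readers unfamiliar with word error rate (WER), the sketch below computes it in the conventional way: the word-level edit distance (substitutions, deletions, and insertions) divided by the number of words in the reference. This is a generic illustration. It isn't the implementation behind the `--wer url` option, and text normalization (casing, punctuation) is deliberately left out.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between ref[:i-1] and hyp[:j].
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]  # distance between ref[:i] and an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1      # reference word missing from hypothesis
            insertion = current[j - 1] + 1  # extra word in hypothesis
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the quick brown fox", "the quick fox"))
```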

2021-January release

New features

Speech CLI is now available as a NuGet package and can be installed via .NET CLI as a .NET global tool you can call from the shell/command-line.
The custom speech DevOps Template repo has been updated to use Speech CLI for its custom speech workflows.

2020-October release

SPX is the command-line interface to use the Speech service without writing code. Download the latest version here.

New features

spx csr dataset upload --kind audio|language|acoustic – create datasets from local data, not just from URLs.
spx csr evaluation create|status|list|update|delete – compare new models against baseline truth/other models.
spx * list – supports non-paged experience (doesn't require --top X --skip X).
spx * --http header A=B – support custom headers (added for Office for custom authentication).
spx help – improved text and back-tick text color coded (blue).

2020-June release

Added in-CLI help search features:
- spx help find --text TEXT
- spx help find --topic NAME
Updated to work with newly deployed v3.0 Batch and custom speech APIs:
- spx help batch examples
- spx help csr examples

Speech CLI (Also known as SPX): 2020-May release

SPX is a new command-line tool that allows you to perform recognition, synthesis, translation, batch transcription, and custom speech management from the command-line. Use it to test the Speech service, or to script the Speech service tasks you need to perform. Download the tool and read the documentation here.

October 2024 release

Prebuilt neural voice

Introduce 4 turbo versions of Azure OpenAI voices in public preview: en-US-EchoTurboMultilingualNeural, en-US-FableTurboMultilingualNeural, en-US-OnyxTurboMultilingualNeural, and en-US-ShimmerTurboMultilingualNeural. The turbo versions have a voice persona similar to that of the Azure OpenAI voices but support extra features. Turbo voices support the full set of SSML elements and more features, such as word boundary, just like other Azure AI Speech voices. See the full language and voice list for more information.

Prebuilt high definition (HD) neural voice

Azure AI speech high definition (HD) voices are available in public preview. The HD voices can understand the content, automatically detect emotions in the input text, and adjust the speaking tone in real-time to match the sentiment. HD voices maintain a consistent voice persona from their neural (and non HD) counterparts, and deliver even more value through enhanced features. For more information, see What are Azure AI Speech high definition (HD) voices?.

Custom neural voice

Previously, some locales were only supported with V3 for the training recipe. These locales now also support V9, enabling improved training quality and expanded features. For these locales, refer to the following table:

Locale (BCP-47)	Language
`ar-EG`	Arabic (Egypt)
`ar-SA`	Arabic (Saudi Arabia)
`ca-ES`	Catalan
`cs-CZ`	Czech (Czechia)
`da-DK`	Danish (Denmark)
`de-AT`	German (Austria)
`de-CH`	German (Switzerland)
`el-GR`	Greek (Greece)
`en-IN`	English (India)
`fi-FI`	Finnish (Finland)
`fr-CH`	French (Switzerland)
`he-IL`	Hebrew (Israel)
`hi-IN`	Hindi (India)
`hu-HU`	Hungarian (Hungary)
`ms-MY`	Malay (Malaysia)
`nb-NO`	Norwegian Bokmål (Norway)
`nl-NL`	Dutch (Netherlands)
`pl-PL`	Polish (Poland)
`pt-PT`	Portuguese (Portugal)
`ro-RO`	Romanian (Romania)
`ru-RU`	Russian (Russia)
`sk-SK`	Slovak (Slovakia)
`sv-SE`	Swedish (Sweden)
`th-TH`	Thai (Thailand)
`tr-TR`	Turkish (Türkiye)
`vi-VN`	Vietnamese (Vietnam)
`zh-HK`	Chinese (Cantonese, Traditional)
`zh-TW`	Chinese (Taiwanese Mandarin, Traditional)

Custom neural voice Pro now supports the following new locales:
- en-NZ: English (New Zealand)
- es-CL: Spanish (Chile)
- es-US: Spanish (United States)
- ta-MY: Tamil (Malaysia)
See the language list for Custom neural voice for the full list of supported locales.

The cross-lingual feature now supports the following new locales as source locales:

Locale (BCP-47)	Language
`da-DK`	Danish (Denmark)
`de-AT`	German (Austria)
`de-CH`	German (Switzerland)
`de-DE`	German (Germany)
`en-CA`	English (Canada)
`fi-FI`	Finnish (Finland)
`fr-CH`	French (Switzerland)
`hu-HU`	Hungarian (Hungary)
`ms-MY`	Malay (Malaysia)
`nb-NO`	Norwegian Bokmål (Norway)
`pt-PT`	Portuguese (Portugal)
`sv-SE`	Swedish (Sweden)
`tr-TR`	Turkish (Türkiye)
`ta-IN`	Tamil (India)
`zh-HK`	Chinese (Cantonese, Traditional)

See the language list for Custom neural voice for the full list of supported locales.

The multi-style voice feature now supports the following new locales:

Locale (BCP-47)	Language
`ar-EG`	Arabic (Egypt)
`ar-SA`	Arabic (Saudi Arabia)
`ca-ES`	Catalan
`cs-CZ`	Czech (Czechia)
`da-DK`	Danish (Denmark)
`de-AT`	German (Austria)
`de-CH`	German (Switzerland)
`de-DE`	German (Germany)
`el-GR`	Greek (Greece)
`en-AU`	English (Australia)
`en-CA`	English (Canada)
`en-GB`	English (United Kingdom)
`en-IN`	English (India)
`es-ES`	Spanish (Spain)
`es-MX`	Spanish (Mexico)
`fi-FI`	Finnish (Finland)
`fr-CA`	French (Canada)
`fr-CH`	French (Switzerland)
`fr-FR`	French (France)
`he-IL`	Hebrew (Israel)
`hi-IN`	Hindi (India)
`hu-HU`	Hungarian (Hungary)
`it-IT`	Italian (Italy)
`ko-KR`	Korean (Korea)
`ms-MY`	Malay (Malaysia)
`nb-NO`	Norwegian Bokmål (Norway)
`nl-BE`	Dutch (Belgium)
`nl-NL`	Dutch (Netherlands)
`pl-PL`	Polish (Poland)
`pt-BR`	Portuguese (Brazil)
`pt-PT`	Portuguese (Portugal)
`ro-RO`	Romanian (Romania)
`ru-RU`	Russian (Russia)
`sk-SK`	Slovak (Slovakia)
`sv-SE`	Swedish (Sweden)
`th-TH`	Thai (Thailand)
`tr-TR`	Turkish (Türkiye)
`vi-VN`	Vietnamese (Vietnam)
`zh-HK`	Chinese (Cantonese, Traditional)
`zh-TW`	Chinese (Taiwanese Mandarin, Traditional)

See the language list for Custom neural voice for the full list of supported locales.

September 2024 release

Prebuilt neural voice

Added support and general availability for new voices in the following locales:

Locale (BCP-47)	Language	Text to speech voices
`as-IN`	Assamese (India)	`as-IN-YashicaNeural` (Female) `as-IN-PriyomNeural` (Male)
`or-IN`	Oriya (India)	`or-IN-SubhasiniNeural` (Female) `or-IN-SukantNeural` (Male)
`pa-IN`	Punjabi (India)	`pa-IN-OjasNeural` (Male) `pa-IN-VaaniNeural` (Female)

The one voice in this table is generally available and supports only the 'en-IN' locale.

Locale (BCP-47)	Language	Text to speech voices
`en-IN`	English (India)	`en-IN-AashiNeural` (Female)

The five voices in this table are generally available and support both "en-IN" and "hi-IN" locales.

Locale (BCP-47)	Language	Text to speech voices
`en-IN`	English (India)	`en-IN-AaravNeural` (Male) `en-IN-AnanyaNeural` (Female) `en-IN-KavyaNeural` (Female) `en-IN-KunalNeural` (Male) `en-IN-RehaanNeural` (Male)
`hi-IN`	Hindi (India)	`hi-IN-AaravNeural` (Male) `hi-IN-AnanyaNeural` (Female) `hi-IN-KavyaNeural` (Female) `hi-IN-KunalNeural` (Male) `hi-IN-RehaanNeural` (Male)

Voice styles and roles

Added newscast, cheerful, empathetic styles support for the en-IN-NeerjaNeural and hi-IN-SwaraNeural voices.

Added new styles for the following voices:

es-MX-DaliaNeural: whispering, sad, cheerful
fr-FR-DeniseNeural: whispering, sad, excited
it-IT-IsabellaNeural: whispering, sad, excited, cheerful
pt-PT-RaquelNeural: whispering, sad
de-DE-ConradNeural: sad, cheerful
en-GB-RyanNeural: whispering, sad
es-MX-JorgeNeural: whispering, sad, excited, cheerful
fr-FR-HenriNeural: whispering, sad, excited
it-IT-DiegoNeural: sad, excited, cheerful
es-ES-AlvaroNeural: cheerful, sad
ko-KR-InjoonNeural: sad

See the Voice styles and roles for more information.

August 2024 release

Prebuilt neural voice

Introduce new multilingual voices in public preview. See the full language and voice list for more information.

Brand new multilingual voices

Locale	Language	Gender	Voice name
en-US	English (United States)	Male	en-US-AdamMultilingualNeural
en-US	English (United States)	Female	en-US-AmandaMultilingualNeural
en-US	English (United States)	Male	en-US-DerekMultilingualNeural
en-US	English (United States)	Male	en-US-LewisMultilingualNeural
en-US	English (United States)	Female	en-US-LolaMultilingualNeural
en-US	English (United States)	Female	en-US-PhoebeMultilingualNeural
en-US	English (United States)	Male	en-US-SamuelMultilingualNeural
en-US	English (United States)	Female	en-US-SerenaMultilingualNeural
en-US	English (United States)	Male	en-US-DustinMultilingualNeural
en-US	English (United States)	Female	en-US-EvelynMultilingualNeural
es-ES	Spanish (Spain)	Male	es-ES-TristanMultilingualNeural
fr-FR	French (France)	Male	fr-FR-LucienMultilingualNeural
pt-BR	Portuguese (Brazil)	Male	pt-BR-MacerioMultilingualNeural
zh-CN	Chinese (Mandarin, Simplified)	Male	zh-CN-YunfanMultilingualNeural
zh-CN	Chinese (Mandarin, Simplified)	Male	zh-CN-YunxiaoMultilingualNeural
zh-CN	Chinese (Mandarin, Simplified)	Male	zh-CN-YunyiMultilingualNeural

Monolingual models updated to multilingual voices with improvements in naturalness

Locale	Language	Gender	Voice name
en-US	English (United States)	Female	en-US-NancyMultilingualNeural
en-US	English (United States)	Male	en-US-BrandonMultilingualNeural
en-US	English (United States)	Male	en-US-ChristopherMultilingualNeural
en-US	English (United States)	Female	en-US-CoraMultilingualNeural
en-US	English (United States)	Male	en-US-DavisMultilingualNeural
en-US	English (United States)	Male	en-US-SteffanMultilingualNeural
es-ES	Spanish (Spain)	Female	es-ES-XimenaMultilingualNeural
it-IT	Italian (Italy)	Male	it-IT-GiuseppeMultilingualNeural
ko-KR	Korean (Korea)	Male	ko-KR-HyunsuMultilingualNeural

Enhance the following current multilingual voices with better quality.

Locale	Language	Gender	Voice name
en-US	English (United States)	Male	en-US-AndrewMultilingualNeural
en-US	English (United States)	Female	en-US-AvaMultilingualNeural

Three multilingual voices now support styles. See the Voice styles and roles for more information.
- en-US-SerenaMultilingualNeural: empathetic, excited, friendly, shy, serious, relieved, and sad.
- en-US-AndrewMultilingualNeural: empathetic and relieved.
- zh-CN-XiaoxiaoMultilingualNeural: affectionate, cheerful, empathetic, excited, poetry-reading, sorry, and story.

July 2024 release

Text to speech avatar (GA)

Text to speech avatar is now generally available. For more information, see text to speech avatar.

Prebuilt neural voice

Introduce 2 turbo versions of Azure OpenAI voices in public preview: en-US-AlloyTurboMultilingualNeural and en-US-NovaTurboMultilingualNeural. The turbo versions have a voice persona similar to that of the Azure OpenAI voices but support extra features. Turbo voices support the full set of SSML elements and more features, such as word boundary, just like other Azure AI Speech voices. See the full language and voice list for more information.
Introduce 2 new multilingual voices in public preview: zh-CN-YunfanMultilingualNeural and zh-CN-YunxiaoMultilingualNeural. See the full language and voice list for more information.

Embedded neural voice

en-US-JennyMultilingual voice is released in production, supporting up to 24 locales for on-device experience. For the supported locales, see the table below.

Locale	Language
`da-DK`	Danish (Denmark)
`de-DE`	German (Germany)
`en-AU`	English (Australia)
`en-GB`	English (United Kingdom)
`en-IN`	English (India)
`en-US`	English (United States)
`es-ES`	Spanish (Spain)
`es-MX`	Spanish (Mexico)
`fr-CA`	French (Canada)
`fr-FR`	French (France)
`he-IL`	Hebrew (Israel)
`it-IT`	Italian (Italy)
`ja-JP`	Japanese (Japan)
`ko-KR`	Korean (Korea)
`nb-NO`	Norwegian Bokmål (Norway)
`nl-NL`	Dutch (Netherlands)
`pl-PL`	Polish (Poland)
`pt-PT`	Portuguese (Portugal)
`sv-SE`	Swedish (Sweden)
`th-TH`	Thai (Thailand)
`tr-TR`	Turkish (Turkey)
`zh-CN`	Chinese (Mandarin, Simplified)
`zh-HK`	Chinese (Cantonese, Traditional)
`zh-TW`	Chinese (Taiwanese Mandarin, Traditional)

June 2024 release

Prebuilt neural voice

Introducing 6 new voices in public preview available in specific regions: East Asia, Southeast Asia, East US, West US, and Central India.

Locale	Language	Text to speech voices
`or-IN`	Oriya (India)	`or-IN-SubhasiniNeural` (Female)
`or-IN`	Oriya (India)	`or-IN-SukantNeural` (Male)
`pa-IN`	Punjabi (India)	`pa-IN-VaaniNeural` (Female)
`pa-IN`	Punjabi (India)	`pa-IN-OjasNeural` (Male)
`as-IN`	Assamese (India)	`as-IN-YashicaNeural` (Female)
`as-IN`	Assamese (India)	`as-IN-PriyomNeural` (Male)

See the full language and voice list for more information.

Text to speech avatar

Text to speech avatar now supports the following regions: Southeast Asia, North Europe, West Europe, Sweden Central, South Central US, and West US 2. For more information, see Speech service regions.

May 2024 release

Personal voice (GA)

Personal voice is now generally available. With personal voice, you can get AI generated replication of your voice (or users of your application) in a few seconds. You provide a one-minute speech sample as the audio prompt, and then use it to generate speech in any of the more than 90 languages supported across more than 100 locales. For more information, see the personal voice overview.

Prebuilt neural voice

Introduce 8 new multilingual voices in public preview: en-GB-AdaMultilingualNeural, en-GB-OllieMultilingualNeural, es-ES-ArabellaMultilingualNeural, es-ES-IsidoraMultilingualNeural, it-IT-AlessioMultilingualNeural, it-IT-IsabellaMultilingualNeural, it-IT-MarcelloMultilingualNeural, and pt-BR-ThalitaMultilingualNeural. See the full language and voice list for more information.
Introduce 2 new en-US voices optimized for Call Center scenario in public preview: en-US-LunaNeural and en-US-KaiNeural. See the full language and voice list for more information.

April 2024 release

Text to speech avatar

You can now set a static background image for your avatars. To use this feature, set the avatarConfig.backgroundImage property to a URL that points to the desired image. For details, refer to How to edit the background.
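
As a rough sketch of where the property sits, the helper below sets `avatarConfig.backgroundImage` on a request body represented as a Python dictionary. Only the `avatarConfig.backgroundImage` path comes from this release note. The helper name, the otherwise empty request body, and the example URL are assumptions for illustration. Check the avatar synthesis API reference for the full request schema.

```python
import json


def with_background_image(request_body: dict, image_url: str) -> dict:
    """Return a copy of request_body with avatarConfig.backgroundImage set.

    Hypothetical helper: only the avatarConfig.backgroundImage property
    is taken from the release note.
    """
    updated = dict(request_body)
    avatar_config = dict(updated.get("avatarConfig", {}))
    avatar_config["backgroundImage"] = image_url
    updated["avatarConfig"] = avatar_config
    return updated


if __name__ == "__main__":
    # Placeholder body; a real request needs the other fields the API requires.
    body = {"avatarConfig": {}}
    updated = with_background_image(body, "https://example.com/background.png")
    print(json.dumps(updated, indent=2))
```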

March 2024 release

Prebuilt neural voice

9 multilingual voices are generally available in all regions: en-US-AvaMultilingualNeural, en-US-AndrewMultilingualNeural, en-US-EmmaMultilingualNeural, en-US-BrianMultilingualNeural, de-DE-FlorianMultilingualNeural, de-DE-SeraphinaMultilingualNeural, fr-FR-RemyMultilingualNeural, fr-FR-VivienneMultilingualNeural, and zh-CN-XiaoxiaoMultilingualNeural. See the full language and voice list for more information.
Introducing a new multilingual voice for public preview: ja-JP-MasaruMultilingualNeural. See the full language and voice list for more information.
Additional updates:
- en-US-RyanMultilingualNeural is generally available in all regions.
- en-US-JennyMultilingualV2Neural is generally available in all regions, merged with en-US-JennyMultilingualNeural.
- Preview available for the updated en-IN-NeerjaNeural and hi-IN-SwaraNeural with 3 new styles in East US, West Europe, and Southeast Asia.
- Preview available for new female voices in Central India: en-IN-KavyaNeural, en-IN-AnanyaNeural, en-IN-AashiNeural, hi-IN-KavyaNeural, and hi-IN-AnanyaNeural.

Text to speech avatar

Removed dependency on Azure Communication Services (ACS) TURN for real-time avatar. The sample code has been updated accordingly to reflect this change.
Published text to speech avatar pricing. For more details, see the pricing page. Note that avatar pricing will only be visible for service regions where the feature is available.

February 2024 release

OpenAI voices

The Azure AI Speech service supports OpenAI text to speech voices in the following regions: North Central US and Sweden Central. Like Azure AI Speech voices, OpenAI text to speech voices deliver high-quality speech synthesis to convert written text into natural sounding spoken audio. This unlocks a wide range of possibilities for immersive and interactive user experiences. For more information, see What are OpenAI text to speech voices?.

Note

OpenAI text to speech voices are also available in Azure OpenAI Service.
With this update, we have adjusted the pricing of prebuilt neural voices with Azure AI Speech. Check the updated pricing here.

Personal voice

The personal voice feature now supports DragonLatestNeural and PhoenixLatestNeural models. These new models enhance the naturalness of synthesized voices, better resembling the speech characteristics of the voice in the prompt. For more details, refer to Integrate personal voice in your application.

December 2023 release

Custom voice API

The custom voice API is available for creating and managing professional and personal custom neural voice models.

Custom neural voice

The newly trained voice models now support 48 kHz sample rate, irrespective of the model version. For previously trained voice models, it's necessary to upgrade the engine version to at least 2023.11.13.0 version to enhance the sample rate to 48 kHz.

Prebuilt neural voice

Introducing new multilingual voices for public preview:

Locale (BCP-47)	Language	Text to speech voices
`de-DE`	German (Germany)	`de-DE-FlorianMultilingualNeural` (Male)
`de-DE`	German (Germany)	`de-DE-SeraphinaMultilingualNeural` (Female)
`en-US`	English (United States)	`en-US-AvaMultilingualNeural` (Female)
`en-US`	English (United States)	`en-US-EmmaMultilingualNeural` (Female)
`fr-FR`	French (France)	`fr-FR-RemyMultilingualNeural` (Male)
`en-US`	English (United States)	`en-US-BrianMultilingualNeural` (Male)
`en-US`	English (United States)	`en-US-AndrewMultilingualNeural` (Male)
`fr-FR`	French (France)	`fr-FR-VivienneMultilingualNeural` (Female)
`zh-CN`	Chinese (Mandarin, Simplified)	`zh-CN-XiaoxiaoMultilingualNeural` (Female)
`zh-CN`	Chinese (Mandarin, Simplified)	`zh-CN-XiaochenMultilingualNeural` (Female)
`zh-CN`	Chinese (Mandarin, Simplified)	`zh-CN-YunyiMultilingualNeural` (Male)

Introducing new zh-CN-XiaoxiaoDialectsNeural voices for public preview which support several Chinese dialects and accents:

Voicename	Secondary language	Dialect/Accent
`zh-CN-XiaoxiaoDialectsNeural`	`zh-CN-shaanxi`	Chinese (Zhongyuan Mandarin Shaanxi, Simplified)
	`zh-CN-sichuan`	Chinese (Southwestern Mandarin, Simplified)
	`zh-CN-shanxi`	Chinese (Shanxi Accent Mandarin, Simplified)
	`nan-CN`	Chinese (Southern Min, Simplified)
	`zh-CN-anhui`	Chinese (Jianghuai Mandarin Anhui, Simplified)
	`zh-CN-hunan`	Chinese (Hunan Accent Mandarin, Simplified)
	`zh-CN-gansu`	Chinese (Lanyin Mandarin Gansu, Simplified)
	`zh-CN-shandong`	Chinese (Jilu Mandarin, Simplified)
	`zh-CN-henan`	Chinese (Zhongyuan Mandarin Henan, Simplified)
	`zh-CN-liaoning`	Chinese (Northeastern Mandarin, Simplified)
	`zh-TW`	Chinese (Taiwanese Mandarin, Traditional)

November 2023 release

Personal voice

Personal voice is available in preview in the following regions: West Europe, East US, and South East Asia. With personal voice (preview), you can get AI generated replication of your voice (or users of your application) in a few seconds. You provide a one-minute speech sample as the audio prompt, and then use it to generate speech in any of the more than 90 languages supported across more than 100 locales.

For more information, see personal voice.

Text to speech avatar

Text to speech avatar is available in preview in the following regions: West US 2, West Europe, and Southeast Asia.

Text to speech avatar converts text into a digital video of a photorealistic human (either a prebuilt avatar or a custom text to speech avatar) speaking with a natural-sounding voice. The text to speech avatar video can be synthesized asynchronously or in real time. Developers can build applications integrated with text to speech avatar through an API, or use a content creation tool on Speech Studio to create video content without coding.

For more information, see text to speech avatar, transparency notes, and disclosure for voice and avatar talent.

Custom neural voice

Added support for 24 new locales for cross-lingual voice. See the full language list for more information.

Prebuilt neural voice

Introducing new voices for public preview:

Locale (BCP-47)	Language	Text to speech voices
`de-DE`	German (Germany)	`SeraphinaNeural` (Female)
`es-ES`	Spanish (Spain)	`XimenaNeural` (Female)
`fr-CA`	French (Canada)	`ThierryNeural` (Male)
`fr-FR`	French (France)	`VivienneNeural` (Female)
`it-IT`	Italian (Italy)	`GiuseppeNeural` (Male)
`ko-KR`	Korean (Korea)	`HyunsuNeural` (Male)
`pt-BR`	Portuguese (Brazil)	`ThalitaNeural` (Female)

Models updated with bug fixes and quality improvements:

Locale (BCP-47)	Language	Text to speech voices
`es-ES`	Spanish (Spain)	`AlvaroNeural` (Male)
`en-GB`	English (United Kingdom)	`RyanNeural` (Male)
`ko-KR`	Korean (Korea)	`InjoonNeural` (Male)

See the full language and voice list for more information.

October 2023 release

Custom neural voice

Added support for 12 new locales with custom neural voice Pro. See the full language list for more information.

September 2023 release

Prebuilt neural voice

Introducing new voices for public preview:

Locale (BCP-47)	Language	Text to speech voices
`en-US`	English (United States)	`en-US-EmmaNeural` (Female)
`en-US`	English (United States)	`en-US-AndrewNeural` (Male)
`en-US`	English (United States)	`en-US-BrianNeural` (Male)

See the full language and voice list for more information.

Embedded neural voice

All 147 locales here (except fa-IR, Persian (Iran)) are available out of the box with one selected female voice, one selected male voice, or both.

August 2023 release

Custom neural voice

The latest CNV Lite training recipe version has now been released. This release brings several enhancements to the quality of your language models. Try it out in Speech Studio.

July 2023 release

Custom neural voice

Multi-style voice is generally available.
Added two new locales in public preview for multi-style voice: ja-JP and zh-CN. See the full language and voice list for more information. Refer to the preset style list for different languages.
Cross-lingual voice is generally available.
Added two new locales for cross-lingual voice: id-ID and nl-NL. See the full language and voice list for more information.

Prebuilt Neural TTS Voices

Introducing new en-US gender neutral voice for public preview:

Locale (BCP-47)	Language	Text to speech voices
`en-US`	English (United States)	`en-US-BlueNeural` (Neutral)

Introducing new multilingual voices for public preview:

Locale (BCP-47)	Language	Text to speech voices
`en-US`	English (United States)	`en-US-JennyMultilingualV2Neural` (Female)
`en-US`	English (United States)	`en-US-RyanMultilingualNeural` (Male)

The multilingual voices en-US-JennyMultilingualV2Neural and en-US-RyanMultilingualNeural auto-detect the language of the input text. However, you can still use the <lang> element to adjust the speaking language for these voices.

These new multilingual voices can speak in 41 languages and accents: Arabic (Egypt), Arabic (Saudi Arabia), Catalan, Czech (Czechia), Danish (Denmark), German (Austria), German (Switzerland), German (Germany), English (Australia), English (Canada), English (United Kingdom), English (Hong Kong SAR), English (Ireland), English (India), English (United States), Spanish (Spain), Spanish (Mexico), Finnish (Finland), French (Belgium), French (Canada), French (Switzerland), French (France), Hindi (India), Hungarian (Hungary), Indonesian (Indonesia), Italian (Italy), Japanese (Japan), Korean (Korea), Norwegian Bokmål (Norway), Dutch (Belgium), Dutch (Netherlands), Polish (Poland), Portuguese (Brazil), Portuguese (Portugal), Russian (Russia), Swedish (Sweden), Thai (Thailand), Turkish (Türkiye), Chinese (Mandarin, Simplified), Chinese (Cantonese, Traditional), Chinese (Taiwanese Mandarin, Traditional).

These multilingual voices don't fully support certain SSML elements, such as break, emphasis, silence, and sub.

Important

The en-US-JennyMultilingualV2Neural voice is provided temporarily in public preview solely for evaluation purposes. It will be removed in the future.

In order to speak in a language other than English, the current implementation of the en-US-JennyMultilingualNeural voice requires that you set the <lang xml:lang> element. We anticipate that during Q4 calendar year 2023, the en-US-JennyMultilingualNeural voice will be updated to speak in the language of the input text without the <lang xml:lang> element. This will be in parity with the en-US-JennyMultilingualV2Neural voice.
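
The following sketch shows one way to build SSML that wraps text in the `<lang xml:lang>` element described above. The SSML namespace and element layout follow the standard Speech Synthesis Markup Language structure. The helper name `build_lang_ssml` and the sample text are assumptions for illustration, and the code only builds and checks the markup without calling the Speech service.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_lang_ssml(voice: str, lang: str, text: str) -> str:
    """Build SSML that asks a multilingual voice to speak `text` in `lang`
    (hypothetical helper)."""
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang="en-US">'
        f"<voice name={quoteattr(voice)}>"
        f"<lang xml:lang={quoteattr(lang)}>{escape(text)}</lang>"
        "</voice></speak>"
    )


if __name__ == "__main__":
    ssml = build_lang_ssml("en-US-JennyMultilingualNeural", "es-MX", "¡Hola! ¿Cómo estás?")
    ET.fromstring(ssml)  # raises ParseError if the markup isn't well formed
    print(ssml)
```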

Introducing new features in public preview for the following voices:

Added Latin input for Serbian (Serbia) sr-RS voices: sr-latn-RS-SophieNeural and sr-latn-RS-NicholasNeural.
Added English pronunciation support for Albanian (Albania) sq-AL voices: sq-AL-AnilaNeural and sq-AL-IlirNeural.

May 2023 release

Audio Content Creation

All prebuilt voices with speaking styles and multi-style custom voices support style degree adjustment.
Now you can fix the pronunciation of a word by speaking the word and recording it. The phonemes can be automatically recognized from your recording. The Recognize by speaking feature is now in public preview.

April 2023 release

Prebuilt Neural TTS Voices

The following features of these voices moved from public preview to GA:

Style	Text to speech voices
style="chat"	`en-GB-RyanNeural`, `es-MX-JorgeNeural`, and `it-IT-IsabellaNeural`
style="cheerful"	`en-GB-RyanNeural`, `en-GB-SoniaNeural`, `es-MX-JorgeNeural`, `fr-FR-DeniseNeural`, `fr-FR-HenriNeural`, and `it-IT-IsabellaNeural`
style="sad"	`en-GB-SoniaNeural`, `fr-FR-DeniseNeural` and `fr-FR-HenriNeural`
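
To show how a style from this table is typically requested, the sketch below builds SSML that wraps the text in an `mstts:express-as` element with a `style` attribute. The helper name `build_style_ssml` and the sample text are assumptions for illustration. The code only constructs and parses the markup and doesn't call the Speech service.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

SSML_NS = "http://www.w3.org/2001/10/synthesis"
MSTTS_NS = "https://www.w3.org/2001/mstts"


def build_style_ssml(voice: str, style: str, text: str) -> str:
    """Build SSML that applies a speaking style to a voice (hypothetical helper)."""
    # Derive the document locale from the voice name, for example "en-GB".
    locale = "-".join(voice.split("-")[:2])
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" '
        f'xmlns:mstts="{MSTTS_NS}" xml:lang={quoteattr(locale)}>'
        f"<voice name={quoteattr(voice)}>"
        f"<mstts:express-as style={quoteattr(style)}>{escape(text)}</mstts:express-as>"
        "</voice></speak>"
    )


if __name__ == "__main__":
    ssml = build_style_ssml("en-GB-RyanNeural", "chat", "Fancy a cup of tea?")
    ET.fromstring(ssml)  # raises ParseError if the markup isn't well formed
    print(ssml)
```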

Improved the English pronunciation for hi-IN, ta-IN, and te-IN voices. The improvement is now flighting in public preview regions.

For more information, see the language and voice list.

March 2023 release

New features

Speech Synthesis Markup Language (SSML) is updated to support audio effect processor elements that optimize the quality of the synthesized speech output for specific scenarios on devices. Learn more at speech synthesis markup.

Custom neural voice

Added support for the nl-BE locale with Custom neural voice Pro. See the full language and voice list for more information.

Prebuilt Neural TTS Voices

The following voices are now generally available. See the full language and voice list for more information.

Locale (BCP-47)	Language	Text to speech voices
`en-AU`	English (Australia)	`en-AU-AnnetteNeural` (Female) `en-AU-CarlyNeural` (Female) `en-AU-DarrenNeural` (Male) `en-AU-DuncanNeural` (Male) `en-AU-ElsieNeural` (Female) `en-AU-FreyaNeural` (Female) `en-AU-JoanneNeural` (Female) `en-AU-KenNeural` (Male) `en-AU-KimNeural` (Female) `en-AU-NeilNeural` (Male) `en-AU-TimNeural` (Male) `en-AU-TinaNeural` (Female) `en-AU-WilliamNeural` (Male)
`en-GB`	English (United Kingdom)	`en-GB-RyanNeural` (Male) `en-GB-SoniaNeural` (Female)
`es-ES`	Spanish (Spain)	`es-ES-AbrilNeural` (Female) `es-ES-ArnauNeural` (Male) `es-ES-DarioNeural` (Male) `es-ES-EliasNeural` (Male) `es-ES-EstrellaNeural` (Female) `es-ES-IreneNeural` (Female) `es-ES-LaiaNeural` (Female) `es-ES-LiaNeural` (Female) `es-ES-NilNeural` (Male) `es-ES-SaulNeural` (Male) `es-ES-TeoNeural` (Male) `es-ES-TrianaNeural` (Female) `es-ES-VeraNeural` (Female)
`es-MX`	Spanish (Mexico)	`es-MX-JorgeNeural` (Male)
`fr-FR`	French (France)	`fr-FR-HenriNeural` (Male)
`it-IT`	Italian (Italy)	`it-IT-IsabellaNeural` (Female)
`ja-JP`	Japanese (Japan)	`ja-JP-AoiNeural` (Female) `ja-JP-DaichiNeural` (Male) `ja-JP-MayuNeural` (Female) `ja-JP-NaokiNeural` (Male) `ja-JP-ShioriNeural` (Female)

Added support for the cheerful style with the de-DE-ConradNeural voice.

February 2023 release

Prebuilt Neural TTS Voices

The following voices are now generally available. See the full language and voice list for more information.

Locale (BCP-47)	Language	Text to speech voices
`zh-CN`	Chinese (Mandarin, Simplified)	`zh-CN-XiaomengNeural` (Female) `zh-CN-XiaoyiNeural` (Female) `zh-CN-XiaozhenNeural` (Female) `zh-CN-YunfengNeural` (Male) `zh-CN-YunhaoNeural` (Male) `zh-CN-YunjianNeural` (Male) `zh-CN-YunxiaNeural` (Male) `zh-CN-YunzeNeural` (Male)
`zh-CN-henan`	Chinese (Zhongyuan Mandarin Henan, Simplified)	`zh-CN-henan-YundengNeural` (Male)

December 2022 release

Batch synthesis REST API (Preview)

The Batch synthesis API is currently in public preview. Once it's generally available, the Long Audio API will be deprecated. For more information, see Migrate to batch synthesis API.

November 2022 release

Prebuilt Neural TTS Voices (GA)

The following voices are now generally available. See the full language and voice list for more information.

Locale (BCP-47)	Language	Text to speech voices
`es-MX`	Spanish (Mexico)	`es-MX-BeatrizNeural` (Female) `es-MX-CandelaNeural` (Female) `es-MX-CarlotaNeural` (Female) `es-MX-CecilioNeural` (Male) `es-MX-GerardoNeural` (Male) `es-MX-LarissaNeural` (Female) `es-MX-LibertoNeural` (Male) `es-MX-LucianoNeural` (Male) `es-MX-MarinaNeural` (Female) `es-MX-NuriaNeural` (Female) `es-MX-PelayoNeural` (Male) `es-MX-RenataNeural` (Female) `es-MX-YagoNeural` (Male)
`it-IT`	Italian (Italy)	`it-IT-BenignoNeural` (Male) `it-IT-CalimeroNeural` (Male) `it-IT-CataldoNeural` (Male) `it-IT-FabiolaNeural` (Female) `it-IT-FiammaNeural` (Female) `it-IT-GianniNeural` (Male) `it-IT-ImeldaNeural` (Female) `it-IT-IrmaNeural` (Female) `it-IT-LisandroNeural` (Male) `it-IT-PalmiraNeural` (Female) `it-IT-PierinaNeural` (Female) `it-IT-RinaldoNeural` (Male)
`pt-BR`	Portuguese (Brazil)	`pt-BR-BrendaNeural` (Female) `pt-BR-DonatoNeural` (Male) `pt-BR-ElzaNeural` (Female) `pt-BR-FabioNeural` (Male) `pt-BR-GiovannaNeural` (Female) `pt-BR-HumbertoNeural` (Male) `pt-BR-JulioNeural` (Male) `pt-BR-LeilaNeural` (Female) `pt-BR-LeticiaNeural` (Female) `pt-BR-ManuelaNeural` (Female) `pt-BR-NicolauNeural` (Male) `pt-BR-ValerioNeural` (Male) `pt-BR-YaraNeural` (Female)

Custom neural voice

The following locale support is added for Custom neural voice. See the full language and voice list for more information.

Added support for the fr-BE locale with custom neural voice Pro.
Added support for the es-ES locale with custom neural voice lite.

October 2022 release

Prebuilt Neural TTS Voices (GA)

The following voices are now generally available. See the full language and voice list for more information.

Locale (BCP-47)	Language	Text to speech voices
`eu-ES`	Basque	`eu-ES-AinhoaNeural` (Female) `eu-ES-AnderNeural` (Male)
`hy-AM`	Armenian (Armenia)	`hy-AM-AnahitNeural` (Female) `hy-AM-HaykNeural` (Male)

Prebuilt Neural TTS Voices (Preview)

The following voices are now available in public preview. See the full language and voice list for more information.

Locale (BCP-47)	Language	Text to speech voices
`en-AU`	English (Australia)	`en-AU-AnnetteNeural` (Female) `en-AU-CarlyNeural` (Female) `en-AU-DarrenNeural` (Male) `en-AU-DuncanNeural` (Male) `en-AU-ElsieNeural` (Female) `en-AU-FreyaNeural` (Female) `en-AU-JoanneNeural` (Female) `en-AU-KenNeural` (Male) `en-AU-KimNeural` (Female) `en-AU-NeilNeural` (Male) `en-AU-TimNeural` (Male) `en-AU-TinaNeural` (Female)
`es-ES`	Spanish (Spain)	`es-ES-AbrilNeural` (Female) `es-ES-AlvaroNeural` (Male) `es-ES-ArnauNeural` (Male) `es-ES-DarioNeural` (Male) `es-ES-EliasNeural` (Male) `es-ES-EstrellaNeural` (Female) `es-ES-IreneNeural` (Female) `es-ES-LaiaNeural` (Female) `es-ES-LiaNeural` (Female) `es-ES-NilNeural` (Male) `es-ES-SaulNeural` (Male) `es-ES-TeoNeural` (Male) `es-ES-TrianaNeural` (Female) `es-ES-VeraNeural` (Female)
`ja-JP`	Japanese (Japan)	`ja-JP-AoiNeural` (Female) `ja-JP-DaichiNeural` (Male) `ja-JP-MayuNeural` (Female) `ja-JP-NaokiNeural` (Male) `ja-JP-ShioriNeural` (Female)
`ko-KR`	Korean (Korea)	`ko-KR-BongJinNeural` (Male) `ko-KR-GookMinNeural` (Male) `ko-KR-JiMinNeural` (Female) `ko-KR-SeoHyeonNeural` (Female) `ko-KR-SoonBokNeural` (Female) `ko-KR-YuJinNeural` (Female)
`wuu-CN`	Chinese (Wu, Simplified)	`wuu-CN-XiaotongNeural` (Female) `wuu-CN-YunzheNeural` (Male)
`yue-CN`	Chinese (Cantonese, Simplified)	`yue-CN-XiaoMinNeural` (Female) `yue-CN-YunSongNeural` (Male)

General TTS voice updates

Improved quality for the fil-PH-AngeloNeural and fil-PH-BlessicaNeural voices.
Text Normalization rules are updated for voices with the es-CL Spanish (Chile) and uz-UZ Uzbek (Uzbekistan) locales.
Added English letters spelling for voices with the sq-AL Albanian (Albania) and az-AZ Azerbaijani (Azerbaijan) locales.
Improved English pronunciation for the zh-HK-WanLungNeural voice.
Improved question tone for the nl-NL-MaartenNeural and pt-BR-AntonioNeural voices.
Added support for the <lang xml:lang="en-US"> tag for better English pronunciation with the following voices: de-DE-ConradNeural, de-DE-KatjaNeural, es-ES-AlvaroNeural, es-MX-DaliaNeural, es-MX-JorgeNeural, fr-CA-SylvieNeural, fr-FR-DeniseNeural, fr-FR-HenriNeural, it-IT-DiegoNeural, and it-IT-IsabellaNeural.
Added support for the style="chat" tag with the following voices: en-GB-RyanNeural, es-MX-JorgeNeural, and it-IT-IsabellaNeural.
Added support for the style="cheerful" tag with the following voices: en-GB-RyanNeural, en-GB-SoniaNeural, es-MX-JorgeNeural, fr-FR-DeniseNeural, fr-FR-HenriNeural, and it-IT-IsabellaNeural.
Added support for the style="sad" tag with the following voices: en-GB-SoniaNeural, fr-FR-DeniseNeural and fr-FR-HenriNeural.
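As a rough illustration of the style and language tags listed above, the following Python sketch builds SSML strings with the standard library only. It assumes the usual SSML layout for the Speech service (a `speak` root, a `voice` element, `mstts:express-as` for styles, and `lang` with an `xml:lang` attribute). The helper names `styled_ssml`, `english_span_ssml`, and `wrap_in_speak` are hypothetical and not part of any SDK, and deriving the document locale from the voice name is a convenience assumption.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

SSML_NS = "http://www.w3.org/2001/10/synthesis"
MSTTS_NS = "https://www.w3.org/2001/mstts"


def voice_locale(voice: str) -> str:
    """Derive a locale from a voice name, e.g. 'en-GB-RyanNeural' -> 'en-GB'."""
    return "-".join(voice.split("-")[:2])


def wrap_in_speak(voice: str, inner_xml: str) -> str:
    """Wrap already-escaped inner markup in <speak> and <voice> elements."""
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" xmlns:mstts="{MSTTS_NS}" '
        f"xml:lang={quoteattr(voice_locale(voice))}>"
        f"<voice name={quoteattr(voice)}>{inner_xml}</voice></speak>"
    )


def styled_ssml(voice: str, style: str, text: str) -> str:
    """Speak `text` with a speaking style such as 'chat', 'cheerful', or 'sad'."""
    inner = (
        f"<mstts:express-as style={quoteattr(style)}>"
        f"{escape(text)}</mstts:express-as>"
    )
    return wrap_in_speak(voice, inner)


def english_span_ssml(voice: str, before: str, english: str, after: str) -> str:
    """Mark an English phrase inside non-English text with <lang xml:lang="en-US">."""
    inner = (
        f"{escape(before)}"
        f'<lang xml:lang="en-US">{escape(english)}</lang>'
        f"{escape(after)}"
    )
    return wrap_in_speak(voice, inner)


if __name__ == "__main__":
    examples = [
        styled_ssml("en-GB-RyanNeural", "cheerful", "Great news, your order shipped!"),
        english_span_ssml("de-DE-KatjaNeural", "Das Thema ist ", "machine learning", "."),
    ]
    for ssml in examples:
        ET.fromstring(ssml)  # raises ParseError if the markup is not well-formed
        print(ssml)
```

The resulting strings could then be passed to whichever synthesis call you use. Whether a given voice accepts a given style depends on the lists above.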

September 2022 release

Prebuilt Neural TTS Voice

All the prebuilt neural voices have been upgraded to high-fidelity voices with 48kHz sample rate.

August 2022 release

Prebuilt Neural TTS Voice

Released new voices in public preview:

Voices for English (United States): en-US-AIGenerate1Neural and en-US-AIGenerate2Neural.
Voices for Chinese regional languages: zh-CN-henan-YundengNeural, zh-CN-shaanxi-XiaoniNeural, and zh-CN-shandong-YunxiangNeural.

For more information, see the language and voice list.

July 2022 release

Prebuilt Neural TTS Voice

Added 5 new voices of zh-CN Chinese (Mandarin, Simplified) and 1 new voice of en-US English (United States) in Public Preview. See full language and voice list.

Language	Locale	Gender	Voice name	Style support
Chinese (Mandarin, Simplified)	`zh-CN`	Female	`zh-CN-XiaomengNeural` ^New	General, multiple styles available using SSML
Chinese (Mandarin, Simplified)	`zh-CN`	Female	`zh-CN-XiaoyiNeural` ^New	General, multiple styles available using SSML
Chinese (Mandarin, Simplified)	`zh-CN`	Female	`zh-CN-XiaozhenNeural` ^New	General, multiple styles available using SSML
Chinese (Mandarin, Simplified)	`zh-CN`	Male	`zh-CN-YunxiaNeural` ^New	General, multiple styles available using SSML
Chinese (Mandarin, Simplified)	`zh-CN`	Male	`zh-CN-YunzeNeural` ^New	General, multiple styles available using SSML
English (United States)	`en-US`	Male	`en-US-RogerNeural` ^New	General

Supported styles and roles for the added neural voices.

Voice	Styles	Style degree	Roles
zh-CN-XiaomengNeural ^{Public preview}	`chat`	Supported
zh-CN-XiaoyiNeural ^{Public preview}	`affectionate`, `angry`, `cheerful`, `disgruntled`, `embarrassed`, `fearful`, `gentle`, `sad`, `serious`	Supported
zh-CN-XiaozhenNeural ^{Public preview}	`angry`, `cheerful`, `disgruntled`, `fearful`, `sad`, `serious`	Supported
zh-CN-YunxiaNeural ^{Public preview}	`angry`, `calm`, `cheerful`, `fearful`, `sad`	Supported
zh-CN-YunzeNeural ^{Public preview}	`angry`, `calm`, `cheerful`, `depressed`, `disgruntled`, `documentary-narration`, `fearful`, `sad`, `serious`	Supported	Supported

Get facial position with viseme

Added support for blend shapes to drive the facial movements of a 3D character that you designed. Learn more at how to get facial position with viseme.
SSML updated to support viseme element. See speech synthesis markup.

June 2022 release

Prebuilt Neural TTS Voice

Added 9 new languages and variants for Neural text to speech:

Language	Locale	Gender	Voice name	Style support
Arabic (Lebanon)	`ar-LB`	Female	`ar-LB-LaylaNeural` ^New	General
Arabic (Lebanon)	`ar-LB`	Male	`ar-LB-RamiNeural` ^New	General
Arabic (Oman)	`ar-OM`	Female	`ar-OM-AyshaNeural` ^New	General
Arabic (Oman)	`ar-OM`	Male	`ar-OM-AbdullahNeural` ^New	General
Azerbaijani (Azerbaijan)	`az-AZ`	Female	`az-AZ-BanuNeural` ^New	General
Azerbaijani (Azerbaijan)	`az-AZ`	Male	`az-AZ-BabekNeural` ^New	General
Bosnian (Bosnia and Herzegovina)	`bs-BA`	Female	`bs-BA-VesnaNeural` ^New	General
Bosnian (Bosnia and Herzegovina)	`bs-BA`	Male	`bs-BA-GoranNeural` ^New	General
Georgian (Georgia)	`ka-GE`	Female	`ka-GE-EkaNeural` ^New	General
Georgian (Georgia)	`ka-GE`	Male	`ka-GE-GiorgiNeural` ^New	General
Mongolian (Mongolia)	`mn-MN`	Female	`mn-MN-YesuiNeural` ^New	General
Mongolian (Mongolia)	`mn-MN`	Male	`mn-MN-BataaNeural` ^New	General
Nepali (Nepal)	`ne-NP`	Female	`ne-NP-HemkalaNeural` ^New	General
Nepali (Nepal)	`ne-NP`	Male	`ne-NP-SagarNeural` ^New	General
Albanian (Albania)	`sq-AL`	Female	`sq-AL-AnilaNeural` ^New	General
Albanian (Albania)	`sq-AL`	Male	`sq-AL-IlirNeural` ^New	General
Tamil (Malaysia)	`ta-MY`	Female	`ta-MY-KaniNeural` ^New	General
Tamil (Malaysia)	`ta-MY`	Male	`ta-MY-SuryaNeural` ^New	General

Promoted 36 voices from public preview to general availability (GA) for en-GB English (United Kingdom), fr-FR French (France), and de-DE German (Germany):

Language	Locale	Gender	Voice name	Style support
English (United Kingdom)	`en-GB`	Female	`en-GB-AbbiNeural`	General
English (United Kingdom)	`en-GB`	Female	`en-GB-BellaNeural`	General
English (United Kingdom)	`en-GB`	Female	`en-GB-HollieNeural`	General
English (United Kingdom)	`en-GB`	Female	`en-GB-MaisieNeural`	General, child voice
English (United Kingdom)	`en-GB`	Female	`en-GB-OliviaNeural`	General
English (United Kingdom)	`en-GB`	Female	`en-GB-SoniaNeural`	General
English (United Kingdom)	`en-GB`	Male	`en-GB-AlfieNeural`	General
English (United Kingdom)	`en-GB`	Male	`en-GB-ElliotNeural`	General
English (United Kingdom)	`en-GB`	Male	`en-GB-EthanNeural`	General
English (United Kingdom)	`en-GB`	Male	`en-GB-NoahNeural`	General
English (United Kingdom)	`en-GB`	Male	`en-GB-OliverNeural`	General
English (United Kingdom)	`en-GB`	Male	`en-GB-ThomasNeural`	General
French (France)	`fr-FR`	Female	`fr-FR-BrigitteNeural`	General
French (France)	`fr-FR`	Female	`fr-FR-CelesteNeural`	General
French (France)	`fr-FR`	Female	`fr-FR-CoralieNeural`	General
French (France)	`fr-FR`	Female	`fr-FR-EloiseNeural`	General, child voice
French (France)	`fr-FR`	Female	`fr-FR-JacquelineNeural`	General
French (France)	`fr-FR`	Female	`fr-FR-JosephineNeural`	General
French (France)	`fr-FR`	Female	`fr-FR-YvetteNeural`	General
French (France)	`fr-FR`	Male	`fr-FR-AlainNeural`	General
French (France)	`fr-FR`	Male	`fr-FR-ClaudeNeural`	General
French (France)	`fr-FR`	Male	`fr-FR-JeromeNeural`	General
French (France)	`fr-FR`	Male	`fr-FR-MauriceNeural`	General
French (France)	`fr-FR`	Male	`fr-FR-YvesNeural`	General
German (Germany)	`de-DE`	Female	`de-DE-AmalaNeural`	General
German (Germany)	`de-DE`	Female	`de-DE-ElkeNeural`	General
German (Germany)	`de-DE`	Female	`de-DE-GiselaNeural`	General, child voice
German (Germany)	`de-DE`	Female	`de-DE-KlarissaNeural`	General
German (Germany)	`de-DE`	Female	`de-DE-LouisaNeural`	General
German (Germany)	`de-DE`	Female	`de-DE-MajaNeural`	General
German (Germany)	`de-DE`	Female	`de-DE-TanjaNeural`	General
German (Germany)	`de-DE`	Male	`de-DE-BerndNeural`	General
German (Germany)	`de-DE`	Male	`de-DE-ChristophNeural`	General
German (Germany)	`de-DE`	Male	`de-DE-KasperNeural`	General
German (Germany)	`de-DE`	Male	`de-DE-KillianNeural`	General
German (Germany)	`de-DE`	Male	`de-DE-KlausNeural`	General
German (Germany)	`de-DE`	Male	`de-DE-RalfNeural`	General

Added 40 new voices of es-MX Spanish (Mexico), it-IT Italian (Italy), pt-BR Portuguese (Brazil) and 2 accents for zh-CN Chinese (Mandarin, Simplified) in Public Preview:

Language	Locale	Gender	Voice name	Style support
Spanish (Mexico)	`es-MX`	Female	`es-MX-BeatrizNeural` ^New	General
Spanish (Mexico)	`es-MX`	Female	`es-MX-CarlotaNeural` ^New	General
Spanish (Mexico)	`es-MX`	Female	`es-MX-NuriaNeural` ^New	General
Spanish (Mexico)	`es-MX`	Female	`es-MX-RenataNeural` ^New	General
Spanish (Mexico)	`es-MX`	Female	`es-MX-LarissaNeural` ^New	General
Spanish (Mexico)	`es-MX`	Female	`es-MX-CandelaNeural` ^New	General
Spanish (Mexico)	`es-MX`	Female	`es-MX-MarinaNeural` ^New	General
Italian (Italy)	`it-IT`	Female	`it-IT-FiammaNeural` ^New	General
Italian (Italy)	`it-IT`	Female	`it-IT-IrmaNeural` ^New	General
Italian (Italy)	`it-IT`	Female	`it-IT-FabiolaNeural` ^New	General
Italian (Italy)	`it-IT`	Female	`it-IT-PalmiraNeural` ^New	General
Italian (Italy)	`it-IT`	Female	`it-IT-ImeldaNeural` ^New	General
Italian (Italy)	`it-IT`	Female	`it-IT-PierinaNeural` ^New	General
Portuguese (Brazil)	`pt-BR`	Female	`pt-BR-ElzaNeural` ^New	General
Portuguese (Brazil)	`pt-BR`	Female	`pt-BR-ManuelaNeural` ^New	General
Portuguese (Brazil)	`pt-BR`	Female	`pt-BR-BrendaNeural` ^New	General
Portuguese (Brazil)	`pt-BR`	Female	`pt-BR-LeilaNeural` ^New	General
Portuguese (Brazil)	`pt-BR`	Female	`pt-BR-YaraNeural` ^New	General
Portuguese (Brazil)	`pt-BR`	Female	`pt-BR-GiovannaNeural` ^New	General
Portuguese (Brazil)	`pt-BR`	Female	`pt-BR-LeticiaNeural` ^New	General
Spanish (Mexico)	`es-MX`	Male	`es-MX-CecilioNeural` ^New	General
Spanish (Mexico)	`es-MX`	Male	`es-MX-LibertoNeural` ^New	General
Spanish (Mexico)	`es-MX`	Male	`es-MX-LucianoNeural` ^New	General
Spanish (Mexico)	`es-MX`	Male	`es-MX-PelayoNeural` ^New	General
Spanish (Mexico)	`es-MX`	Male	`es-MX-YagoNeural` ^New	General
Spanish (Mexico)	`es-MX`	Male	`es-MX-GerardoNeural` ^New	General
Italian (Italy)	`it-IT`	Male	`it-IT-BenignoNeural` ^New	General
Italian (Italy)	`it-IT`	Male	`it-IT-CataldoNeural` ^New	General
Italian (Italy)	`it-IT`	Male	`it-IT-LisandroNeural` ^New	General
Italian (Italy)	`it-IT`	Male	`it-IT-CalimeroNeural` ^New	General
Italian (Italy)	`it-IT`	Male	`it-IT-RinaldoNeural` ^New	General
Italian (Italy)	`it-IT`	Male	`it-IT-GianniNeural` ^New	General
Portuguese (Brazil)	`pt-BR`	Male	`pt-BR-DonatoNeural` ^New	General
Portuguese (Brazil)	`pt-BR`	Male	`pt-BR-HumbertoNeural` ^New	General
Portuguese (Brazil)	`pt-BR`	Male	`pt-BR-FabioNeural` ^New	General
Portuguese (Brazil)	`pt-BR`	Male	`pt-BR-JulioNeural` ^New	General
Portuguese (Brazil)	`pt-BR`	Male	`pt-BR-ValerioNeural` ^New	General
Portuguese (Brazil)	`pt-BR`	Male	`pt-BR-NicolauNeural` ^New	General
Chinese (Mandarin, Simplified)	`zh-CN-sichuan`	Male	`zh-CN-sichuan-YunxiSichuanNeural` ^New	General, Sichuan accent
Chinese (Mandarin, Simplified)	`zh-CN-liaoning`	Female	`zh-CN-liaoning-XiaobeiNeural` ^New	General, Liaoning accent

Improved quality for the en-SG-LunaNeural and en-SG-WayneNeural voices.
Added 48kHz output support in public preview for en-US-JennyNeural, en-US-AriaNeural, and zh-CN-XiaoxiaoNeural.

Custom neural voice

Added the ability to fix data issues online. Learn more about how to resolve data issues in Speech Studio.
Added training recipe versions. Learn more about selecting the training recipe version for your voice model.

Audio Content Creation tool

Added support for pagination.
Added global sorting by name, file type, and update time on the work file page.

May 2022 release

Prebuilt Neural TTS Voice

Released 5 new voices in public preview with multiple styles to enrich the variety in American English. See full language and voice list.
Added support in public preview for these new styles with en-US-AriaNeural: Angry, Excited, Friendly, Hopeful, Sad, Shouting, Unfriendly, Terrified, and Whispering.
Added support in public preview for these new styles with en-US-GuyNeural and en-US-JennyNeural: Angry, Cheerful, Excited, Friendly, Hopeful, Sad, Shouting, Unfriendly, Terrified, and Whispering.
Added support in public preview for these new styles with en-US-SaraNeural: Excited, Friendly, Hopeful, Shouting, Unfriendly, Terrified, and Whispering. See voice styles and roles.
Released new voices zh-CN-YunjianNeural, zh-CN-YunhaoNeural, and zh-CN-YunfengNeural in public preview. See full language and voice list.
Added support in public preview for 2 new styles with zh-CN-YunjianNeural: sports-commentary and sports-commentary-excited. See voice styles and roles.
Added support in public preview for 1 new style with zh-CN-YunhaoNeural: advertisement-upbeat. See voice styles and roles.
The cheerful and sad styles for fr-FR-DeniseNeural are generally available in all regions.
SSML updated to support MathML elements for en-US and en-AU voices. Learn more at speech synthesis markup.

Custom neural voice

Added the ability to cancel training while a voice model is being trained. Learn more about how to cancel training.
Added the ability to clone a model (rename a voice model). Learn more about how to rename your voice model.
Added the ability to test your voice model with your own test script. Learn more about how to upload your test script.
Added the ability to update the engine version for your voice model. Learn more about how to update the model engine version.
Added support for more training regions. See region support.
Added support for 10 locales with custom neural voice lite (preview). See language support.

Audio Content Creation tool

Added the ability to try out the Audio Content Creation tool without signing in.
Improved layout for adjusting phonemes.
Enhanced performance: Specified the maximum number (200) of files to be uploaded at one time.
Enhanced performance: Specified the maximum directory depth level (5 levels).

March 2022 release

Prebuilt Neural TTS Voice

Added support in public preview for the Cheerful and Sad styles with fr-FR-DeniseNeural. See voice styles and roles.
Released disconnected containers for prebuilt neural TTS voices in public preview. See use Docker containers in disconnected environments.

Custom neural voice

Added support for role-based access control. Learn more about Azure role-based access control in Speech Studio.
Added support for private endpoints and virtual network service endpoints. Learn more about how to use private endpoints with the Speech service.

Audio Content Creation tool

Updated the file size and concurrency limit for free-tier (F0) resources to make the experience consistent with the Speech SDK and APIs. See speech service quotas and limits.

February 2022 release

Custom neural voice

Released custom neural voice lite in public preview. Learn more about custom neural voice lite.
Extended language support to 49 locales. See language support.
Added support for more regions and datacenters. See region support.

Audio Content Creation tool

Removed the output length limit for downloading audio.

January 2022 release

New languages and voices

Added 10 new languages and variants for Neural text to speech:

Language	Locale	Gender	Voice name	Style support
Bengali (India)	`bn-IN`	Female	`bn-IN-TanishaaNeural` ^New	General
Bengali (India)	`bn-IN`	Male	`bn-IN-BashkarNeural` ^New	General
Icelandic (Iceland)	`is-IS`	Female	`is-IS-GudrunNeural` ^New	General
Icelandic (Iceland)	`is-IS`	Male	`is-IS-GunnarNeural` ^New	General
Kannada (India)	`kn-IN`	Female	`kn-IN-SapnaNeural` ^New	General
Kannada (India)	`kn-IN`	Male	`kn-IN-GaganNeural` ^New	General
Kazakh (Kazakhstan)	`kk-KZ`	Female	`kk-KZ-AigulNeural` ^New	General
Kazakh (Kazakhstan)	`kk-KZ`	Male	`kk-KZ-DauletNeural` ^New	General
Lao (Laos)	`lo-LA`	Female	`lo-LA-KeomanyNeural` ^New	General
Lao (Laos)	`lo-LA`	Male	`lo-LA-ChanthavongNeural` ^New	General
Macedonian (Republic of North Macedonia)	`mk-MK`	Female	`mk-MK-MarijaNeural` ^New	General
Macedonian (Republic of North Macedonia)	`mk-MK`	Male	`mk-MK-AleksandarNeural` ^New	General
Malayalam (India)	`ml-IN`	Female	`ml-IN-SobhanaNeural` ^New	General
Malayalam (India)	`ml-IN`	Male	`ml-IN-MidhunNeural` ^New	General
Pashto (Afghanistan)	`ps-AF`	Female	`ps-AF-LatifaNeural` ^New	General
Pashto (Afghanistan)	`ps-AF`	Male	`ps-AF-GulNawazNeural` ^New	General
Serbian (Serbia, Cyrillic)	`sr-RS`	Female	`sr-RS-SophieNeural` ^New	General
Serbian (Serbia, Cyrillic)	`sr-RS`	Male	`sr-RS-NicholasNeural` ^New	General
Sinhala (Sri Lanka)	`si-LK`	Female	`si-LK-ThiliniNeural` ^New	General
Sinhala (Sri Lanka)	`si-LK`	Male	`si-LK-SameeraNeural` ^New	General

For the full list of available voices, see Language support.

New voices in preview

Added new voices for en-GB, fr-FR and de-DE in preview:

Language	Locale	Gender	Voice name	Style support
English (United Kingdom)	`en-GB`	Female	`en-GB-AbbiNeural` ^New	General
English (United Kingdom)	`en-GB`	Female	`en-GB-BellaNeural` ^New	General
English (United Kingdom)	`en-GB`	Female	`en-GB-HollieNeural` ^New	General
English (United Kingdom)	`en-GB`	Female	`en-GB-OliviaNeural` ^New	General
English (United Kingdom)	`en-GB`	Girl	`en-GB-MaisieNeural` ^New	General
English (United Kingdom)	`en-GB`	Male	`en-GB-AlfieNeural` ^New	General
English (United Kingdom)	`en-GB`	Male	`en-GB-ElliotNeural` ^New	General
English (United Kingdom)	`en-GB`	Male	`en-GB-EthanNeural` ^New	General
English (United Kingdom)	`en-GB`	Male	`en-GB-NoahNeural` ^New	General
English (United Kingdom)	`en-GB`	Male	`en-GB-OliverNeural` ^New	General
English (United Kingdom)	`en-GB`	Male	`en-GB-ThomasNeural` ^New	General
French (France)	`fr-FR`	Female	`fr-FR-BrigitteNeural` ^New	General
French (France)	`fr-FR`	Female	`fr-FR-CelesteNeural` ^New	General
French (France)	`fr-FR`	Female	`fr-FR-CoralieNeural` ^New	General
French (France)	`fr-FR`	Female	`fr-FR-JacquelineNeural` ^New	General
French (France)	`fr-FR`	Female	`fr-FR-JosephineNeural` ^New	General
French (France)	`fr-FR`	Female	`fr-FR-YvetteNeural` ^New	General
French (France)	`fr-FR`	Girl	`fr-FR-EloiseNeural` ^New	General
French (France)	`fr-FR`	Male	`fr-FR-AlainNeural` ^New	General
French (France)	`fr-FR`	Male	`fr-FR-ClaudeNeural` ^New	General
French (France)	`fr-FR`	Male	`fr-FR-JeromeNeural` ^New	General
French (France)	`fr-FR`	Male	`fr-FR-MauriceNeural` ^New	General
French (France)	`fr-FR`	Male	`fr-FR-YvesNeural` ^New	General
German (Germany)	`de-DE`	Female	`de-DE-AmalaNeural` ^New	General
German (Germany)	`de-DE`	Female	`de-DE-ElkeNeural` ^New	General
German (Germany)	`de-DE`	Female	`de-DE-KlarissaNeural` ^New	General
German (Germany)	`de-DE`	Female	`de-DE-LouisaNeural` ^New	General
German (Germany)	`de-DE`	Female	`de-DE-MajaNeural` ^New	General
German (Germany)	`de-DE`	Female	`de-DE-TanjaNeural` ^New	General
German (Germany)	`de-DE`	Girl	`de-DE-GiselaNeural` ^New	General
German (Germany)	`de-DE`	Male	`de-DE-BerndNeural` ^New	General
German (Germany)	`de-DE`	Male	`de-DE-ChristophNeural` ^New	General
German (Germany)	`de-DE`	Male	`de-DE-KasperNeural` ^New	General
German (Germany)	`de-DE`	Male	`de-DE-KillianNeural` ^New	General
German (Germany)	`de-DE`	Male	`de-DE-KlausNeural` ^New	General
German (Germany)	`de-DE`	Male	`de-DE-RalfNeural` ^New	General

For the full list of available voices, see Language support.

Pronunciation accuracy

Improved English word pronunciation for all he-IL voices.
Improved word-level pronunciation accuracy for cs-CZ and da-DK.
Improved Arabic diacritics and Hebrew Nikud handling.
Improved entity reading for ja-JP.

Speech Studio

Custom neural voice: enabled additional model testing using the batch API (long audio API)
Audio Content Creation: enabled more output formats

October 2021 release

New languages and voices

Added 49 new languages and 98 voices for Neural text to speech:

Adri in af-ZA Afrikaans (South Africa), Willem in af-ZA Afrikaans (South Africa), Mekdes in am-ET Amharic (Ethiopia), Ameha in am-ET Amharic (Ethiopia), Fatima in ar-AE Arabic (United Arab Emirates), Hamdan in ar-AE Arabic (United Arab Emirates), Laila in ar-BH Arabic (Bahrain), Ali in ar-BH Arabic (Bahrain), Amina in ar-DZ Arabic (Algeria), Ismael in ar-DZ Arabic (Algeria), Rana in ar-IQ Arabic (Iraq), Bassel in ar-IQ Arabic (Iraq), Sana in ar-JO Arabic (Jordan), Taim in ar-JO Arabic (Jordan), Noura in ar-KW Arabic (Kuwait), Fahed in ar-KW Arabic (Kuwait), Iman in ar-LY Arabic (Libya), Omar in ar-LY Arabic (Libya), Mouna in ar-MA Arabic (Morocco), Jamal in ar-MA Arabic (Morocco), Amal in ar-QA Arabic (Qatar), Moaz in ar-QA Arabic (Qatar), Amany in ar-SY Arabic (Syria), Laith in ar-SY Arabic (Syria), Reem in ar-TN Arabic (Tunisia), Hedi in ar-TN Arabic (Tunisia), Maryam in ar-YE Arabic (Yemen), Saleh in ar-YE Arabic (Yemen), Nabanita in bn-BD Bangla (Bangladesh), Pradeep in bn-BD Bangla (Bangladesh), Asilia in en-KE English (Kenya), Chilemba in en-KE English (Kenya), Ezinne in en-NG English (Nigeria), Abeo in en-NG English (Nigeria), Imani in en-TZ English (Tanzania), Elimu in en-TZ English (Tanzania), Sofia in es-BO Spanish (Bolivia), Marcelo in es-BO Spanish (Bolivia), Catalina in es-CL Spanish (Chile), Lorenzo in es-CL Spanish (Chile), Maria in es-CR Spanish (Costa Rica), Juan in es-CR Spanish (Costa Rica), Belkys in es-CU Spanish (Cuba), Manuel in es-CU Spanish (Cuba), Ramona in es-DO Spanish (Dominican Republic), Emilio in es-DO Spanish (Dominican Republic), Andrea in es-EC Spanish (Ecuador), Luis in es-EC Spanish (Ecuador), Teresa in es-GQ Spanish (Equatorial Guinea), Javier in es-GQ Spanish (Equatorial Guinea), Marta in es-GT Spanish (Guatemala), Andres in es-GT Spanish (Guatemala), Karla in es-HN Spanish (Honduras), Carlos in es-HN Spanish (Honduras), Yolanda in es-NI Spanish (Nicaragua), Federico in es-NI Spanish (Nicaragua), Margarita in es-PA Spanish (Panama), Roberto in es-PA Spanish (Panama), Camila in es-PE Spanish (Peru), Alex in es-PE Spanish (Peru), Karina in es-PR Spanish (Puerto Rico), Victor in es-PR Spanish (Puerto Rico), Tania in es-PY Spanish (Paraguay), Mario in es-PY Spanish (Paraguay), Lorena in es-SV Spanish (El Salvador), Rodrigo in es-SV Spanish (El Salvador), Valentina in es-UY Spanish (Uruguay), Mateo in es-UY Spanish (Uruguay), Paola in es-VE Spanish (Venezuela), Sebastian in es-VE Spanish (Venezuela), Dilara in fa-IR Persian (Iran), Farid in fa-IR Persian (Iran), Blessica in fil-PH Filipino (Philippines), Angelo in fil-PH Filipino (Philippines), Sabela in gl-ES Galician, Roi in gl-ES Galician, Siti in jv-ID Javanese (Indonesia), Dimas in jv-ID Javanese (Indonesia), Sreymom in km-KH Khmer (Cambodia), Piseth in km-KH Khmer (Cambodia), Nilar in my-MM Burmese (Myanmar), Thiha in my-MM Burmese (Myanmar), Ubax in so-SO Somali (Somalia), Muuse in so-SO Somali (Somalia), Tuti in su-ID Sundanese (Indonesia), Jajang in su-ID Sundanese (Indonesia), Rehema in sw-TZ Swahili (Tanzania), Daudi in sw-TZ Swahili (Tanzania), Saranya in ta-LK Tamil (Sri Lanka), Kumar in ta-LK Tamil (Sri Lanka), Venba in ta-SG Tamil (Singapore), Anbu in ta-SG Tamil (Singapore), Gul in ur-IN Urdu (India), Salman in ur-IN Urdu (India), Madina in uz-UZ Uzbek (Uzbekistan), Sardor in uz-UZ Uzbek (Uzbekistan), Thando in zu-ZA Zulu (South Africa), Themba in zu-ZA Zulu (South Africa).

September 2021 release

New chatbot voice in en-US English (US): Sara represents a young adult female who speaks more casually and is best suited to chatbot scenarios.
New styles added for ja-JP Japanese voice Nanami: Three new styles are now available with Nanami: chat, customer service, and cheerful.
Overall pronunciation improvement: Ardi in id-ID, Premwadee in th-TH, Christel in da-DK, HoaiMy and NamMinh in vi-VN.
Two new voices in zh-CN Chinese (Mandarin, China) in preview: Xiaochen & Xiaoyan, optimized for spontaneous speech and customer service scenarios.

July 2021 release

Neural text to speech updates

Reduced pronunciation errors in Hebrew by 20%.

Speech Studio updates

Custom neural voice: Updated the training pipeline to UniTTSv3, which improves model quality and reduces training time for acoustic models by 50%.
Audio Content Creation: Fixed the "Export" performance issue and the bug on custom neural voice selection.

June 2021 release

Speech Studio updates

Custom neural voice: Extended custom neural voice training to support the Southeast Asia region. Released new features for checking data upload status.
Audio Content Creation: Released a new feature to support custom lexicon. With this feature, users can easily create their lexicon files and define the customized pronunciation for their audio output.

May 2021 release

New languages and voices added for neural TTS

Ten new languages introduced - 20 new voices in 10 new locales were added to the neural TTS language list: Yan in en-HK English (Hong Kong SAR), Sam in en-HK English (Hong Kong SAR), Molly in en-NZ English (New Zealand), Mitchell in en-NZ English (New Zealand), Luna in en-SG English (Singapore), Wayne in en-SG English (Singapore), Leah in en-ZA English (South Africa), Luke in en-ZA English (South Africa), Dhwani in gu-IN Gujarati (India), Niranjan in gu-IN Gujarati (India), Aarohi in mr-IN Marathi (India), Manohar in mr-IN Marathi (India), Elena in es-AR Spanish (Argentina), Tomas in es-AR Spanish (Argentina), Salome in es-CO Spanish (Colombia), Gonzalo in es-CO Spanish (Colombia), Paloma in es-US Spanish (US), Alonso in es-US Spanish (US), Zuri in sw-KE Swahili (Kenya), Rafiki in sw-KE Swahili (Kenya).
Eleven new en-US voices in preview - 11 new en-US voices were added to American English in preview: Ashley, Amber, Ana, Brandon, Christopher, Cora, Elizabeth, Eric, Michelle, Monica, and Jacob.
Five zh-CN Chinese (Mandarin, Simplified) voices are generally available - 5 Chinese (Mandarin, Simplified) voices moved from preview to general availability: Yunxi, Xiaomo, Xiaoman, Xiaoxuan, and Xiaorui. These voices are now available in all regions. Yunxi gained a new 'assistant' style, which is suitable for chatbots and voice agents. Xiaomo's voice styles were refined to sound more natural and distinctive.

April 2021 release

Neural text to speech is available across 21 regions

Twelve new regions added - Neural text to speech is now available in these 12 new regions: Japan East, Japan West, Korea Central, North Central US, North Europe, South Central US, Southeast Asia, UK South, West Central US, West Europe, West US, and West US 2. See the full list of 21 supported regions.

March 2021 release

New languages and voices added for neural TTS

Six new languages introduced - 12 new voices in 6 new locales are added into the neural TTS language list: Nia in cy-GB Welsh (United Kingdom), Aled in cy-GB Welsh (United Kingdom), Rosa in en-PH English (Philippines), James in en-PH English (Philippines), Charline in fr-BE French (Belgium), Gerard in fr-BE French (Belgium), Dena in nl-BE Dutch (Belgium), Arnaud in nl-BE Dutch (Belgium), Polina in uk-UA Ukrainian (Ukraine), Ostap in uk-UA Ukrainian (Ukraine), Uzma in ur-PK Urdu (Pakistan), Asad in ur-PK Urdu (Pakistan).
Five languages from preview to GA - 10 voices in 5 locales introduced in November are now GA: Kert in et-EE Estonian (Estonia), Colm in ga-IE Irish (Ireland), Nils in lv-LV Latvian (Latvia), Leonas in lt-LT Lithuanian (Lithuania), Joseph in mt-MT Maltese (Malta).
New male voice added for French (Canada) - A new voice Antoine is available for fr-CA French (Canada).
Quality improvement - Pronunciation error rate reduction on hu-HU Hungarian - 48.17%, nb-NO Norwegian - 52.76%, nl-NL Dutch (Netherlands) - 22.11%.

With this release, we now support a total of 142 neural voices across 60 languages/locales. In addition, over 70 standard voices are available in 49 languages/locales. Visit Language support for the full list.

Get facial pose events to animate characters

Neural Text to speech now includes the viseme event. Viseme events allow users to get a sequence of facial poses along with synthesized speech. Visemes can be used to control the movement of 2D and 3D avatar models, matching mouth movements to synthesized speech. Viseme events are only available for en-US-AriaNeural voice at this time.
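To make the idea of driving an avatar from viseme events more concrete, here is a minimal Python sketch that turns a list of events into a playback timeline. It assumes each event carries an audio offset in 100-nanosecond ticks and an integer viseme ID, and that ID 0 stands for silence. The `VisemeEvent` class is a hypothetical stand-in for illustration, not the SDK's event type.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

TICKS_PER_MILLISECOND = 10_000  # assumption: one tick is 100 nanoseconds


@dataclass(frozen=True)
class VisemeEvent:
    """Hypothetical stand-in for a viseme event payload."""
    audio_offset: int  # offset from the start of the audio, in ticks
    viseme_id: int     # identifier of the facial pose


def viseme_timeline(events: Iterable[VisemeEvent]) -> List[Tuple[float, int]]:
    """Return (start time in ms, viseme ID) pairs sorted by start time."""
    ordered = sorted(events, key=lambda e: e.audio_offset)
    return [(e.audio_offset / TICKS_PER_MILLISECOND, e.viseme_id) for e in ordered]


def viseme_at(timeline: List[Tuple[float, int]], time_ms: float) -> int:
    """Return the viseme ID active at time_ms, or 0 before the first event."""
    current = 0  # assumption: 0 represents silence
    for start_ms, viseme_id in timeline:
        if start_ms > time_ms:
            break
        current = viseme_id
    return current


if __name__ == "__main__":
    sample = [
        VisemeEvent(audio_offset=500_000, viseme_id=0),
        VisemeEvent(audio_offset=1_000_000, viseme_id=12),
        VisemeEvent(audio_offset=1_500_000, viseme_id=19),
    ]
    timeline = viseme_timeline(sample)
    print(timeline)
    print(viseme_at(timeline, 120.0))
```

An animation loop could call `viseme_at` with the current audio playback position to pick the mouth shape to render for each frame.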

Add the bookmark element in Speech Synthesis Markup Language (SSML)

The bookmark element allows you to insert custom markers in SSML to get the offset of each marker in the audio stream. It can be used to reference a specific location in the text or tag sequence.
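A minimal sketch of what an SSML document with bookmarks might look like when generated from code. The `build_ssml_with_bookmarks` helper and the mark names are hypothetical; the `bookmark` element with its `mark` attribute is assumed to follow the SSML reference for the Speech service.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml_with_bookmarks(voice, parts, lang="en-US"):
    """Build an SSML string from ("text", value) and ("mark", name) parts.

    A ("mark", name) part becomes a <bookmark mark="name"/> element at that
    position, so the service can report its offset in the audio stream.
    """
    body = []
    for kind, value in parts:
        if kind == "mark":
            body.append(f"<bookmark mark={quoteattr(value)}/>")
        elif kind == "text":
            body.append(escape(value))  # escape &, <, > in spoken text
        else:
            raise ValueError(f"unknown part kind: {kind!r}")
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice)}>{''.join(body)}</voice></speak>"
    )


ssml = build_ssml_with_bookmarks(
    "en-US-AriaNeural",
    [
        ("text", "We are selling "),
        ("mark", "flower_1"),
        ("text", "roses & "),
        ("mark", "flower_2"),
        ("text", "daisies."),
    ],
)
```

Each mark name can then be matched against the bookmark events raised during synthesis to find where in the audio that point in the text was reached.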

February 2021 release

Custom neural voice GA

Custom neural voice is GA in February in 13 languages: Chinese (Mandarin, Simplified), English (Australia), English (India), English (United Kingdom), English (United States), French (Canada), French (France), German (Germany), Italian (Italy), Japanese (Japan), Korean (Korea), Portuguese (Brazil), Spanish (Mexico), and Spanish (Spain). Learn more about what custom neural voice is and how to use it responsibly. The custom neural voice feature requires registration, and Microsoft may limit access based on Microsoft's eligibility criteria. Learn more about the limited access.

December 2020 release

New neural voices in GA and preview

Released 51 new voices for a total of 129 neural voices across 54 languages/locales:

46 new voices in GA locales: Shakir in ar-EG Arabic (Egypt), Hamed in ar-SA Arabic (Saudi Arabia), Borislav in bg-BG Bulgarian (Bulgaria), Joana in ca-ES Catalan, Antonin in cs-CZ Czech (Czech Republic), Jeppe in da-DK Danish (Denmark), Jonas in de-AT German (Austria), Jan in de-CH German (Switzerland), Nestoras in el-GR Greek (Greece), Liam in en-CA English (Canada), Connor in en-IE English (Ireland), Madhur in en-IN Hindi (India), Mohan in en-IN Telugu (India), Prabhat in en-IN English (India), Valluvar in en-IN Tamil (India), Enric in es-ES Catalan, Kert in et-EE Estonian (Estonia), Harri in fi-FI Finnish (Finland), Selma in fi-FI Finnish (Finland), Fabrice in fr-CH French (Switzerland), Colm in ga-IE Irish (Ireland), Avri in he-IL Hebrew (Israel), Srecko in hr-HR Croatian (Croatia), Tamas in hu-HU Hungarian (Hungary), Gadis in id-ID Indonesian (Indonesia), Leonas in lt-LT Lithuanian (Lithuania), Nils in lv-LV Latvian (Latvia), Osman in ms-MY Malay (Malaysia), Joseph in mt-MT Maltese (Malta), Finn in nb-NO Norwegian, Bokmål (Norway), Pernille in nb-NO Norwegian, Bokmål (Norway), Fenna in nl-NL Dutch (Netherlands), Maarten in nl-NL Dutch (Netherlands), Agnieszka in pl-PL Polish (Poland), Marek in pl-PL Polish (Poland), Duarte in pt-BR Portuguese (Brazil), Raquel in pt-PT Portuguese (Portugal), Emil in ro-RO Romanian (Romania), Dmitry in ru-RU Russian (Russia), Svetlana in ru-RU Russian (Russia), Lukas in sk-SK Slovak (Slovakia), Rok in sl-SI Slovenian (Slovenia), Mattias in sv-SE Swedish (Sweden), Sofie in sv-SE Swedish (Sweden), Niwat in th-TH Thai (Thailand), Ahmet in tr-TR Turkish (Türkiye), NamMinh in vi-VN Vietnamese (Vietnam), HsiaoChen in zh-TW Taiwanese Mandarin (Taiwan), YunJhe in zh-TW Taiwanese Mandarin (Taiwan), HiuMaan in zh-HK Chinese Cantonese (Hong Kong Special Administrative Region), WanLung in zh-HK Chinese Cantonese (Hong Kong SAR).
5 new voices in preview locales: Kert in et-EE Estonian (Estonia), Colm in ga-IE Irish (Ireland), Nils in lv-LV Latvian (Latvia), Leonas in lt-LT Lithuanian (Lithuania), Joseph in mt-MT Maltese (Malta).

With this release, we now support a total of 129 neural voices across 54 languages/locales. In addition, over 70 standard voices are available in 49 languages/locales. Visit Language support for the full list.

Updates for Audio Content Creation

Improved voice selection UI with voice categories and detailed voice descriptions.
Enabled intonation tuning for all neural voices across different languages.
Automated the UI localization based on the language of the browser.
Enabled StyleDegree controls for all zh-CN Neural voices. Visit the Audio Content Creation tool to check out the new features.

Updates for zh-CN voices

Updated all zh-CN neural voices to support English speaking.
Enabled all zh-CN neural voices to support intonation adjustment. SSML or Audio Content Creation tool can be used to adjust for the best intonation.
Updated all zh-CN multi-style neural voices to support StyleDegree control. Emotion intensity (soft or strong) is adjustable.
Updated zh-CN-YunyeNeural to support multiple styles which can perform different emotions.
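To make the StyleDegree control concrete, here is a small sketch that wraps text in an `mstts:express-as` element with a `styledegree` attribute. The `express_as_ssml` helper is hypothetical, and the accepted range of 0.01 to 2 is an assumption to verify against the SSML reference; lower values soften the emotion and higher values strengthen it.

```python
from xml.sax.saxutils import escape, quoteattr


def express_as_ssml(voice, style, text, style_degree=1.0, lang="zh-CN"):
    """Build SSML that speaks `text` in `style` at the given intensity."""
    # Assumption: the service accepts style degrees between 0.01 and 2.
    if not 0.01 <= style_degree <= 2:
        raise ValueError("style_degree is assumed to lie between 0.01 and 2")
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        'xmlns:mstts="https://www.w3.org/2001/mstts" '
        f"xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice)}>"
        f'<mstts:express-as style={quoteattr(style)} styledegree="{style_degree:g}">'
        f"{escape(text)}</mstts:express-as></voice></speak>"
    )


# A stronger and a softer rendering of the same sentence.
strong = express_as_ssml("zh-CN-XiaoxiaoNeural", "sad", "快走吧，路上一定要注意安全。", style_degree=2)
soft = express_as_ssml("zh-CN-XiaoxiaoNeural", "sad", "快走吧，路上一定要注意安全。", style_degree=0.5)
```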

November 2020 release

New locales and voices in preview

Five new voices and languages are introduced to the Neural text to speech portfolio. They are: Grace in Maltese (Malta), Ona in Lithuanian (Lithuania), Anu in Estonian (Estonia), Orla in Irish (Ireland) and Everita in Latvian (Latvia).
Five new zh-CN voices with multiple styles and roles support: Xiaohan, Xiaomo, Xiaorui, Xiaoxuan and Yunxi.

These voices are available in public preview in three Azure regions: EastUS, SouthEastAsia and WestEurope.

Neural text to speech Container GA

With Neural text to speech Container, developers can run speech synthesis with the most natural digital voices in their own environment for specific security and data governance requirements. Check how to install Speech Containers.

New features

Custom voice: enabled users to copy a voice model from one region to another; supported endpoint suspension and resuming. Go to the Azure portal here.
SSML silence tag support.
General TTS voice quality improvements: Improved word-level pronunciation accuracy in nb-NO. Reduced 53% pronunciation error.

Read more at this tech blog.

October 2020 release

New features

Jenny supports a new newscast style. See how to use the speaking styles in SSML.
Neural voices upgraded to HiFiNet vocoder, with higher audio fidelity and faster synthesis speed. This benefits customers whose scenarios rely on hi-fi audio or long interactions, including video translation, audio books, or online education materials. Read more about the story and hear the voice samples on our tech community blog.
Custom voice & Audio Content Creation Studio localized to 17 locales. Users can easily switch the UI to a local language for a more friendly experience.
Audio Content Creation: Added style degree control for XiaoxiaoNeural; Refined the customized break feature to include incremental breaks of 50ms.

General TTS voice quality improvements

Improved word-level pronunciation accuracy in pl-PL (error rate reduction: 51%) and fi-FI (error rate reduction: 58%)
Improved ja-JP single word reading for the dictionary scenario. Reduced pronunciation error by 80%.
zh-CN-XiaoxiaoNeural: Improved sentiment/CustomerService/Newscast/Cheerful/Angry style voice quality.
zh-CN: Improved Erhua pronunciation and light tone and refined space prosody, which greatly improves intelligibility.

September 2020 release

New features

Neural text to speech
- Extended to support 18 new languages/locales. They are Bulgarian, Czech, German (Austria), German (Switzerland), Greek, English (Ireland), French (Switzerland), Hebrew, Croatian, Hungarian, Indonesian, Malay, Romanian, Slovak, Slovenian, Tamil, Telugu and Vietnamese.
- Released 14 new voices to enrich the variety in the existing languages. See full language and voice list.
- New speaking styles for en-US and zh-CN voices. Jenny, the new voice in English (US), supports chatbot, customer service, and assistant styles. 10 new speaking styles are available with our zh-CN voice, XiaoXiao. In addition, the XiaoXiao neural voice supports StyleDegree tuning. See how to use the speaking styles in SSML.
Containers: Neural text to speech Container released in public preview with 16 voices available in 14 languages. Learn more about how to deploy Speech Containers for Neural text to speech.

Read the full announcement of the TTS updates for Ignite 2020.

August 2020 release

New features

Neural text to speech: new speaking style for en-US Aria voice. AriaNeural can sound like a news caster when reading news. The 'newscast-formal' style sounds more serious, while the 'newscast-casual' style is more relaxed and informal. See how to use the speaking styles in SSML.
Custom voice: a new feature is released to automatically check training data quality. When you upload your data, the system will examine various aspects of your audio and transcript data, and automatically fix or filter issues to improve the quality of the voice model. This covers the volume of your audio, the noise level, the pronunciation accuracy of speech, the alignment of speech with the normalized text, silence in the audio, in addition to the audio and script format.
Audio Content Creation: a set of new features to enable more powerful voice tuning and audio management capabilities.
- Pronunciation: the pronunciation tuning feature is updated to the latest phoneme set. You can pick the right phoneme element from the library and refine the pronunciation of the words you have selected.
- Download: The audio "Download"/"Export" feature is enhanced to support generating audio by paragraph. You can edit content in the same file/SSML, while generating multiple audio outputs. The file structure of "Download" is refined as well. Now, you can easily get all audio files in one folder.
- Task status: The multi-file export experience is improved. Previously, if one file failed during a multi-file export, the entire task failed. Now, all other files are still exported successfully. The task report is enriched with more detailed and structured information, and you can check the logs for all failed files and sentences in the report.
- SSML documentation: linked to SSML document to help you check the rules for how to use all tuning features.
The Voice List API is updated to include a user-friendly display name and the speaking styles supported for neural voices.
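As an illustration of how the display name and style information from the Voice List API might be used, the sketch below filters a parsed voice list by speaking style. The field names `ShortName`, `DisplayName`, and `StyleList` are assumed to match the API response; `SAMPLE_VOICES` is hand-written illustrative data, not actual service output.

```python
from typing import Dict, List

# Illustrative entries only; a real list would come from the Voice List API.
SAMPLE_VOICES: List[Dict] = [
    {
        "ShortName": "en-US-AriaNeural",
        "DisplayName": "Aria",
        "Locale": "en-US",
        "StyleList": ["newscast-formal", "newscast-casual"],
    },
    {
        "ShortName": "fr-FR-DeniseNeural",
        "DisplayName": "Denise",
        "Locale": "fr-FR",
        # No StyleList: this entry is treated as having no speaking styles.
    },
]


def voices_with_style(voices: List[Dict], style: str) -> List[str]:
    """Return the display names of voices that list the given speaking style."""
    return [
        voice.get("DisplayName", voice.get("ShortName", ""))
        for voice in voices
        if style in voice.get("StyleList", [])
    ]


newscasters = voices_with_style(SAMPLE_VOICES, "newscast-casual")
```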

General TTS voice quality improvements

Reduced the word-level pronunciation error rate for ru-RU (errors reduced by 56%) and sv-SE (errors reduced by 49%).
Improved polyphony word reading on en-US neural voices by 40%. Examples of polyphony words include "read", "live", "content", "record", "object", etc.
Improved the naturalness of the question tone in fr-FR. MOS (Mean Opinion Score) gain: +0.28
Updated the vocoders for the following voices, with fidelity improvements and overall performance speed-up by 40%.

Locale	Voice
`en-GB`	Mia
`es-MX`	Dalia
`fr-CA`	Sylvie
`fr-FR`	Denise
`ja-JP`	Nanami
`ko-KR`	Sun-Hi

Bug fixes

Fixed a number of bugs with the Audio Content Creation tool
- Fixed issue with auto refreshing.
- Fixed issues with voice styles in zh-CN in the South East Asia region.
- Fixed stability issue, including an export error with the 'break' tag, and errors in punctuation.

October 2024 release

Video translation (Preview)

The video translation API is now available in public preview. For more information, see How to use video translation.

September 2024 release

Real-time speech to text

Real-time speech to text has released new models, with better quality, for the following languages.

fi-FI, id-ID, zh-TW, pl-PL, pt-PT
es-SV, es-EC, es-BO, es-PY, es-AR, es-DO, es-UY, es-CR, es-VE, es-NI, es-HN, es-PR, es-CO, es-CL, es-CU, es-PE, es-PA, es-GT, es-GQ

Fast transcription (Preview)

Fast transcription now supports diarization to recognize and separate multiple speakers in a mono-channel audio file. For more information, see the fast transcription API guide.

August 2024 release

Language learning (Preview)

Language learning is now available in public preview. Interactive language learning can make your learning experience more engaging and effective. For more information, see Interactive language learning with pronunciation assessment.

Pronunciation assessment

Speech pronunciation assessment now supports 33 languages generally available, and each language is available on all Speech to text regions. For more information, see the full language list for Pronunciation assessment.

Language	Locale (BCP-47)
Arabic (Egypt)	`ar-EG`
Arabic (Saudi Arabia)	`ar-SA`
Catalan	`ca-ES`
Chinese (Cantonese, Traditional)	`zh-HK`
Chinese (Mandarin, Simplified)	`zh-CN`
Chinese (Taiwanese Mandarin, Traditional)	`zh-TW`
Danish (Denmark)	`da-DK`
Dutch (Netherlands)	`nl-NL`
English (Australia)	`en-AU`
English (Canada)	`en-CA`
English (India)	`en-IN`
English (United Kingdom)	`en-GB`
English (United States)	`en-US`
Finnish (Finland)	`fi-FI`
French (Canada)	`fr-CA`
French (France)	`fr-FR`
German (Germany)	`de-DE`
Hindi (India)	`hi-IN`
Italian (Italy)	`it-IT`
Japanese (Japan)	`ja-JP`
Korean (Korea)	`ko-KR`
Malay (Malaysia)	`ms-MY`
Norwegian Bokmål (Norway)	`nb-NO`
Polish (Poland)	`pl-PL`
Portuguese (Brazil)	`pt-BR`
Portuguese (Portugal)	`pt-PT`
Russian (Russia)	`ru-RU`
Spanish (Mexico)	`es-MX`
Spanish (Spain)	`es-ES`
Swedish (Sweden)	`sv-SE`
Tamil (India)	`ta-IN`
Thai (Thailand)	`th-TH`
Vietnamese (Vietnam)	`vi-VN`
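The following sketch shows one way a client might check a locale against the table above and prepare pronunciation assessment parameters for a REST call. It assumes the parameters are sent as base64-encoded JSON in a `Pronunciation-Assessment` header, with the field names shown; both the header format and the `pronunciation_assessment_header` helper should be checked against the Speech to text REST API reference before use.

```python
import base64
import json

# Locales copied from the table above.
GA_LOCALES = {
    "ar-EG", "ar-SA", "ca-ES", "zh-HK", "zh-CN", "zh-TW", "da-DK", "nl-NL",
    "en-AU", "en-CA", "en-IN", "en-GB", "en-US", "fi-FI", "fr-CA", "fr-FR",
    "de-DE", "hi-IN", "it-IT", "ja-JP", "ko-KR", "ms-MY", "nb-NO", "pl-PL",
    "pt-BR", "pt-PT", "ru-RU", "es-MX", "es-ES", "sv-SE", "ta-IN", "th-TH",
    "vi-VN",
}


def pronunciation_assessment_header(reference_text, locale,
                                    granularity="Phoneme",
                                    grading_system="HundredMark"):
    """Return a header dict carrying the assessment parameters.

    Raises ValueError if the locale is not in the generally available list.
    """
    if locale not in GA_LOCALES:
        raise ValueError(f"{locale} is not in the generally available list")
    params = {
        "ReferenceText": reference_text,   # the text the speaker should read
        "GradingSystem": grading_system,   # assumed values: FivePoint, HundredMark
        "Granularity": granularity,        # assumed values: Phoneme, Word, FullText
        "Dimension": "Comprehensive",
    }
    encoded = base64.b64encode(json.dumps(params).encode("utf-8")).decode("ascii")
    return {"Pronunciation-Assessment": encoded}


header = pronunciation_assessment_header("Good morning.", "en-US")
```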

July 2024 release

Fast Transcription API (Preview)

Fast transcription is now available in public preview. Fast transcription allows you to transcribe an audio file to text accurately and synchronously, with a high speed factor. It can transcribe audio much faster than the actual audio length. For more information, see the fast transcription API guide.

Tip

Try out fast transcription in Azure AI Studio.

June 2024 release

Speech to text REST API v3.2 general availability

The Speech to text REST API version 3.2 is now generally available. For more information about speech to text REST API v3.2, see the Speech to text REST API v3.2 reference documentation and the Speech to text REST API guide.

Note

Preview versions 3.2-preview.1 and 3.2-preview.2 will be removed in September 2024.

Speech to text REST API v3.1 will be retired on a date to be announced. Speech to text REST API v3.0 will be retired on April 1st, 2026. For more information about upgrading, see the Speech to text REST API v3.0 to v3.1 and v3.1 to v3.2 migration guides.
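Because the API version is part of the request path, keeping it in one place can make later migrations easier. The sketch below builds, but does not send, a batch transcription request against v3.2. The `build_transcription_request` helper and its arguments are hypothetical, and the endpoint shape and body fields are assumptions to confirm against the v3.2 reference documentation.

```python
import json
import urllib.request

API_VERSION = "v3.2"  # change here when migrating between API versions


def build_transcription_request(region, key, content_urls, locale, display_name):
    """Build a POST request that would create a batch transcription."""
    url = (
        f"https://{region}.api.cognitive.microsoft.com"
        f"/speechtotext/{API_VERSION}/transcriptions"
    )
    body = {
        "contentUrls": list(content_urls),  # audio files to transcribe
        "locale": locale,
        "displayName": display_name,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",
        },
    )


# Placeholder region, key, and audio URL; nothing is sent over the network.
request = build_transcription_request(
    "eastus", "YOUR_KEY", ["https://example.com/audio.wav"], "en-US", "demo"
)
```

Passing `request` to `urllib.request.urlopen` would submit it; that step is left out here so the example runs without credentials.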

May 2024 release

Video translation (Preview)

Video translation is now available in public preview. Video translation is a feature in Azure AI Speech that enables you to seamlessly translate and generate videos in multiple languages automatically. This feature is designed to help you localize your video content to cater to diverse audiences around the globe. You can efficiently create immersive, localized videos across various use cases such as vlogs, education, news, enterprise training, advertising, film, TV shows, and more. For more information, see the video translation overview.

Pronunciation Assessment

Speech Pronunciation Assessment now supports 24 languages generally available (with one new language added), with 7 more languages available in public preview. For more information, see the full language list for Pronunciation Assessment.

April 2024 release

Automatic multi-lingual speech translation (Preview)

Automatic multi-lingual speech translation is available in public preview. This innovative feature revolutionizes the way language barriers are overcome, offering unparalleled capabilities for seamless communication across diverse linguistic landscapes.

Key Highlights

Unspecified input language: Multi-lingual speech translation can receive audio in a wide range of languages, and there's no need to specify the expected input language. This makes it valuable for understanding and collaborating across global contexts without any presetting.
Language switching: Multi-lingual speech translation allows multiple languages to be spoken during the same session and translates them all into the same target language. You don't need to restart the session or take any other action when the input language changes.

How it works

Travel interpreter: multi-lingual speech translation can enhance the experience of tourists visiting foreign destinations by providing them with information and assistance in their preferred language. Hotel concierge services, guided tours, and visitor centers can utilize this technology to cater to diverse linguistic needs.
International conferences: multi-lingual speech translation can facilitate communication among participants from different regions who might speak various languages using live translated captions. Attendees can speak in their native languages without needing to specify them, ensuring seamless understanding and collaboration.
Educational meetings: In multi-cultural classrooms or online learning environments, multi-lingual speech translation can support language diversity among students and teachers. It allows for seamless communication and participation without the need to specify each student's or instructor's language.

How to access

For a detailed introduction, visit Speech translation overview. Additionally, you can refer to the code samples at how to translate speech. This new feature is fully supported by all SDK versions from 1.37.0 onwards.

Real-time speech to text with diarization (GA)

Real-time speech to text with diarization is now generally available.

You can create speech to text applications that use diarization to distinguish between the different speakers who participate in the conversation. For more information about real-time diarization, check out the real-time diarization quickstart.

Speech to text model update

Real-time speech to text has released new models with bilingual capabilities. The en-IN model now supports both English and Hindi bilingual scenarios and offers improved accuracy. Arabic locales (ar-AE, ar-BH, ar-DZ, ar-IL, ar-IQ, ar-KW, ar-LB, ar-LY, ar-MA, ar-OM, ar-PS, ar-QA, ar-SA, ar-SY, ar-TN, ar-YE) are now equipped with bilingual support for English, enhanced accuracy and call center support.

Batch transcription provides models with new architecture for these locales: es-ES, es-MX, fr-FR, it-IT, ja-JP, ko-KR, pt-BR, and zh-CN. These models significantly enhance readability and entity recognition.

March 2024 release

Whisper general availability (GA)

The Whisper speech to text model with Azure AI Speech is now generally available.

Check out What is the Whisper model? to learn more about when to use Azure AI Speech vs. Azure OpenAI Service.

February 2024 release

Pronunciation Assessment

Speech Pronunciation Assessment now supports 23 languages generally available (with 5 new languages added), with 3 more languages available in public preview. For more information, see the full language list for Pronunciation Assessment.

Phrase list

Added phrase list support for the following locales: ar-SA, de-CH, en-IE, en-ZA, es-US, id-ID, nl-NL, pl-PL, pt-PT, ru-RU, sv-SE, th-TH, vi-VN, zh-HK, zh-TW.

November 2023 release

Introducing Bilingual Speech Modeling!

We're thrilled to unveil a groundbreaking addition to our real-time speech modeling—Bilingual Speech Modeling. This significant enhancement allows our speech model to seamlessly support bilingual language pairs, such as English and Spanish, as well as English and French. This feature empowers users to effortlessly switch between languages during real-time interactions, marking a pivotal moment in our commitment to enhancing communication experiences.

Key Highlights:

Bilingual Support: With our latest release, users can seamlessly switch between English and Spanish or between English and French during real-time speech interactions. This functionality is tailored to accommodate bilingual speakers who frequently transition between these two languages.
Enhanced User Experience: Bilingual speakers, whether at work, home, or in various community settings, will find this feature immensely beneficial. The model's ability to comprehend and respond to both English and Spanish in real time opens up new possibilities for effective and fluid communication.

How to Use:

Choose es-US (Spanish and English) or fr-CA (French and English) when you call the Speech Service API or try it out on Speech Studio. Feel free to speak either language or mix them together—the model is designed to adapt dynamically, providing accurate and context-aware responses in both languages.
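As a small sketch of how a client might select the bilingual locale, the helper below maps a language pair to the locales named above and builds a recognition URL. The `bilingual_recognition_url` helper is hypothetical, and the endpoint shape is an assumption based on the Speech to text REST API for short audio.

```python
from urllib.parse import urlencode

# Language pairs and locales taken from the text above.
BILINGUAL_LOCALES = {
    frozenset({"en", "es"}): "es-US",
    frozenset({"en", "fr"}): "fr-CA",
}


def bilingual_recognition_url(region, lang_a, lang_b):
    """Return a recognition URL for the bilingual model covering both languages."""
    pair = frozenset({lang_a.lower(), lang_b.lower()})
    locale = BILINGUAL_LOCALES.get(pair)
    if locale is None:
        raise ValueError(f"no bilingual model listed for {lang_a}/{lang_b}")
    query = urlencode({"language": locale, "format": "detailed"})
    return (
        f"https://{region}.stt.speech.microsoft.com"
        f"/speech/recognition/conversation/cognitiveservices/v1?{query}"
    )


url = bilingual_recognition_url("westus", "en", "ES")
```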

It's time to elevate your communication game with our latest feature release—seamless, multi-lingual communication at your fingertips!

Speech to text models update

We're excited to introduce a significant update to our speech models, promising enhanced accuracy, improved readability, and refined entity recognition. This upgrade comes with a robust new structure, bolstered by an expanded training dataset, ensuring a marked advancement in overall performance. It includes newly released models for en-US, zh-CN, ja-JP, it-IT, pt-BR, es-MX, es-ES, fr-FR, de-DE, ko-KR, tr-TR, sv-SE, and he-IL.

Highlights:

Better accuracy with new model structure: The redefined model structure, coupled with a richer training dataset, elevates accuracy levels, promising more precise speech output.
Readability improvement: Our latest model brings a substantial boost to readability, enhancing the coherence and clarity of spoken content.
Advanced entity recognition: Entity recognition receives a substantial upgrade, resulting in more accurate and nuanced results.

Potential impacts: Despite these advancements, it's crucial to be mindful of potential impacts:

Custom Silence Timeout Feature: Users employing custom silence timeout, especially with low settings, might encounter over-segmentation and potential omissions of single-word phrases.
The new model might exhibit compatibility issues with the Keyword prefix feature, and users are advised to assess its performance in their specific applications.
Reduced disfluency words or phrases: Users might notice a reduction in disfluency words or phrases like "um" or "uh" in the speech output.
Inaccuracies in word timestamp duration: Some disfluency words might display inaccuracies in timestamp duration, requiring attention in applications dependent on precise timing.
Confidence score distribution variance: Users relying on confidence scores and associated thresholds should be aware of potential variations in distribution, necessitating adjustments for optimal performance.
The accuracy enhancement of the phrase list feature might be affected by the misrecognition of certain phrases.
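For applications that gate on confidence scores, one possible way to adjust a threshold after a model update is to keep the acceptance rate constant. The sketch below is a simple heuristic under that assumption, not guidance from the service; `acceptance_rate`, `matched_threshold`, and the sample scores are hypothetical.

```python
from typing import Sequence


def acceptance_rate(scores: Sequence[float], threshold: float) -> float:
    """Fraction of results whose confidence is at or above the threshold."""
    if not scores:
        raise ValueError("scores must not be empty")
    return sum(score >= threshold for score in scores) / len(scores)


def matched_threshold(old_scores: Sequence[float], old_threshold: float,
                      new_scores: Sequence[float]) -> float:
    """Pick a threshold for the new model that accepts about the same
    fraction of results as the old threshold did on the old model."""
    rate = acceptance_rate(old_scores, old_threshold)
    ranked = sorted(new_scores, reverse=True)
    keep = round(rate * len(ranked))  # number of results to accept
    if keep == 0:
        return float("inf")  # the old threshold accepted nothing
    # Accept everything down to the keep-th highest score; ties at that
    # score are accepted too, so the rate can come out slightly higher.
    return ranked[keep - 1]


# Hypothetical scores for the same audio under the old and new models.
old = [0.9, 0.8, 0.7, 0.6]
new = [0.95, 0.85, 0.8, 0.5]
new_threshold = matched_threshold(old, 0.75, new)
```

Matching the acceptance rate is only a starting point; a threshold that matters to the application should still be validated on labeled data.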

We encourage you to explore these improvements and consider potential issues for a seamless transition, and as always, your feedback is instrumental in refining and advancing our services.

Pronunciation Assessment

Speech Pronunciation Assessment now supports 18 languages generally available, with six more languages available in public preview. For more information, see the full language list for Pronunciation Assessment.
We're excited to announce that Pronunciation Assessment is introducing new features starting November 1, 2023: Prosody, Grammar, Vocabulary, and Topic. These enhancements aim to provide an even more comprehensive language learning experience for both reading and speaking assessments. Upgrade to SDK version 1.35.0 or later to explore further details in the How to use pronunciation assessment and Pronunciation assessment in Speech Studio.

September 2023 release

Whisper public preview

Azure AI Speech now supports OpenAI's Whisper model via the batch transcription API. To learn more, check out the Create a batch transcription guide.

Note

Azure OpenAI Service also supports OpenAI's Whisper model for speech to text with a synchronous REST API. To learn more, check out the quickstart.

Check out What is the Whisper model? to learn more about when to use Azure AI Speech vs. Azure OpenAI Service.

Speech to text REST API v3.2 public preview

Speech to text REST API v3.2 is available in preview. Speech to text REST API v3.1 is generally available. Speech to text REST API v3.0 will be retired on April 1st, 2026. For more information, see the Speech to text REST API v3.0 to v3.1 and v3.1 to v3.2 migration guides.

August 2023 release

New speech to text locales:

Speech to text supports two new locales as shown in the following table. Refer to the complete language list here.

Locale	Language
`pa-IN`	Punjabi (India)
`ur-IN`	Urdu (India)

Pronunciation Assessment

Speech Pronunciation Assessment now supports 3 additional languages generally available in English (Canada), English (India), and French (Canada), with 3 additional languages available in preview. For more information, see the full language list for Pronunciation Assessment.

May 2023 release

Pronunciation Assessment

Speech Pronunciation Assessment now supports 3 additional languages generally available in German (Germany), Japanese (Japan), and Spanish (Mexico), with 4 additional languages available in preview. For more information, see the full language list for Pronunciation Assessment.
You can now use the standard Speech to Text commitment tier for pronunciation assessment on all public regions. If you purchase a commitment tier for standard Speech to text, the spend for pronunciation assessment goes towards meeting the commitment. See commitment tier pricing.

February 2023 release

Pronunciation Assessment

Speech Pronunciation Assessment now supports 5 additional languages generally available in English (United Kingdom), English (Australia), French (France), Spanish (Spain), and Chinese (Mandarin, Simplified), with other languages available in preview.
Added sample code showing how to use Pronunciation Assessment in streaming mode in your own application.
- C#: See sample code.
- C++: See sample code.
- Java: See sample code.
- JavaScript: See sample code.
- Objective-C: See sample code.
- Python: See sample code.
- Swift: See sample code.

Custom speech

Support for audio + human-labeled transcript is added for the de-AT locale.

January 2023 release

Custom speech

Support for audio + human-labeled transcript is added for additional locales: ar-BH, ar-DZ, ar-EG, ar-MA, ar-SA, ar-TN, ar-YE, and ja-JP.

Support for structured text adaptation is added for locale de-AT.

December 2022 release

Speech to text REST API

The Speech to text REST API version 3.1 is generally available. Version 3.0 of the Speech to text REST API will be retired. For more information about how to migrate, see the guide.

October 2022 release

New speech to text locale

Added support for Malayalam (India) with the ml-IN locale. See the complete language list here.

July 2022 release

New speech to text locales:

Added 7 new locales as shown in the following table. See the complete language list here.

Locale	Language
`bs-BA`	Bosnian (Bosnia and Herzegovina)
`yue-CN`	Chinese (Cantonese, Simplified)
`zh-CN-sichuan`	Chinese (Southwestern Mandarin, Simplified)
`wuu-CN`	Chinese (Wu, Simplified)
`ps-AF`	Pashto (Afghanistan)
`so-SO`	Somali (Somalia)
`cy-GB`	Welsh (United Kingdom)

June 2022 release

New speech to text locales:

Added 10 new locales as shown in the following table. See the complete language list here.

Locale	Language
`sq-AL`	Albanian (Albania)
`hy-AM`	Armenian (Armenia)
`az-AZ`	Azerbaijani (Azerbaijan)
`eu-ES`	Basque
`gl-ES`	Galician
`ka-GE`	Georgian (Georgia)
`it-CH`	Italian (Switzerland)
`kk-KZ`	Kazakh (Kazakhstan)
`mn-MN`	Mongolian (Mongolia)
`ne-NP`	Nepali (Nepal)

April 2022 release

New speech to text locales:

Below is a list of the new locales. See the complete language list here.

Locale	Language
`bn-IN`	Bengali (India)

January 2022 release

New speech to text locales:

Below is a list of the new locales. See the complete language list here.

Locale	Language
`af-ZA`	Afrikaans (South Africa)
`am-ET`	Amharic (Ethiopia)
`de-CH`	German (Switzerland)
`fr-BE`	French (Belgium)
`is-IS`	Icelandic (Iceland)
`jv-ID`	Javanese (Indonesia)
`km-KH`	Khmer (Cambodia)
`kn-IN`	Kannada (India)
`lo-LA`	Lao (Laos)
`mk-MK`	Macedonian (North Macedonia)
`my-MM`	Burmese (Myanmar)
`nl-BE`	Dutch (Belgium)
`si-LK`	Sinhala (Sri Lanka)
`sr-RS`	Serbian (Serbia)
`sw-TZ`	Swahili (Tanzania)
`uk-UA`	Ukrainian (Ukraine)
`uz-UZ`	Uzbek (Uzbekistan)
`zu-ZA`	Zulu (South Africa)

July 2021 release

New speech to text locales:

Below is a list of the new locales. See the complete language list here.

Locale	Language
`ar-DZ`	Arabic (Algeria)
`ar-LY`	Arabic (Libya)
`ar-MA`	Arabic (Morocco)
`ar-TN`	Arabic (Tunisia)
`ar-YE`	Arabic (Yemen)
`bg-BG`	Bulgarian (Bulgaria)
`el-GR`	Greek (Greece)
`et-EE`	Estonian (Estonia)
`fa-IR`	Persian (Iran)
`ga-IE`	Irish (Ireland)
`hr-HR`	Croatian (Croatia)
`lt-LT`	Lithuanian (Lithuania)
`lv-LV`	Latvian (Latvia)
`mt-MT`	Maltese (Malta)
`ro-RO`	Romanian (Romania)
`sk-SK`	Slovak (Slovakia)
`sl-SI`	Slovenian (Slovenia)
`sw-KE`	Swahili (Kenya)

January 2021 release

New speech to text locales:

Below is a list of the new locales. See the complete language list here.

Locale	Language
`ar-AE`	Arabic (United Arab Emirates)
`ar-IL`	Arabic (Israel)
`ar-IQ`	Arabic (Iraq)
`ar-OM`	Arabic (Oman)
`ar-PS`	Arabic (Palestinian Authority)
`de-AT`	German (Austria)
`en-GH`	English (Ghana)
`en-KE`	English (Kenya)
`en-NG`	English (Nigeria)
`en-TZ`	English (Tanzania)
`es-GQ`	Spanish (Equatorial Guinea)
`fil-PH`	Filipino (Philippines)
`fr-CH`	French (Switzerland)
`he-IL`	Hebrew (Israel)
`id-ID`	Indonesian (Indonesia)
`ms-MY`	Malay (Malaysia)
`vi-VN`	Vietnamese (Vietnam)

August 2020 release

New speech to text locales:

Speech to text released 26 new locales in August: 2 European languages cs-CZ and hu-HU, 5 English locales and 19 Spanish locales that cover most South American countries/regions. Below is a list of the new locales. See the complete language list here.

Locale	Language
`cs-CZ`	Czech (Czech Republic)
`en-HK`	English (Hong Kong Special Administrative Region)
`en-IE`	English (Ireland)
`en-PH`	English (Philippines)
`en-SG`	English (Singapore)
`en-ZA`	English (South Africa)
`es-AR`	Spanish (Argentina)
`es-BO`	Spanish (Bolivia)
`es-CL`	Spanish (Chile)
`es-CO`	Spanish (Colombia)
`es-CR`	Spanish (Costa Rica)
`es-CU`	Spanish (Cuba)
`es-DO`	Spanish (Dominican Republic)
`es-EC`	Spanish (Ecuador)
`es-GT`	Spanish (Guatemala)
`es-HN`	Spanish (Honduras)
`es-NI`	Spanish (Nicaragua)
`es-PA`	Spanish (Panama)
`es-PE`	Spanish (Peru)
`es-PR`	Spanish (Puerto Rico)
`es-PY`	Spanish (Paraguay)
`es-SV`	Spanish (El Salvador)
`es-US`	Spanish (USA)
`es-UY`	Spanish (Uruguay)
`es-VE`	Spanish (Venezuela)
`hu-HU`	Hungarian (Hungary)

2024-October release

Add support for the latest model versions:

Speech language identification 1.16.0
Neural text to speech 3.5.0
- Make en-us-ariacpuneural an alias to en-us-jessacpuneural
- Update the text to speech backend engine version
Speech to text 4.10.0
- Restore support for locale uk-UA
- Fix silence settings to work with long periods of silence in the audio
- Replace deprecated models: cs-CZ, da-DK, en-GB, fr-CA, hu-HU, it-CH, tr-TR, zh-CN-sichuan
Custom speech to text 4.10.0

2024-September release

Add support for the latest model versions:

Speech language identification 1.15.0
- Mitigate Vulnerabilities
Neural text to speech 3.4.0
- New voices: en-us-andrewmultilingualneural, en-us-jessaneural, es-us-alonsoneural, es-us-palomaneural, it-it-isabellamultilingualneural
- Mitigate Vulnerabilities
Speech to text 4.9.0
- New Locales: ar-YE, af-ZA, am-ET, ar-MA, ar-TN, sw-KE, sw-TZ, zu-ZA
- Mitigate Vulnerabilities
- Update Deprecated Models
Custom speech to text 4.9.0
- Mitigate Vulnerabilities

2024-August release

Add support for the latest model versions:

Speech language identification 1.14.0
- Upgrade .NET 8.0
- Mitigate Vulnerabilities
Neural text to speech 3.3.0
- Upgrade .NET 8.0
- Mitigate Vulnerabilities
Speech to text 4.8.0
- Upgrade .NET 8.0
- Mitigate Vulnerabilities
- Upgrade Recognition Engine
- Fix the issue where PropertyId.Speech_SegmentationSilenceTimeoutMs was being ignored.
- Update Deprecated Models
- Remove the uk-UA locale

2024-February release

Add support for the latest model versions:

Custom speech to text 4.6.0
Speech to text 4.6.0
Neural text to speech 3.1.0

Upgrade speech to text components to the latest. Upgrade all es locales models to the latest. Increase media transforming buffer for speech to text use cases.

2023-November release

Add support for the latest model versions:

Custom speech to text 4.5.0
Speech to text 4.5.0
Neural text to speech 2.19.0

2023-October release

Add support for the latest model versions:

Custom speech to text 4.4.0
Speech to text 4.4.0
Neural text to speech 2.18.0

Fix several high-risk vulnerability issues.

Remove redundant logs in containers.

Upgrade internal media component to the latest.

Add support for voice en-IN-NeerjaNeural.

2023-September release

Add support for the latest model versions:

Speech language identification 1.12.0
Custom speech to text 4.3.0
Speech to text 4.3.0
Neural text to speech 2.17.0

Upgrade custom speech to text and speech to text to the latest framework.

Fix vulnerability issues.

Add support for voice ar-AE-FatimaNeural.

2023-July release

Add support for the latest model versions:

Custom speech to text 4.1.0
Speech to text 4.1.0
Neural text to speech 2.15.0

Fix the issue of running speech to text container via docker mount options with local custom model files.

Fix the issue where, in some cases, the RECOGNIZING event doesn't appear in the response through the Speech SDK.

Fix vulnerability issues.

2023-June release

Add support for the latest model versions:

Custom speech to text 4.0.0
Speech to text 4.0.0
Neural text to speech 2.14.0

On-premises speech to text images are upgraded to .NET 6.0

Upgrade display models for locales including en-us, ar-eg, ar-bh, ja-jp, ko-kr, and more.

Upgrade the speech to text container component to address vulnerability issues.

Add support for locale voices de-DE-AmalaNeural, de-AT-IngridNeural, de-AT-JonasNeural, and en-US-JennyMultilingualNeural

2023-May release

Add support for the latest model versions:

Custom speech to text 3.14.0
Speech to text 3.14.0
Neural text to speech 2.13.0

Fix the he-IL punctuation issue

Fix vulnerability issues

Add new locale voices en-US-MichelleNeural and es-MX-CandelaNeural

2023-April release

Security Updates

Fix vulnerability issues

2023-March release

Add support for the latest model versions:

Custom speech to text 3.12.0
Speech to text 3.12.0
Speech language identification 1.11.0
Neural text to speech 2.11.0

Fix vulnerability issues

Fix the tr-TR capitalization issue

Upgrade the speech to text en-US display models

Add support for the prebuilt neural text to speech locale voice ar-AE-HamdanNeural

2023-February release

New container versions

Add support for latest model versions:

Custom speech to text 3.11.0
Speech to text 3.11.0
Neural text to speech 2.10.0

Fix vulnerability issues

Regular upgrade for speech models

Add new Arabic locales:

ar-IL
ar-PS

Upgrade Hebrew and Turkish display models

2023-January release

New container versions

Add support for latest model versions:

Custom speech to text 3.10.0
Speech to text 3.10.0
Neural text to speech 2.9.0

Fix Hypothesis mode issue

Fix HTTP Proxy issue

Custom speech to text container disconnected mode

Add custom neural voice (CNV) disconnected container support to the text to speech frontend

Add support for these locale-voices:

da-DK-ChristelNeural
da-DK-JeppeNeural
en-IN-PrabhatNeural

2022-December release

New container versions

Add support for latest model versions:

Custom speech to text 3.9.0
Speech to text 3.9.0
Neural text to speech 2.8.0

Fix ipv4/ipv6 issue

Fix vulnerability issue

2022-November release

New container versions

Add support for latest model versions:

Custom speech to text 3.8.0
Speech to text 3.8.0
Neural text to speech 2.7.0

2022-October release

New container versions

Add support for latest model versions:

Custom speech to text 3.7.0
Speech to text 3.7.0
Neural text to speech 2.6.0

2022-September release

Speech to text 3.6.0-amd64

Add support for latest model versions.

Add support for these locales:

az-az
bn-in
bs-ba
cy-gb
eu-es
fa-ir
gl-es
he-il
hy-am
it-ch
ka-ge
kk-kz
mk-mk
mn-mn
ne-np
ps-af
so-so
sq-al
wuu-cn
yue-cn
zh-cn-sichuan

Regular monthly updates including security upgrades and vulnerability fixes.

Custom speech to text 3.6.0-amd64

Regular monthly updates including security upgrades and vulnerability fixes.

Neural text to speech v2.5.0

Add support for these prebuilt neural voices:

az-az-babekneural
az-az-banuneural
fa-ir-dilaraneural
fa-ir-faridneural
fil-ph-angeloneural
fil-ph-blessicaneural
he-il-avrineural
he-il-hilaneural
id-id-ardineural
id-id-gadisneural
ka-ge-ekaneural
ka-ge-giorgineural

Regular monthly updates including security upgrades and vulnerability fixes.

2022-May release

Speech-language-detection Container v1.9.0-amd64-preview

Bug fixes for speech-language-detection.

2022-March release

Custom speech to text Container v3.1.0

Add support to get display models.

2022-January release

Speech to text Container v3.0.0

Add support for using containers in disconnected environments.

Speech to text Container v2.18.0

Regular monthly updates including security upgrades and vulnerability fixes.

Neural text to speech Container v1.12.0

Add support for these prebuilt neural voices: am-et-amehaneural, am-et-mekdesneural, so-so-muuseneural, and so-so-ubaxneural.

Regular monthly updates including security upgrades and vulnerability fixes.
