What is Speech Studio?
Speech Studio is a set of UI-based tools for building and integrating features of the Azure Cognitive Services Speech service into your applications. You create projects in Speech Studio by using a no-code approach, and then reference those assets in your applications by using the Speech SDK, the Speech CLI, or the REST APIs.
You can try speech-to-text and text-to-speech in Speech Studio without signing up or writing any code.
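To show what "referencing an asset" can look like from code, here is a minimal sketch that builds, but doesn't send, a request to the speech-to-text REST API for short audio. It assumes the regional endpoint `https://<region>.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1`, the `Ocp-Apim-Subscription-Key` header, and 16 kHz PCM WAV input. The helper name `build_stt_request` and the region, key, and audio values are hypothetical placeholders. Passing a Custom Speech endpoint ID as the `cid` query parameter is an assumption about how a deployed custom model is addressed, so confirm it against the REST reference before you rely on it.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_stt_request(region, key, wav_bytes, language="en-US", endpoint_id=None):
    """Build (but don't send) a short-audio speech-to-text REST request.

    Hypothetical helper. endpoint_id is assumed to be the ID of a Custom Speech
    endpoint deployed from Speech Studio, sent as the `cid` query parameter.
    """
    params = {"language": language}
    if endpoint_id is not None:
        params["cid"] = endpoint_id  # assumption: selects the custom model
    url = (
        f"https://{region}.stt.speech.microsoft.com"
        "/speech/recognition/conversation/cognitiveservices/v1?"
        + urlencode(params)
    )
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
        "Accept": "application/json",
    }
    return Request(url, data=wav_bytes, headers=headers, method="POST")


# Placeholder values; send with urllib.request.urlopen(req) once real ones are filled in.
req = build_stt_request("westus", "YOUR_KEY", b"RIFF", endpoint_id="YOUR_ENDPOINT_ID")
print(req.full_url)
```

Building the request separately from sending it keeps the sketch testable offline and makes it easy to swap the base model for a custom one by adding a single parameter.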
Speech Studio scenarios
Explore, try out, and view sample code for some common use cases.
Captioning: Choose a sample video clip to see real-time or offline processed captioning results. Learn how to synchronize captions with your input audio, apply profanity filters, get partial results, apply customizations, and identify spoken languages for multilingual scenarios. For more information, see the captioning quickstart.
Call Center: View a demonstration of how to use the Language and Speech services to analyze call center conversations. Transcribe calls in real time or process a batch of calls, redact personally identifiable information (PII), and extract insights such as sentiment to help with your call center use case. For more information, see the call center quickstart.
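Synchronizing captions with input audio comes down to turning each recognized phrase's timing into cue timestamps. The Speech SDK reports a result's offset and duration in ticks of 100 nanoseconds. The following sketch converts a list of hypothetical `(text, offset, duration)` tuples into SubRip (SRT) cues. The sample data and the `srt_timestamp` and `to_srt` helpers are illustrative assumptions, not part of the SDK or the captioning quickstart.

```python
TICKS_PER_MS = 10_000  # one tick is 100 ns, so 10,000 ticks make 1 ms


def srt_timestamp(ticks):
    """Format a tick count as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = ticks // TICKS_PER_MS
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    seconds, ms = divmod(ms, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{ms:03d}"


def to_srt(phrases):
    """Build SRT text from (text, offset_ticks, duration_ticks) tuples."""
    cues = []
    for index, (text, offset, duration) in enumerate(phrases, start=1):
        start = srt_timestamp(offset)
        end = srt_timestamp(offset + duration)
        cues.append(f"{index}\n{start} --> {end}\n{text}\n")
    # A blank line separates consecutive SRT cues.
    return "\n".join(cues)


# Hypothetical recognition results: text, offset, and duration in ticks.
phrases = [
    ("Welcome to Speech Studio.", 5_000_000, 18_000_000),
    ("Let's look at captioning.", 25_000_000, 15_500_000),
]
print(to_srt(phrases))
```

In a real application, the tuples would come from final recognition results; partial results could be rendered the same way and replaced as they update.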
Speech Studio features
In Speech Studio, the following Speech service features are available as project types:
Real-time speech-to-text: Quickly test speech-to-text by dragging audio files into Speech Studio, without writing any code. This demo tool shows how speech-to-text performs on your own audio samples. To explore the full functionality, see What is speech-to-text?
Custom Speech: Create speech recognition models that are tailored to specific vocabulary sets and styles of speaking. In contrast to the base speech recognition model, Custom Speech models become part of your unique competitive advantage because they're not publicly accessible. To get started with uploading sample audio to create a Custom Speech model, see Upload training and testing datasets.
Pronunciation assessment: Evaluate speech pronunciation and give speakers feedback on the accuracy and fluency of spoken audio. Speech Studio provides a sandbox for testing this feature quickly, without code. To use the feature with the Speech SDK in your applications, see the Pronunciation assessment article.
Voice Gallery: Build apps and services that speak naturally. Choose from a broad portfolio of languages, voices, and variants. Bring your scenarios to life with highly expressive and human-like neural voices.
Custom Voice: Create custom, one-of-a-kind voices for text-to-speech. You supply audio files and create matching transcriptions in Speech Studio, and then use the custom voices in your applications. To create and use custom voices via endpoints, see Create and use your voice model.
Audio Content Creation: A no-code approach to text-to-speech synthesis. You can use the output audio as-is, or as a starting point for further customization. You can build highly natural audio content for a variety of scenarios, such as audiobooks, news broadcasts, video narrations, and chatbots. For more information, see the Audio Content Creation documentation.
Custom Keyword: A custom keyword is a word or short phrase that you can use to voice-activate a product. You create a custom keyword in Speech Studio, and then generate a binary file to use with the Speech SDK in your applications.
Custom Commands: Easily build rich voice-command apps that are optimized for voice-first interaction experiences. Custom Commands provides a code-free authoring experience in Speech Studio, an automatic hosting model, and relatively low complexity, so you can focus on building the best solution for your voice-command scenarios. For more information, see the Develop Custom Commands applications guide. To connect a client app, see Integrate with a client application by using the Speech SDK.
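Voices you pick in the Voice Gallery are referenced from code by name. As a minimal sketch, assuming the text-to-speech REST endpoint `https://<region>.tts.speech.microsoft.com/cognitiveservices/v1`, the code below builds, but doesn't send, an SSML request for a prebuilt neural voice. The voice name `en-US-JennyNeural`, the output format, the region, the key, and the `build_ssml` and `build_tts_request` helpers are example placeholders. A custom voice is called through the endpoint you deploy for it, so check Create and use your voice model rather than reusing this URL as-is.

```python
from urllib.request import Request
from xml.sax.saxutils import escape


def build_ssml(text, voice, lang="en-US"):
    """Wrap plain text in a minimal SSML document that selects a voice."""
    return (
        f"<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xml:lang='{lang}'>"
        f"<voice name='{voice}'>{escape(text)}</voice>"  # escape &, <, > in the text
        "</speak>"
    )


def build_tts_request(region, key, text, voice="en-US-JennyNeural"):
    """Build (but don't send) a text-to-speech REST request; hypothetical helper."""
    url = f"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1"
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/ssml+xml",
        "X-Microsoft-OutputFormat": "riff-24khz-16bit-mono-pcm",  # example format
        "User-Agent": "speech-studio-sketch",
    }
    body = build_ssml(text, voice).encode("utf-8")
    return Request(url, data=body, headers=headers, method="POST")


# Placeholder values; with a real key and region, urllib.request.urlopen(req)
# would return the synthesized audio bytes.
req = build_tts_request("westus", "YOUR_KEY", "Fish & chips, read aloud.")
print(req.data.decode("utf-8"))
```

Keeping SSML generation in its own function makes it easy to test the markup separately and to swap in another voice name from the gallery.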