Quickstart: Convert text to speech
Reference documentation | Package (NuGet) | Additional Samples on GitHub
In this quickstart, you run an application that does text to speech synthesis.
Tip
You can try text to speech in Speech Studio without signing up or writing any code.
Prerequisites
- Azure subscription - Create one for free
- Create a Speech resource in the Azure portal.
- Get the Speech resource key and region. After your Speech resource is deployed, select Go to resource to view and manage keys. For more information about Cognitive Services resources, see Get the keys for your resource.
Set up the environment
The Speech SDK is available as a NuGet package and implements .NET Standard 2.0. You install the Speech SDK later in this guide, but first check the SDK installation guide for any more requirements.
Set environment variables
Your application must be authenticated to access Cognitive Services resources. For production, use a secure way of storing and accessing your credentials. For example, after you get a key for your Speech resource, write it to a new environment variable on the local machine running the application.
Tip
Don't include the key directly in your code, and never post it publicly. See the Cognitive Services security article for more authentication options like Azure Key Vault.
To set the environment variable for your Speech resource key, open a console window, and follow the instructions for your operating system and development environment.
- To set the SPEECH_KEY environment variable, replace your-key with one of the keys for your resource.
- To set the SPEECH_REGION environment variable, replace your-region with one of the regions for your resource.
setx SPEECH_KEY your-key
setx SPEECH_REGION your-region
Note
If you only need to access the environment variable in the current running console, you can set the environment variable with set instead of setx.
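The setx and set commands apply to the Windows console. If you're working in a bash shell on Linux or macOS instead, the equivalent commands are export statements, for example:
export SPEECH_KEY=your-key
export SPEECH_REGION=your-region
To make the variables persist across sessions, add the export lines to your shell profile (for example, ~/.bashrc).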
After you add the environment variables, you might need to restart any running programs that need to read the environment variable, including the console window. For example, if you're using Visual Studio as your editor, restart Visual Studio before running the example.
Synthesize to speaker output
Follow these steps to create a new console application and install the Speech SDK.
Open a command prompt where you want the new project, and create a console application with the .NET CLI. The Program.cs file should be created in the project directory.
dotnet new console
Install the Speech SDK in your new project with the .NET CLI.
dotnet add package Microsoft.CognitiveServices.Speech
Replace the contents of Program.cs with the following code:
using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

class Program
{
    // This example requires environment variables named "SPEECH_KEY" and "SPEECH_REGION"
    static string speechKey = Environment.GetEnvironmentVariable("SPEECH_KEY");
    static string speechRegion = Environment.GetEnvironmentVariable("SPEECH_REGION");

    static void OutputSpeechSynthesisResult(SpeechSynthesisResult speechSynthesisResult, string text)
    {
        switch (speechSynthesisResult.Reason)
        {
            case ResultReason.SynthesizingAudioCompleted:
                Console.WriteLine($"Speech synthesized for text: [{text}]");
                break;
            case ResultReason.Canceled:
                var cancellation = SpeechSynthesisCancellationDetails.FromResult(speechSynthesisResult);
                Console.WriteLine($"CANCELED: Reason={cancellation.Reason}");
                if (cancellation.Reason == CancellationReason.Error)
                {
                    Console.WriteLine($"CANCELED: ErrorCode={cancellation.ErrorCode}");
                    Console.WriteLine($"CANCELED: ErrorDetails=[{cancellation.ErrorDetails}]");
                    Console.WriteLine($"CANCELED: Did you set the speech resource key and region values?");
                }
                break;
            default:
                break;
        }
    }

    async static Task Main(string[] args)
    {
        var speechConfig = SpeechConfig.FromSubscription(speechKey, speechRegion);

        // The language of the voice that speaks.
        speechConfig.SpeechSynthesisVoiceName = "en-US-JennyNeural";

        using (var speechSynthesizer = new SpeechSynthesizer(speechConfig))
        {
            // Get text from the console and synthesize to the default speaker.
            Console.WriteLine("Enter some text that you want to speak >");
            string text = Console.ReadLine();

            var speechSynthesisResult = await speechSynthesizer.SpeakTextAsync(text);
            OutputSpeechSynthesisResult(speechSynthesisResult, text);
        }

        Console.WriteLine("Press any key to exit...");
        Console.ReadKey();
    }
}
To change the speech synthesis language, replace en-US-JennyNeural with another supported voice. All neural voices are multilingual and fluent in their own language and English. For example, if the input text in English is "I'm excited to try text to speech" and you set es-ES-ElviraNeural, the text is spoken in English with a Spanish accent. If the voice doesn't speak the language of the input text, the Speech service won't output synthesized audio.
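For example, selecting the Spanish voice mentioned above is a one-line change in Program.cs:
speechConfig.SpeechSynthesisVoiceName = "es-ES-ElviraNeural";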
Build and run your new console application to start speech synthesis to the default speaker.
dotnet run
Important
Make sure that you set the SPEECH_KEY and SPEECH_REGION environment variables as described above. If you don't set these variables, the sample fails with an error message.
Enter some text that you want to speak. For example, type "I'm excited to try text to speech." Press the Enter key to hear the synthesized speech.
Enter some text that you want to speak >
I'm excited to try text to speech
Remarks
Now that you've completed the quickstart, here are some additional considerations:
This quickstart uses the SpeakTextAsync operation to synthesize a short block of text that you enter. You can also get text from files as described in these guides:
- For information about speech synthesis from a file and finer control over voice styles, prosody, and other settings, see How to synthesize speech and Improve synthesis with Speech Synthesis Markup Language (SSML).
- For information about synthesizing long-form text to speech, see batch synthesis.
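If you want a quick preview of file output before reading those guides, the example above only needs a different audio configuration when the synthesizer is created. This is a minimal sketch, assuming an output path of output.wav and reusing speechConfig from the example:
// Write the synthesized audio to a WAV file instead of the default speaker.
using var audioConfig = AudioConfig.FromWavFileOutput("output.wav");
using var fileSynthesizer = new SpeechSynthesizer(speechConfig, audioConfig);
var fileResult = await fileSynthesizer.SpeakTextAsync("I'm excited to try text to speech");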
Clean up resources
You can use the Azure portal or Azure Command Line Interface (CLI) to remove the Speech resource you created.
Reference documentation | Package (NuGet) | Additional Samples on GitHub
In this quickstart, you run an application that does text to speech synthesis.
Tip
You can try text to speech in Speech Studio without signing up or writing any code.
Prerequisites
- Azure subscription - Create one for free
- Create a Speech resource in the Azure portal.
- Get the Speech resource key and region. After your Speech resource is deployed, select Go to resource to view and manage keys. For more information about Cognitive Services resources, see Get the keys for your resource.
Set up the environment
The Speech SDK is available as a NuGet package and implements .NET Standard 2.0. You install the Speech SDK later in this guide, but first check the SDK installation guide for any more requirements.
Set environment variables
Your application must be authenticated to access Cognitive Services resources. For production, use a secure way of storing and accessing your credentials. For example, after you get a key for your Speech resource, write it to a new environment variable on the local machine running the application.
Tip
Don't include the key directly in your code, and never post it publicly. See the Cognitive Services security article for more authentication options like Azure Key Vault.
To set the environment variable for your Speech resource key, open a console window, and follow the instructions for your operating system and development environment.
- To set the SPEECH_KEY environment variable, replace your-key with one of the keys for your resource.
- To set the SPEECH_REGION environment variable, replace your-region with one of the regions for your resource.
setx SPEECH_KEY your-key
setx SPEECH_REGION your-region
Note
If you only need to access the environment variable in the current running console, you can set the environment variable with set instead of setx.
After you add the environment variables, you might need to restart any running programs that need to read the environment variable, including the console window. For example, if you're using Visual Studio as your editor, restart Visual Studio before running the example.
Synthesize to speaker output
Follow these steps to create a new console application and install the Speech SDK.
Create a new C++ console project in Visual Studio Community 2022 named SpeechSynthesis.

Install the Speech SDK in your new project with the NuGet package manager.
Install-Package Microsoft.CognitiveServices.Speech
Replace the contents of SpeechSynthesis.cpp with the following code:
#include <iostream>
#include <memory>
#include <stdlib.h>
#include <speechapi_cxx.h>

using namespace Microsoft::CognitiveServices::Speech;
using namespace Microsoft::CognitiveServices::Speech::Audio;

std::string GetEnvironmentVariable(const char* name);

int main()
{
    // This example requires environment variables named "SPEECH_KEY" and "SPEECH_REGION"
    auto speechKey = GetEnvironmentVariable("SPEECH_KEY");
    auto speechRegion = GetEnvironmentVariable("SPEECH_REGION");

    if ((size(speechKey) == 0) || (size(speechRegion) == 0)) {
        std::cout << "Please set both SPEECH_KEY and SPEECH_REGION environment variables." << std::endl;
        return -1;
    }

    auto speechConfig = SpeechConfig::FromSubscription(speechKey, speechRegion);

    // The language of the voice that speaks.
    speechConfig->SetSpeechSynthesisVoiceName("en-US-JennyNeural");

    auto speechSynthesizer = SpeechSynthesizer::FromConfig(speechConfig);

    // Get text from the console and synthesize to the default speaker.
    std::cout << "Enter some text that you want to speak >" << std::endl;
    std::string text;
    getline(std::cin, text);

    auto result = speechSynthesizer->SpeakTextAsync(text).get();

    // Checks result.
    if (result->Reason == ResultReason::SynthesizingAudioCompleted)
    {
        std::cout << "Speech synthesized to speaker for text [" << text << "]" << std::endl;
    }
    else if (result->Reason == ResultReason::Canceled)
    {
        auto cancellation = SpeechSynthesisCancellationDetails::FromResult(result);
        std::cout << "CANCELED: Reason=" << (int)cancellation->Reason << std::endl;

        if (cancellation->Reason == CancellationReason::Error)
        {
            std::cout << "CANCELED: ErrorCode=" << (int)cancellation->ErrorCode << std::endl;
            std::cout << "CANCELED: ErrorDetails=[" << cancellation->ErrorDetails << "]" << std::endl;
            std::cout << "CANCELED: Did you set the speech resource key and region values?" << std::endl;
        }
    }

    std::cout << "Press enter to exit..." << std::endl;
    std::cin.get();
}

std::string GetEnvironmentVariable(const char* name)
{
#if defined(_MSC_VER)
    size_t requiredSize = 0;
    (void)getenv_s(&requiredSize, nullptr, 0, name);
    if (requiredSize == 0)
    {
        return "";
    }
    auto buffer = std::make_unique<char[]>(requiredSize);
    (void)getenv_s(&requiredSize, buffer.get(), requiredSize, name);
    return buffer.get();
#else
    auto value = getenv(name);
    return value ? value : "";
#endif
}
To change the speech synthesis language, replace en-US-JennyNeural with another supported voice. All neural voices are multilingual and fluent in their own language and English. For example, if the input text in English is "I'm excited to try text to speech" and you set es-ES-ElviraNeural, the text is spoken in English with a Spanish accent. If the voice doesn't speak the language of the input text, the Speech service won't output synthesized audio.
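For example, selecting the Spanish voice mentioned above is a one-line change in SpeechSynthesis.cpp:
speechConfig->SetSpeechSynthesisVoiceName("es-ES-ElviraNeural");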
Build and run your new console application to start speech synthesis to the default speaker.
Important
Make sure that you set the SPEECH_KEY and SPEECH_REGION environment variables as described above. If you don't set these variables, the sample fails with an error message.
Enter some text that you want to speak. For example, type "I'm excited to try text to speech." Press the Enter key to hear the synthesized speech.
Enter some text that you want to speak >
I'm excited to try text to speech
Remarks
Now that you've completed the quickstart, here are some additional considerations:
This quickstart uses the SpeakTextAsync operation to synthesize a short block of text that you enter. You can also get text from files as described in these guides:
- For information about speech synthesis from a file and finer control over voice styles, prosody, and other settings, see How to synthesize speech and Improve synthesis with Speech Synthesis Markup Language (SSML).
- For information about synthesizing long-form text to speech, see batch synthesis.
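If you want a quick preview of file output before reading those guides, pass an audio configuration when creating the synthesizer. This is a minimal sketch, assuming an output path of output.wav and reusing speechConfig from the example:
// Write the synthesized audio to a WAV file instead of the default speaker.
auto audioConfig = AudioConfig::FromWavFileOutput("output.wav");
auto fileSynthesizer = SpeechSynthesizer::FromConfig(speechConfig, audioConfig);
auto fileResult = fileSynthesizer->SpeakTextAsync("I'm excited to try text to speech").get();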
Clean up resources
You can use the Azure portal or Azure Command Line Interface (CLI) to remove the Speech resource you created.
Reference documentation | Package (Go) | Additional Samples on GitHub
In this quickstart, you run an application that does text to speech synthesis.
Tip
You can try text to speech in Speech Studio without signing up or writing any code.
Prerequisites
- Azure subscription - Create one for free
- Create a Speech resource in the Azure portal.
- Get the Speech resource key and region. After your Speech resource is deployed, select Go to resource to view and manage keys. For more information about Cognitive Services resources, see Get the keys for your resource.
Set up the environment
Install the Speech SDK for Go. Check the SDK installation guide for any more requirements.
Set environment variables
Your application must be authenticated to access Cognitive Services resources. For production, use a secure way of storing and accessing your credentials. For example, after you get a key for your Speech resource, write it to a new environment variable on the local machine running the application.
Tip
Don't include the key directly in your code, and never post it publicly. See the Cognitive Services security article for more authentication options like Azure Key Vault.
To set the environment variable for your Speech resource key, open a console window, and follow the instructions for your operating system and development environment.
- To set the SPEECH_KEY environment variable, replace your-key with one of the keys for your resource.
- To set the SPEECH_REGION environment variable, replace your-region with one of the regions for your resource.
setx SPEECH_KEY your-key
setx SPEECH_REGION your-region
Note
If you only need to access the environment variable in the current running console, you can set the environment variable with set instead of setx.
After you add the environment variables, you might need to restart any running programs that need to read the environment variable, including the console window. For example, if you're using Visual Studio as your editor, restart Visual Studio before running the example.
Synthesize to speaker output
Follow these steps to create a new Go module.
Open a command prompt where you want the new module, and create a new file named speech-synthesis.go.

Copy the following code into speech-synthesis.go:
package main

import (
    "bufio"
    "fmt"
    "os"
    "strings"
    "time"

    "github.com/Microsoft/cognitive-services-speech-sdk-go/audio"
    "github.com/Microsoft/cognitive-services-speech-sdk-go/common"
    "github.com/Microsoft/cognitive-services-speech-sdk-go/speech"
)

func synthesizeStartedHandler(event speech.SpeechSynthesisEventArgs) {
    defer event.Close()
    fmt.Println("Synthesis started.")
}

func synthesizingHandler(event speech.SpeechSynthesisEventArgs) {
    defer event.Close()
    fmt.Printf("Synthesizing, audio chunk size %d.\n", len(event.Result.AudioData))
}

func synthesizedHandler(event speech.SpeechSynthesisEventArgs) {
    defer event.Close()
    fmt.Printf("Synthesized, audio length %d.\n", len(event.Result.AudioData))
}

func cancelledHandler(event speech.SpeechSynthesisEventArgs) {
    defer event.Close()
    fmt.Println("Received a cancellation.")
}

func main() {
    // This example requires environment variables named "SPEECH_KEY" and "SPEECH_REGION"
    speechKey := os.Getenv("SPEECH_KEY")
    speechRegion := os.Getenv("SPEECH_REGION")

    audioConfig, err := audio.NewAudioConfigFromDefaultSpeakerOutput()
    if err != nil {
        fmt.Println("Got an error: ", err)
        return
    }
    defer audioConfig.Close()

    speechConfig, err := speech.NewSpeechConfigFromSubscription(speechKey, speechRegion)
    if err != nil {
        fmt.Println("Got an error: ", err)
        return
    }
    defer speechConfig.Close()

    speechConfig.SetSpeechSynthesisVoiceName("en-US-JennyNeural")

    speechSynthesizer, err := speech.NewSpeechSynthesizerFromConfig(speechConfig, audioConfig)
    if err != nil {
        fmt.Println("Got an error: ", err)
        return
    }
    defer speechSynthesizer.Close()

    speechSynthesizer.SynthesisStarted(synthesizeStartedHandler)
    speechSynthesizer.Synthesizing(synthesizingHandler)
    speechSynthesizer.SynthesisCompleted(synthesizedHandler)
    speechSynthesizer.SynthesisCanceled(cancelledHandler)

    for {
        fmt.Printf("Enter some text that you want to speak, or enter empty text to exit.\n> ")
        text, _ := bufio.NewReader(os.Stdin).ReadString('\n')
        text = strings.TrimSuffix(text, "\n")
        if len(text) == 0 {
            break
        }

        task := speechSynthesizer.SpeakTextAsync(text)
        var outcome speech.SpeechSynthesisOutcome
        select {
        case outcome = <-task:
        case <-time.After(60 * time.Second):
            fmt.Println("Timed out")
            return
        }
        defer outcome.Close()

        if outcome.Error != nil {
            fmt.Println("Got an error: ", outcome.Error)
            return
        }

        if outcome.Result.Reason == common.SynthesizingAudioCompleted {
            fmt.Printf("Speech synthesized to speaker for text [%s].\n", text)
        } else {
            cancellation, _ := speech.NewCancellationDetailsFromSpeechSynthesisResult(outcome.Result)
            fmt.Printf("CANCELED: Reason=%d.\n", cancellation.Reason)

            if cancellation.Reason == common.Error {
                fmt.Printf("CANCELED: ErrorCode=%d\nCANCELED: ErrorDetails=[%s]\nCANCELED: Did you set the speech resource key and region values?\n",
                    cancellation.ErrorCode, cancellation.ErrorDetails)
            }
        }
    }
}
To change the speech synthesis language, replace en-US-JennyNeural with another supported voice. All neural voices are multilingual and fluent in their own language and English. For example, if the input text in English is "I'm excited to try text to speech" and you set es-ES-ElviraNeural, the text is spoken in English with a Spanish accent. If the voice doesn't speak the language of the input text, the Speech service won't output synthesized audio.
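For example, selecting the Spanish voice mentioned above is a one-line change:
speechConfig.SetSpeechSynthesisVoiceName("es-ES-ElviraNeural")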
Run the following commands to create a go.mod file that links to components hosted on GitHub:
go mod init speech-synthesis
go get github.com/Microsoft/cognitive-services-speech-sdk-go
Important
Make sure that you set the SPEECH_KEY and SPEECH_REGION environment variables as described above. If you don't set these variables, the sample fails with an error message.
Now build and run the code:
go build
./speech-synthesis
The go build command produces an executable named after the module (run speech-synthesis.exe on Windows).
Clean up resources
You can use the Azure portal or Azure Command Line Interface (CLI) to remove the Speech resource you created.
Reference documentation | Additional Samples on GitHub
In this quickstart, you run an application that does text to speech synthesis.
Tip
You can try text to speech in Speech Studio without signing up or writing any code.
Prerequisites
- Azure subscription - Create one for free
- Create a Speech resource in the Azure portal.
- Get the Speech resource key and region. After your Speech resource is deployed, select Go to resource to view and manage keys. For more information about Cognitive Services resources, see Get the keys for your resource.
Set up the environment
Before you can do anything, you need to install the Speech SDK. The sample in this quickstart works with the Java Runtime.
- Install Apache Maven. Then run mvn -v to confirm successful installation.
- Create a new pom.xml file in the root of your project, and copy the following into it:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.microsoft.cognitiveservices.speech.samples</groupId>
    <artifactId>quickstart-eclipse</artifactId>
    <version>1.0.0-SNAPSHOT</version>
    <build>
        <sourceDirectory>src</sourceDirectory>
        <plugins>
            <plugin>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.7.0</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>
        </plugins>
    </build>
    <dependencies>
        <dependency>
            <groupId>com.microsoft.cognitiveservices.speech</groupId>
            <artifactId>client-sdk</artifactId>
            <version>1.28.0</version>
        </dependency>
    </dependencies>
</project>
- Install the Speech SDK and dependencies.
mvn clean dependency:copy-dependencies
Set environment variables
Your application must be authenticated to access Cognitive Services resources. For production, use a secure way of storing and accessing your credentials. For example, after you get a key for your Speech resource, write it to a new environment variable on the local machine running the application.
Tip
Don't include the key directly in your code, and never post it publicly. See the Cognitive Services security article for more authentication options like Azure Key Vault.
To set the environment variable for your Speech resource key, open a console window, and follow the instructions for your operating system and development environment.
- To set the SPEECH_KEY environment variable, replace your-key with one of the keys for your resource.
- To set the SPEECH_REGION environment variable, replace your-region with one of the regions for your resource.
setx SPEECH_KEY your-key
setx SPEECH_REGION your-region
Note
If you only need to access the environment variable in the current running console, you can set the environment variable with set instead of setx.
After you add the environment variables, you might need to restart any running programs that need to read the environment variable, including the console window. For example, if you're using Visual Studio as your editor, restart Visual Studio before running the example.
Synthesize to speaker output
Follow these steps to create a new console application for speech synthesis.
Create a new file named SpeechSynthesis.java in the same project root directory.

Copy the following code into SpeechSynthesis.java:
import com.microsoft.cognitiveservices.speech.*;
import com.microsoft.cognitiveservices.speech.audio.*;

import java.util.Scanner;
import java.util.concurrent.ExecutionException;

public class SpeechSynthesis {
    // This example requires environment variables named "SPEECH_KEY" and "SPEECH_REGION"
    private static String speechKey = System.getenv("SPEECH_KEY");
    private static String speechRegion = System.getenv("SPEECH_REGION");

    public static void main(String[] args) throws InterruptedException, ExecutionException {
        SpeechConfig speechConfig = SpeechConfig.fromSubscription(speechKey, speechRegion);
        speechConfig.setSpeechSynthesisVoiceName("en-US-JennyNeural");

        SpeechSynthesizer speechSynthesizer = new SpeechSynthesizer(speechConfig);

        // Get text from the console and synthesize to the default speaker.
        System.out.println("Enter some text that you want to speak >");
        String text = new Scanner(System.in).nextLine();
        if (text.isEmpty()) {
            return;
        }

        SpeechSynthesisResult speechSynthesisResult = speechSynthesizer.SpeakTextAsync(text).get();

        if (speechSynthesisResult.getReason() == ResultReason.SynthesizingAudioCompleted) {
            System.out.println("Speech synthesized to speaker for text [" + text + "]");
        }
        else if (speechSynthesisResult.getReason() == ResultReason.Canceled) {
            SpeechSynthesisCancellationDetails cancellation = SpeechSynthesisCancellationDetails.fromResult(speechSynthesisResult);
            System.out.println("CANCELED: Reason=" + cancellation.getReason());

            if (cancellation.getReason() == CancellationReason.Error) {
                System.out.println("CANCELED: ErrorCode=" + cancellation.getErrorCode());
                System.out.println("CANCELED: ErrorDetails=" + cancellation.getErrorDetails());
                System.out.println("CANCELED: Did you set the speech resource key and region values?");
            }
        }

        System.exit(0);
    }
}
To change the speech synthesis language, replace en-US-JennyNeural with another supported voice. All neural voices are multilingual and fluent in their own language and English. For example, if the input text in English is "I'm excited to try text to speech" and you set es-ES-ElviraNeural, the text is spoken in English with a Spanish accent. If the voice doesn't speak the language of the input text, the Speech service won't output synthesized audio.
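For example, selecting the Spanish voice mentioned above is a one-line change in SpeechSynthesis.java:
speechConfig.setSpeechSynthesisVoiceName("es-ES-ElviraNeural");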
Run your new console application to start speech synthesis to the default speaker.
javac SpeechSynthesis.java -cp ".;target\dependency\*"
java -cp ".;target\dependency\*" SpeechSynthesis
Important
Make sure that you set the SPEECH_KEY and SPEECH_REGION environment variables as described above. If you don't set these variables, the sample fails with an error message.
Enter some text that you want to speak. For example, type "I'm excited to try text to speech." Press the Enter key to hear the synthesized speech.
Enter some text that you want to speak >
I'm excited to try text to speech
Remarks
Now that you've completed the quickstart, here are some additional considerations:
This quickstart uses the SpeakTextAsync operation to synthesize a short block of text that you enter. You can also get text from files as described in these guides:
- For information about speech synthesis from a file and finer control over voice styles, prosody, and other settings, see How to synthesize speech and Improve synthesis with Speech Synthesis Markup Language (SSML).
- For information about synthesizing long-form text to speech, see batch synthesis.
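If you want a quick preview of file output before reading those guides, pass an AudioConfig when constructing the synthesizer. This is a minimal sketch, assuming an output path of output.wav and reusing speechConfig from the example:
// Write the synthesized audio to a WAV file instead of the default speaker.
AudioConfig audioConfig = AudioConfig.fromWavFileOutput("output.wav");
SpeechSynthesizer fileSynthesizer = new SpeechSynthesizer(speechConfig, audioConfig);
SpeechSynthesisResult fileResult = fileSynthesizer.SpeakTextAsync("I'm excited to try text to speech").get();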
Clean up resources
You can use the Azure portal or Azure Command Line Interface (CLI) to remove the Speech resource you created.
Reference documentation | Package (npm) | Additional Samples on GitHub | Library source code
In this quickstart, you run an application that does text to speech synthesis.
Tip
You can try text to speech in Speech Studio without signing up or writing any code.
Prerequisites
- Azure subscription - Create one for free
- Create a Speech resource in the Azure portal.
- Get the Speech resource key and region. After your Speech resource is deployed, select Go to resource to view and manage keys. For more information about Cognitive Services resources, see Get the keys for your resource.
Set up the environment
Before you can do anything, you need to install the Speech SDK for JavaScript. If you just want the package name to install, run npm install microsoft-cognitiveservices-speech-sdk. For guided installation instructions, see the SDK installation guide.
Set environment variables
Your application must be authenticated to access Cognitive Services resources. For production, use a secure way of storing and accessing your credentials. For example, after you get a key for your Speech resource, write it to a new environment variable on the local machine running the application.
Tip
Don't include the key directly in your code, and never post it publicly. See the Cognitive Services security article for more authentication options like Azure Key Vault.
To set the environment variable for your Speech resource key, open a console window, and follow the instructions for your operating system and development environment.
- To set the SPEECH_KEY environment variable, replace your-key with one of the keys for your resource.
- To set the SPEECH_REGION environment variable, replace your-region with one of the regions for your resource.
setx SPEECH_KEY your-key
setx SPEECH_REGION your-region
Note
If you only need to access the environment variable in the current running console, you can set the environment variable with set instead of setx.
After you add the environment variables, you might need to restart any running programs that need to read the environment variable, including the console window. For example, if you're using Visual Studio as your editor, restart Visual Studio before running the example.
Synthesize to file output
Follow these steps to create a Node.js console application for speech synthesis.
Open a command prompt where you want the new project, and create a new file named SpeechSynthesis.js.

Install the Speech SDK for JavaScript:
npm install microsoft-cognitiveservices-speech-sdk

Copy the following code into SpeechSynthesis.js:
(function() {

    "use strict";

    var sdk = require("microsoft-cognitiveservices-speech-sdk");
    var readline = require("readline");

    var audioFile = "YourAudioFile.wav";
    // This example requires environment variables named "SPEECH_KEY" and "SPEECH_REGION"
    const speechConfig = sdk.SpeechConfig.fromSubscription(process.env.SPEECH_KEY, process.env.SPEECH_REGION);
    const audioConfig = sdk.AudioConfig.fromAudioFileOutput(audioFile);

    // The language of the voice that speaks.
    speechConfig.speechSynthesisVoiceName = "en-US-JennyNeural";

    // Create the speech synthesizer.
    var synthesizer = new sdk.SpeechSynthesizer(speechConfig, audioConfig);

    var rl = readline.createInterface({
        input: process.stdin,
        output: process.stdout
    });

    rl.question("Enter some text that you want to speak >\n> ", function (text) {
        rl.close();
        // Start the synthesizer and wait for a result.
        synthesizer.speakTextAsync(text,
            function (result) {
                if (result.reason === sdk.ResultReason.SynthesizingAudioCompleted) {
                    console.log("synthesis finished.");
                } else {
                    console.error("Speech synthesis canceled, " + result.errorDetails +
                        "\nDid you set the speech resource key and region values?");
                }
                synthesizer.close();
                synthesizer = null;
            },
            function (err) {
                console.trace("err - " + err);
                synthesizer.close();
                synthesizer = null;
            });
        console.log("Now synthesizing to: " + audioFile);
    });
}());
In SpeechSynthesis.js, optionally you can rename YourAudioFile.wav to another output filename.

To change the speech synthesis language, replace en-US-JennyNeural with another supported voice. All neural voices are multilingual and fluent in their own language and English. For example, if the input text in English is "I'm excited to try text to speech" and you set es-ES-ElviraNeural, the text is spoken in English with a Spanish accent. If the voice doesn't speak the language of the input text, the Speech service won't output synthesized audio.
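For example, selecting the Spanish voice mentioned above is a one-line change in SpeechSynthesis.js:
speechConfig.speechSynthesisVoiceName = "es-ES-ElviraNeural";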
Run your new console application to start speech synthesis to a file:
node SpeechSynthesis.js
Important
Make sure that you set the SPEECH_KEY and SPEECH_REGION environment variables as described above. If you don't set these variables, the sample fails with an error message.
The provided text should be output to an audio file:
Enter some text that you want to speak >
> I'm excited to try text to speech
Now synthesizing to: YourAudioFile.wav
synthesis finished.
Remarks
Now that you've completed the quickstart, here are some additional considerations:
This quickstart uses the speakTextAsync operation to synthesize a short block of text that you enter. You can also get text from files as described in these guides:
- For information about speech synthesis from a file and finer control over voice styles, prosody, and other settings, see How to synthesize speech and Improve synthesis with Speech Synthesis Markup Language (SSML).
- For information about synthesizing long-form text to speech, see batch synthesis.
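If you want a quick preview of SSML input before reading those guides, the synthesizer also accepts markup through speakSsmlAsync. This is a minimal sketch, assuming the same synthesizer setup as in the example:
// Synthesize from SSML instead of plain text for finer control over the voice.
var ssml = "<speak version='1.0' xml:lang='en-US'>" +
    "<voice name='en-US-JennyNeural'>I'm excited to try text to speech</voice></speak>";
synthesizer.speakSsmlAsync(ssml,
    function (result) {
        console.log("SSML synthesis finished.");
        synthesizer.close();
    },
    function (err) {
        console.trace("err - " + err);
        synthesizer.close();
    });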
Clean up resources
You can use the Azure portal or Azure Command Line Interface (CLI) to remove the Speech resource you created.
Reference documentation | Package (Download) | Additional Samples on GitHub
In this quickstart, you run an application that does text to speech synthesis.
Tip
You can try text to speech in Speech Studio without signing up or writing any code.
Prerequisites
- Azure subscription - Create one for free
- Create a Speech resource in the Azure portal.
- Get the Speech resource key and region. After your Speech resource is deployed, select Go to resource to view and manage keys. For more information about Cognitive Services resources, see Get the keys for your resource.
Set up the environment
The Speech SDK for Objective-C is distributed as a framework bundle. The framework supports both Objective-C and Swift on both iOS and macOS.
The Speech SDK can be used in Xcode projects as a CocoaPod, or downloaded directly here and linked manually. This guide uses a CocoaPod. Install the CocoaPod dependency manager as described in its installation instructions.
Set environment variables
Your application must be authenticated to access Cognitive Services resources. For production, use a secure way of storing and accessing your credentials. For example, after you get a key for your Speech resource, write it to a new environment variable on the local machine running the application.
Tip
Don't include the key directly in your code, and never post it publicly. See the Cognitive Services security article for more authentication options like Azure Key Vault.
To set the environment variable for your Speech resource key, open a console window, and follow the instructions for your operating system and development environment.
- To set the SPEECH_KEY environment variable, replace your-key with one of the keys for your resource.
- To set the SPEECH_REGION environment variable, replace your-region with one of the regions for your resource.
setx SPEECH_KEY your-key
setx SPEECH_REGION your-region
Note
If you only need to access the environment variable in the current running console, you can set the environment variable with set instead of setx.
After you add the environment variables, you might need to restart any running programs that need to read the environment variable, including the console window. For example, if you're using Visual Studio as your editor, restart Visual Studio before running the example.
Synthesize to speaker output
Follow these steps to synthesize speech in a macOS application.
Clone the Azure-Samples/cognitive-services-speech-sdk repository to get the Synthesize audio in Objective-C on macOS using the Speech SDK sample project. The repository also has iOS samples.
Navigate to the directory of the downloaded sample app (helloworld) in a terminal.

Run the command pod install. This generates a helloworld.xcworkspace Xcode workspace containing both the sample app and the Speech SDK as a dependency.

Open the helloworld.xcworkspace workspace in Xcode.

Open the file named AppDelegate.m and locate the buttonPressed method as shown here.
- (void)buttonPressed:(NSButton *)button {
    // Creates an instance of a speech config with specified subscription key and service region.
    NSString *speechKey = [[[NSProcessInfo processInfo] environment] objectForKey:@"SPEECH_KEY"];
    NSString *serviceRegion = [[[NSProcessInfo processInfo] environment] objectForKey:@"SPEECH_REGION"];

    SPXSpeechConfiguration *speechConfig = [[SPXSpeechConfiguration alloc] initWithSubscription:speechKey region:serviceRegion];
    speechConfig.speechSynthesisVoiceName = @"en-US-JennyNeural";
    SPXSpeechSynthesizer *speechSynthesizer = [[SPXSpeechSynthesizer alloc] init:speechConfig];

    NSLog(@"Start synthesizing...");

    SPXSpeechSynthesisResult *speechResult = [speechSynthesizer speakText:[self.textField stringValue]];

    // Checks result.
    if (SPXResultReason_Canceled == speechResult.reason) {
        SPXSpeechSynthesisCancellationDetails *details = [[SPXSpeechSynthesisCancellationDetails alloc] initFromCanceledSynthesisResult:speechResult];
        NSLog(@"Speech synthesis was canceled: %@. Did you set the speech resource key and region values?", details.errorDetails);
    } else if (SPXResultReason_SynthesizingAudioCompleted == speechResult.reason) {
        NSLog(@"Speech synthesis was completed");
    } else {
        NSLog(@"There was an error.");
    }
}
In AppDelegate.m, use the environment variables that you previously set for your Speech resource key and region.
NSString *speechKey = [[[NSProcessInfo processInfo] environment] objectForKey:@"SPEECH_KEY"];
NSString *serviceRegion = [[[NSProcessInfo processInfo] environment] objectForKey:@"SPEECH_REGION"];

Optionally in AppDelegate.m, include a speech synthesis voice name as shown here:
speechConfig.speechSynthesisVoiceName = @"en-US-JennyNeural";
To change the speech synthesis language, replace en-US-JennyNeural with another supported voice. All neural voices are multilingual and fluent in their own language and English. For example, if the input text in English is "I'm excited to try text to speech" and you set es-ES-ElviraNeural, the text is spoken in English with a Spanish accent. If the voice doesn't speak the language of the input text, the Speech service won't output synthesized audio.

Make the debug output visible (View > Debug Area > Activate Console).
Build and run the example code by selecting Product > Run from the menu or selecting the Play button.
Important
Make sure that you set the SPEECH_KEY and SPEECH_REGION environment variables as described above. If you don't set these variables, the sample fails with an error message.
After you input some text and select the button in the app, you should hear the synthesized audio played.
Remarks
Now that you've completed the quickstart, here are some additional considerations:
This quickstart uses the speakText operation to synthesize a short block of text that you enter. You can also get text from files as described in these guides:
- For information about speech synthesis from a file and finer control over voice styles, prosody, and other settings, see How to synthesize speech and Improve synthesis with Speech Synthesis Markup Language (SSML).
- For information about synthesizing long-form text to speech, see batch synthesis.
Clean up resources
You can use the Azure portal or Azure Command Line Interface (CLI) to remove the Speech resource you created.
Reference documentation | Package (Download) | Additional Samples on GitHub
In this quickstart, you run an application that does text to speech synthesis.
Tip
You can try text to speech in Speech Studio without signing up or writing any code.
Prerequisites
- Azure subscription - Create one for free
- Create a Speech resource in the Azure portal.
- Get the Speech resource key and region. After your Speech resource is deployed, select Go to resource to view and manage keys. For more information about Cognitive Services resources, see Get the keys for your resource.
Set up the environment
The Speech SDK for Swift is distributed as a framework bundle. The framework supports both Objective-C and Swift on both iOS and macOS.
The Speech SDK can be used in Xcode projects as a CocoaPod, or downloaded directly here and linked manually. This guide uses a CocoaPod. Install the CocoaPod dependency manager as described in its installation instructions.
Set environment variables
Your application must be authenticated to access Cognitive Services resources. For production, use a secure way of storing and accessing your credentials. For example, after you get a key for your Speech resource, write it to a new environment variable on the local machine running the application.
Tip
Don't include the key directly in your code, and never post it publicly. See the Cognitive Services security article for more authentication options like Azure Key Vault.
To set the environment variable for your Speech resource key, open a console window, and follow the instructions for your operating system and development environment.
- To set the SPEECH_KEY environment variable, replace your-key with one of the keys for your resource.
- To set the SPEECH_REGION environment variable, replace your-region with one of the regions for your resource.
setx SPEECH_KEY your-key
setx SPEECH_REGION your-region
Note
If you only need to access the environment variable in the current running console, you can set the environment variable with set instead of setx.
After you add the environment variables, you might need to restart any running programs that need to read the environment variable, including the console window. For example, if you're using Visual Studio as your editor, restart Visual Studio before running the example.
Synthesize to speaker output
Follow these steps to synthesize speech in a macOS application.
Clone the Azure-Samples/cognitive-services-speech-sdk repository to get the Synthesize audio in Swift on macOS using the Speech SDK sample project. The repository also has iOS samples.
Navigate to the directory of the downloaded sample app (helloworld) in a terminal.

Run the command pod install. This generates a helloworld.xcworkspace Xcode workspace containing both the sample app and the Speech SDK as a dependency.

Open the helloworld.xcworkspace workspace in Xcode.

Open the file named AppDelegate.swift and locate the applicationDidFinishLaunching and synthesize methods as shown here.
import Cocoa

@NSApplicationMain
class AppDelegate: NSObject, NSApplicationDelegate, NSTextFieldDelegate {
    var textField: NSTextField!
    var synthesisButton: NSButton!

    var inputText: String!

    var sub: String!
    var region: String!

    @IBOutlet weak var window: NSWindow!

    func applicationDidFinishLaunching(_ aNotification: Notification) {
        print("loading")
        // load subscription information
        sub = ProcessInfo.processInfo.environment["SPEECH_KEY"]
        region = ProcessInfo.processInfo.environment["SPEECH_REGION"]

        inputText = ""

        textField = NSTextField(frame: NSRect(x: 100, y: 200, width: 200, height: 50))
        textField.textColor = NSColor.black
        textField.lineBreakMode = .byWordWrapping

        textField.placeholderString = "Type something to synthesize."
        textField.delegate = self

        self.window.contentView?.addSubview(textField)

        synthesisButton = NSButton(frame: NSRect(x: 100, y: 100, width: 200, height: 30))
        synthesisButton.title = "Synthesize"
        synthesisButton.target = self
        synthesisButton.action = #selector(synthesisButtonClicked)
        self.window.contentView?.addSubview(synthesisButton)
    }

    @objc func synthesisButtonClicked() {
        DispatchQueue.global(qos: .userInitiated).async {
            self.synthesize()
        }
    }

    func synthesize() {
        var speechConfig: SPXSpeechConfiguration?
        do {
            try speechConfig = SPXSpeechConfiguration(subscription: sub, region: region)
        } catch {
            print("error \(error) happened")
            speechConfig = nil
        }

        speechConfig?.speechSynthesisVoiceName = "en-US-JennyNeural";

        let synthesizer = try! SPXSpeechSynthesizer(speechConfig!)
        let result = try! synthesizer.speakText(inputText)
        if result.reason == SPXResultReason.canceled {
            let cancellationDetails = try! SPXSpeechSynthesisCancellationDetails(fromCanceledSynthesisResult: result)
            print("cancelled, error code: \(cancellationDetails.errorCode) detail: \(cancellationDetails.errorDetails!) ")
            print("Did you set the speech resource key and region values?");
            return
        }
    }

    func controlTextDidChange(_ obj: Notification) {
        let textFiled = obj.object as! NSTextField
        inputText = textFiled.stringValue
    }
}
In AppDelegate.swift, use the environment variables that you previously set for your Speech resource key and region.
sub = ProcessInfo.processInfo.environment["SPEECH_KEY"]
region = ProcessInfo.processInfo.environment["SPEECH_REGION"]

Optionally in AppDelegate.swift, include a speech synthesis voice name as shown here:
speechConfig?.speechSynthesisVoiceName = "en-US-JennyNeural";
To change the speech synthesis language, replace en-US-JennyNeural with another supported voice. All neural voices are multilingual and fluent in their own language and English. For example, if the input text in English is "I'm excited to try text to speech" and you set es-ES-ElviraNeural, the text is spoken in English with a Spanish accent. If the voice doesn't speak the language of the input text, the Speech service won't output synthesized audio.

Make the debug output visible by selecting View > Debug Area > Activate Console.
Build and run the example code by selecting Product > Run from the menu or selecting the Play button.
Important
Make sure that you set the SPEECH_KEY and SPEECH_REGION environment variables as described above. If you don't set these variables, the sample fails with an error message.
After you input some text and select the button in the app, you should hear the synthesized audio played.
Remarks
Now that you've completed the quickstart, here are some additional considerations:
This quickstart uses the speakText operation to synthesize a short block of text that you enter. You can also get text from files as described in these guides:
- For information about speech synthesis from a file and finer control over voice styles, prosody, and other settings, see How to synthesize speech and Improve synthesis with Speech Synthesis Markup Language (SSML).
- For information about synthesizing long-form text to speech, see batch synthesis.
Clean up resources
You can use the Azure portal or Azure Command Line Interface (CLI) to remove the Speech resource you created.
Reference documentation | Package (PyPi) | Additional Samples on GitHub
In this quickstart, you run an application that does text to speech synthesis.
Tip
You can try text to speech in Speech Studio without signing up or writing any code.
Prerequisites
- Azure subscription - Create one for free
- Create a Speech resource in the Azure portal.
- Get the Speech resource key and region. After your Speech resource is deployed, select Go to resource to view and manage keys. For more information about Cognitive Services resources, see Get the keys for your resource.
Set up the environment
The Speech SDK for Python is available as a Python Package Index (PyPI) module. The Speech SDK for Python is compatible with Windows, Linux, and macOS.
- You must install the Microsoft Visual C++ Redistributable for Visual Studio 2015, 2017, 2019, and 2022 for your platform. Installing this package for the first time might require a restart.
- On Linux, you must use the x64 target architecture.
Install Python 3.7 or later. First check the SDK installation guide for any other requirements.
Set environment variables
Your application must be authenticated to access Cognitive Services resources. For production, use a secure way of storing and accessing your credentials. For example, after you get a key for your Speech resource, write it to a new environment variable on the local machine running the application.
Tip
Don't include the key directly in your code, and never post it publicly. See the Cognitive Services security article for more authentication options like Azure Key Vault.
To set the environment variable for your Speech resource key, open a console window, and follow the instructions for your operating system and development environment.
- To set the SPEECH_KEY environment variable, replace your-key with one of the keys for your resource.
- To set the SPEECH_REGION environment variable, replace your-region with one of the regions for your resource.
setx SPEECH_KEY your-key
setx SPEECH_REGION your-region
Note
If you only need to access the environment variable in the current running console, you can set the environment variable with set instead of setx.
After you add the environment variables, you might need to restart any running programs that need to read the environment variable, including the console window. For example, if you're using Visual Studio as your editor, restart Visual Studio before running the example.
Synthesize to speaker output
Follow these steps to create a new console application.
Open a command prompt where you want the new project, and create a new file named speech_synthesis.py.

Run this command to install the Speech SDK:
pip install azure-cognitiveservices-speech

Copy the following code into speech_synthesis.py:
import os
import azure.cognitiveservices.speech as speechsdk

# This example requires environment variables named "SPEECH_KEY" and "SPEECH_REGION"
speech_config = speechsdk.SpeechConfig(subscription=os.environ.get('SPEECH_KEY'), region=os.environ.get('SPEECH_REGION'))
audio_config = speechsdk.audio.AudioOutputConfig(use_default_speaker=True)

# The language of the voice that speaks.
speech_config.speech_synthesis_voice_name='en-US-JennyNeural'

speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)

# Get text from the console and synthesize to the default speaker.
print("Enter some text that you want to speak >")
text = input()

speech_synthesis_result = speech_synthesizer.speak_text_async(text).get()

if speech_synthesis_result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
    print("Speech synthesized for text [{}]".format(text))
elif speech_synthesis_result.reason == speechsdk.ResultReason.Canceled:
    cancellation_details = speech_synthesis_result.cancellation_details
    print("Speech synthesis canceled: {}".format(cancellation_details.reason))
    if cancellation_details.reason == speechsdk.CancellationReason.Error:
        if cancellation_details.error_details:
            print("Error details: {}".format(cancellation_details.error_details))
            print("Did you set the speech resource key and region values?")
To change the speech synthesis language, replace en-US-JennyNeural with another supported voice. All neural voices are multilingual and fluent in their own language and English. For example, if the input text in English is "I'm excited to try text to speech" and you set es-ES-ElviraNeural, the text is spoken in English with a Spanish accent. If the voice doesn't speak the language of the input text, the Speech service won't output synthesized audio.
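For example, selecting the Spanish voice mentioned above is a one-line change in speech_synthesis.py:
speech_config.speech_synthesis_voice_name = 'es-ES-ElviraNeural'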
Run your new console application to start speech synthesis to the default speaker.
python speech_synthesis.py
Important
Make sure that you set the SPEECH_KEY and SPEECH_REGION environment variables as described above. If you don't set these variables, the sample fails with an error message.
Enter some text that you want to speak. For example, type "I'm excited to try text to speech." Press the Enter key to hear the synthesized speech.
Enter some text that you want to speak >
I'm excited to try text to speech
Remarks
Now that you've completed the quickstart, here are some additional considerations:
This quickstart uses the speak_text_async operation to synthesize a short block of text that you enter. You can also get text from files as described in these guides:
- For information about speech synthesis from a file and finer control over voice styles, prosody, and other settings, see How to synthesize speech and Improve synthesis with Speech Synthesis Markup Language (SSML).
- For information about synthesizing long-form text to speech, see batch synthesis.
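If you want a quick preview of SSML input before reading those guides, the synthesizer also accepts markup through speak_ssml_async. This is a minimal sketch, reusing speech_synthesizer from the example:
# Synthesize from SSML instead of plain text for finer control over the voice.
ssml = """<speak version='1.0' xml:lang='en-US'>
    <voice name='en-US-JennyNeural'>I'm excited to try text to speech</voice>
</speak>"""
ssml_result = speech_synthesizer.speak_ssml_async(ssml).get()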
Clean up resources
You can use the Azure portal or Azure Command Line Interface (CLI) to remove the Speech resource you created.
Text to speech REST API reference | Additional Samples on GitHub
In this quickstart, you run an application that does text to speech synthesis.
Tip
You can try text to speech in Speech Studio without signing up or writing any code.
Prerequisites
- Azure subscription - Create one for free
- Create a Speech resource in the Azure portal.
- Get the Speech resource key and region. After your Speech resource is deployed, select Go to resource to view and manage keys. For more information about Cognitive Services resources, see Get the keys for your resource.
Set environment variables
Your application must be authenticated to access Cognitive Services resources. For production, use a secure way of storing and accessing your credentials. For example, after you get a key for your Speech resource, write it to a new environment variable on the local machine running the application.
Tip
Don't include the key directly in your code, and never post it publicly. See the Cognitive Services security article for more authentication options like Azure Key Vault.
To set the environment variable for your Speech resource key, open a console window, and follow the instructions for your operating system and development environment.
- To set the SPEECH_KEY environment variable, replace your-key with one of the keys for your resource.
- To set the SPEECH_REGION environment variable, replace your-region with one of the regions for your resource.
setx SPEECH_KEY your-key
setx SPEECH_REGION your-region
Note
If you only need to access the environment variable in the current running console, you can set the environment variable with set instead of setx.
After you add the environment variables, you might need to restart any running programs that need to read the environment variable, including the console window. For example, if you're using Visual Studio as your editor, restart Visual Studio before running the example.
Synthesize to a file
At a command prompt, run the following cURL command. Optionally, you can rename output.mp3 to another output filename.
The following command is for the Windows Command Prompt; it reads the SPEECH_KEY and SPEECH_REGION environment variables that you set earlier.
curl --location --request POST "https://%SPEECH_REGION%.tts.speech.microsoft.com/cognitiveservices/v1" ^
--header "Ocp-Apim-Subscription-Key: %SPEECH_KEY%" ^
--header "Content-Type: application/ssml+xml" ^
--header "X-Microsoft-OutputFormat: audio-16khz-128kbitrate-mono-mp3" ^
--header "User-Agent: curl" ^
--data-raw "<speak version='1.0' xml:lang='en-US'><voice xml:lang='en-US' xml:gender='Female' name='en-US-JennyNeural'>my voice is my passport verify me</voice></speak>" --output output.mp3
Important
Make sure that you set the SPEECH_KEY and SPEECH_REGION environment variables as described above. If you don't set these variables, the sample fails with an error message.
The provided text should be output to an audio file named output.mp3.
To change the speech synthesis language, replace en-US-JennyNeural with another supported voice. All neural voices are multilingual and fluent in their own language and English. For example, if the input text in English is "I'm excited to try text to speech" and you set es-ES-ElviraNeural, the text is spoken in English with a Spanish accent. If the voice doesn't speak the language of the input text, the Speech service won't output synthesized audio.
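For example, switching to the Spanish voice only changes the SSML payload; the headers and URL stay the same. This is a sketch of the modified --data-raw value, assuming the voice's name attribute alone is enough to select it:
--data-raw "<speak version='1.0' xml:lang='en-US'><voice name='es-ES-ElviraNeural'>my voice is my passport verify me</voice></speak>"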
For more information, see Text to speech REST API.
Clean up resources
You can use the Azure portal or Azure Command Line Interface (CLI) to remove the Speech resource you created.
In this quickstart, you run an application that does text to speech synthesis.
Tip
You can try text to speech in Speech Studio without signing up or writing any code.
Prerequisites
- Azure subscription - Create one for free
- Create a Speech resource in the Azure portal.
- Get the Speech resource key and region. After your Speech resource is deployed, select Go to resource to view and manage keys. For more information about Cognitive Services resources, see Get the keys for your resource.
Set up the environment
Follow these steps and see the Speech CLI quickstart for additional requirements for your platform.
Install the Speech CLI via the .NET CLI by entering this command:
dotnet tool install --global Microsoft.CognitiveServices.Speech.CLI
Configure your Speech resource key and region by running the following commands. Replace SUBSCRIPTION-KEY with your Speech resource key, and replace REGION with your Speech resource region:
spx config @key --set SUBSCRIPTION-KEY
spx config @region --set REGION
Synthesize to speaker output
Run the following command for speech synthesis to the default speaker output. You can modify the text to be synthesized and the voice.
spx synthesize --text "I'm excited to try text to speech" --voice "en-US-JennyNeural"
If you don't set a voice name, the default voice for en-US will speak. All neural voices are multilingual and fluent in their own language and English. For example, if the input text in English is "I'm excited to try text to speech" and you set --voice "es-ES-ElviraNeural", the text is spoken in English with a Spanish accent. If the voice doesn't speak the language of the input text, the Speech service won't output synthesized audio.
Remarks
Now that you've completed the quickstart, here are some additional considerations:
You can have finer control over voice styles, prosody, and other settings by using Speech Synthesis Markup Language (SSML).
In the following example, the voice and style ('excited') are provided in the SSML block.
spx synthesize --ssml "<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xmlns:mstts='https://www.w3.org/2001/mstts' xml:lang='en-US'><voice name='en-US-JennyNeural'><mstts:express-as style='excited'>I'm excited to try text to speech</mstts:express-as></voice></speak>"
Run this command for information about additional speech synthesis options such as file input and output:
spx help synthesize
Clean up resources
You can use the Azure portal or Azure Command Line Interface (CLI) to remove the Speech resource you created.