In this tutorial, you add Azure AI Speech to an existing Express.js app to convert text to speech with the Azure AI Speech service. Converting text to speech lets you provide audio without the cost of manually generating it.
This tutorial shows three different ways to convert text to speech with Azure AI Speech:
A server route that writes the audio to a temporary file, then streams it to the client
A server route that returns the audio from an in-memory stream
A client-side call directly from the browser to the Azure AI Speech service
The tutorial takes a minimal Express.js app and adds this functionality with a combination of a new server API route and an updated client web form.
Node.js LTS - installed to your local machine.
Visual Studio Code - installed to your local machine.
The Azure App Service extension for VS Code (installed from within VS Code).
Git - used to push to GitHub - which activates the GitHub action.
Use the Bash environment in Azure Cloud Shell.
If you prefer, install the Azure CLI to run CLI reference commands.
Using git, clone the Express.js sample repo to your local computer.
git clone https://github.com/Azure-Samples/js-e2e-express-server
Change to the new directory for the sample.
cd js-e2e-express-server
Open the project in Visual Studio Code.
code .
Open a new terminal in Visual Studio Code and install the project dependencies.
npm install
From the Visual Studio Code terminal, install the Azure AI Speech SDK.
npm install microsoft-cognitiveservices-speech-sdk
To integrate the Speech SDK into the Express.js application, create a file in the src folder named azure-cognitiveservices-speech.js.
Add the following code to pull in dependencies and create a function to convert text to speech.
// azure-cognitiveservices-speech.js
const sdk = require('microsoft-cognitiveservices-speech-sdk');
const { Buffer } = require('buffer');
const { PassThrough } = require('stream');
const fs = require('fs');

/**
 * Node.js server code to convert text to speech
 * @returns stream
 * @param {*} key your resource key
 * @param {*} region your resource region
 * @param {*} text text to convert to audio/speech
 * @param {*} filename optional - best for long text - temp file for converted speech/audio
 */
const textToSpeech = async (key, region, text, filename) => {
    // convert callback function to promise
    return new Promise((resolve, reject) => {
        const speechConfig = sdk.SpeechConfig.fromSubscription(key, region);
        speechConfig.speechSynthesisOutputFormat = 5; // mp3

        let audioConfig = null;
        if (filename) {
            audioConfig = sdk.AudioConfig.fromAudioFileOutput(filename);
        }

        const synthesizer = new sdk.SpeechSynthesizer(speechConfig, audioConfig);
        synthesizer.speakTextAsync(
            text,
            result => {
                const { audioData } = result;
                synthesizer.close();
                if (filename) {
                    // return stream from file
                    const audioFile = fs.createReadStream(filename);
                    resolve(audioFile);
                } else {
                    // return stream from memory
                    const bufferStream = new PassThrough();
                    bufferStream.end(Buffer.from(audioData));
                    resolve(bufferStream);
                }
            },
            error => {
                synthesizer.close();
                reject(error);
            });
    });
};

module.exports = {
    textToSpeech
};
The textToSpeech function takes four arguments. If a file name with a local path is passed, the text is converted to an audio file. If no file name is passed, an in-memory audio stream is created. The local method, textToSpeech, wraps the SDK's callback function in a promise.
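For example, a caller can await the helper and pipe the returned stream to any writable Node.js stream. The following is a minimal sketch, assuming a scratch file saved next to the module in the src folder and placeholder credentials; it isn't part of the sample repo:

// sketch: save synthesized speech to a local file with the helper above
const fs = require('fs');
const { textToSpeech } = require('./azure-cognitiveservices-speech');

(async () => {
    // placeholder values - substitute your own resource key and region
    const audioStream = await textToSpeech('<your-speech-key>', 'eastus', 'Hello world');
    audioStream.pipe(fs.createWriteStream('./hello.mp3'));
})();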
Open the src/server.js file.
Add the azure-cognitiveservices-speech.js module as a dependency at the top of the file:
const { textToSpeech } = require('./azure-cognitiveservices-speech');
Add a new API route to call the textToSpeech method created in the previous section of the tutorial. Add this code after the /api/hello route.
// creates a temp file on the server, then streams it to the client
/* eslint-disable no-unused-vars */
app.get('/text-to-speech', async (req, res, next) => {
    const { key, region, phrase, file } = req.query;
    if (!key || !region || !phrase) return res.status(400).send('Invalid query string');
    let fileName = null;
    // stream from file or memory
    if (file && file === 'true') {
        // query-string values arrive as strings, so compare with 'true';
        // Date.now() keeps each temp file name unique
        fileName = `./temp/stream-from-file-${Date.now()}.mp3`;
    }
    const audioStream = await textToSpeech(key, region, phrase, fileName);
    res.set({
        'Content-Type': 'audio/mpeg',
        'Transfer-Encoding': 'chunked'
    });
    audioStream.pipe(res);
});
This method takes the required and optional parameters for the textToSpeech method from the querystring. If a file needs to be created, a unique file name is generated. The textToSpeech method is called asynchronously, and the result is piped to the response (res) object.
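Once the server is running (npm start), you can exercise the route without the browser. The following is a minimal sketch, assuming Node.js 18 or later for the built-in fetch and a hypothetical scratch file with a placeholder key; it isn't part of the sample repo:

// test-route.js: request the /text-to-speech route and save the MP3 response
const fs = require('fs');
const { Readable } = require('stream');

(async () => {
    const params = new URLSearchParams({
        key: '<your-speech-key>', // placeholder - use your resource key
        region: 'eastus',
        phrase: 'Hello from the Express.js server'
    });
    const response = await fetch(`http://localhost:3000/text-to-speech?${params}`);
    if (!response.ok) throw new Error(`Request failed: ${response.status}`);
    // convert the web stream to a Node.js stream and write it to disk
    Readable.fromWeb(response.body).pipe(fs.createWriteStream('./route-test.mp3'));
})();

Note that the file-based variant (file=true) writes into a ./temp folder, so make sure that folder exists in the project root before requesting it.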
Update the client HTML web page with a form that collects the required parameters. The optional parameter is passed in based on which audio control the user selects. Because this tutorial provides a mechanism to call the Azure Speech service from the client, that JavaScript is also provided.
Open the /public/client.html file and replace its contents with the following:
<!DOCTYPE html>
<html lang="en">
<head>
<title>Microsoft Cognitive Services Demo</title>
<meta charset="utf-8" />
</head>
<body>
<div id="content" style="display:none">
<h1 style="font-weight:500;">Microsoft Cognitive Services Speech </h1>
<h2>npm: microsoft-cognitiveservices-speech-sdk</h2>
<table width="100%">
<tr>
<td></td>
<td>
<a href="https://docs.microsoft.com/azure/cognitive-services/speech-service/get-started" target="_blank">Azure
Cognitive Services Speech Documentation</a>
</td>
</tr>
<tr>
<td align="right">Your Speech Resource Key</td>
<td>
<input id="resourceKey" type="text" size="40" placeholder="Your resource key (32 characters)" value=""
onblur="updateSrc()">
</td>
</tr>
<tr>
<td align="right">Your Speech Resource region</td>
<td>
<input id="resourceRegion" type="text" size="40" placeholder="Your resource region" value="eastus"
onblur="updateSrc()">
</td>
</tr>
<tr>
<td align="right" valign="top">Input Text (max 255 char)</td>
<td><textarea id="phraseDiv" style="display: inline-block;width:500px;height:50px" maxlength="255"
onblur="updateSrc()">all good men must come to the aid</textarea></td>
</tr>
<tr>
<td align="right">
Stream directly from Azure Cognitive Services
</td>
<td>
<div>
<button id="clientAudioAzure" onclick="getSpeechFromAzure()">Get directly from Azure</button>
</div>
</td>
</tr>
<tr>
<td align="right">
Stream audio from file on server</td>
<td>
<audio id="serverAudioFile" controls preload="none" onerror="DisplayError()">
</audio>
</td>
</tr>
<tr>
<td align="right">Stream audio from buffer on server</td>
<td>
<audio id="serverAudioStream" controls preload="none" onerror="DisplayError()">
</audio>
</td>
</tr>
</table>
</div>
<!-- Speech SDK reference sdk. -->
<script
src="https://cdn.jsdelivr.net/npm/microsoft-cognitiveservices-speech-sdk@latest/distrib/browser/microsoft.cognitiveservices.speech.sdk.bundle-min.js">
</script>
<!-- Speech SDK USAGE -->
<script>
// status fields and start button in UI
var phraseDiv;
var resultDiv;
// subscription key and region for speech services.
var resourceKey = null;
var resourceRegion = "eastus";
var authorizationToken;
var SpeechSDK;
var synthesizer;
var phrase = "all good men must come to the aid";
var audioType = "audio/mpeg";
var serverSrc = "/text-to-speech";
document.getElementById('serverAudioStream').disabled = true;
document.getElementById('serverAudioFile').disabled = true;
document.getElementById('clientAudioAzure').disabled = true;
// update src URL query string for Express.js server
function updateSrc() {
// input values
resourceKey = document.getElementById('resourceKey').value.trim();
resourceRegion = document.getElementById('resourceRegion').value.trim();
phrase = document.getElementById('phraseDiv').value.trim();
// server control - by file
var serverAudioFileControl = document.getElementById('serverAudioFile');
const fileQueryString = `file=true&region=${resourceRegion}&key=${resourceKey}&phrase=${phrase}`;
serverAudioFileControl.src = `${serverSrc}?${fileQueryString}`;
console.log(serverAudioFileControl.src)
serverAudioFileControl.type = "audio/mpeg";
serverAudioFileControl.disabled = false;
// server control - by stream
var serverAudioStreamControl = document.getElementById('serverAudioStream');
const streamQueryString = `region=${resourceRegion}&key=${resourceKey}&phrase=${phrase}`;
serverAudioStreamControl.src = `${serverSrc}?${streamQueryString}`;
console.log(serverAudioStreamControl.src)
serverAudioStreamControl.type = "audio/mpeg";
serverAudioStreamControl.disabled = false;
// client control
var clientAudioAzureControl = document.getElementById('clientAudioAzure');
clientAudioAzureControl.disabled = false;
}
function DisplayError(error) {
window.alert(JSON.stringify(error));
}
// Client-side request directly to Azure Cognitive Services
function getSpeechFromAzure() {
// authorization for Speech service
var speechConfig = SpeechSDK.SpeechConfig.fromSubscription(resourceKey, resourceRegion);
// new Speech object
synthesizer = new SpeechSDK.SpeechSynthesizer(speechConfig);
synthesizer.speakTextAsync(
phrase,
function (result) {
// Success function
// display status
if (result.reason === SpeechSDK.ResultReason.SynthesizingAudioCompleted) {
// the SDK plays the synthesized audio through the default speaker;
// result.audioData also holds the raw audio if you want to reuse it
const blob = new Blob([result.audioData], { type: "audio/mpeg" });
const url = window.URL.createObjectURL(blob);
} else if (result.reason === SpeechSDK.ResultReason.Canceled) {
// display Error
throw (result.errorDetails);
}
// clean up
synthesizer.close();
synthesizer = undefined;
},
function (err) {
// Error function: clean up, then surface the error
synthesizer.close();
synthesizer = undefined;
DisplayError(err);
});
}
// Initialization
document.addEventListener("DOMContentLoaded", function () {
var clientAudioAzureControl = document.getElementById("clientAudioAzure");
resourceKey = document.getElementById('resourceKey').value;
resourceRegion = document.getElementById('resourceRegion').value;
phrase = document.getElementById('phraseDiv').value;
if (!!window.SpeechSDK) {
SpeechSDK = window.SpeechSDK;
clientAudioAzureControl.disabled = false;
document.getElementById('content').style.display = 'block';
}
});
</script>
</body>
</html>
Highlighted behaviors in the file:
The Azure Speech SDK is loaded into the client page from the cdn.jsdelivr.net CDN, which delivers the NPM package.
The updateSrc method updates the audio controls' src URL with a querystring that includes the key, region, and text.
If the user selects the Get directly from Azure button, the web page calls Azure directly from the client page and processes the result.

Create the Speech resource with Azure CLI commands in an Azure Cloud Shell.
Log in to the Azure Cloud Shell. This requires you to authenticate in a browser with your account, which has permission on a valid Azure Subscription.
Create a resource group for your Speech resource.
az group create \
--location eastus \
--name tutorial-resource-group-eastus
Create a Speech resource in the resource group.
az cognitiveservices account create \
--kind SpeechServices \
--location eastus \
--name tutorial-speech \
--resource-group tutorial-resource-group-eastus \
--sku F0
This command fails if you've already created your free Speech resource, because only one free-tier Speech resource is allowed per subscription.
Use the following command to get the key values for the new Speech resource.
az cognitiveservices account keys list \
--name tutorial-speech \
--resource-group tutorial-resource-group-eastus \
--output table
Copy one of the keys.
You use the key by pasting it into the web form of the Express app to authenticate to the Azure Speech service.
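If you want to verify the key before using it in the web form, you can call the helper module directly from a scratch script. This is a minimal sketch (hypothetical file name, run from the project root), reading the key from an environment variable so it stays out of source control:

// smoke-test.js: confirm the new key/region pair can synthesize speech
const fs = require('fs');
const { textToSpeech } = require('./src/azure-cognitiveservices-speech');

(async () => {
    const key = process.env.SPEECH_KEY; // set in your shell, never hard-coded
    const stream = await textToSpeech(key, 'eastus', 'The key works.');
    stream.pipe(fs.createWriteStream('./smoke-test.mp3'));
})();

Run it with SPEECH_KEY=<your-key> node smoke-test.js and play the resulting MP3 file.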
Start the app with the following bash command.
npm start
Open the web app in a browser.
http://localhost:3000
Paste your Speech resource key into the resource key text box.
Optionally, change the text to something new.
Select one of the three controls to begin the conversion to audio:
The Get directly from Azure button, which calls the Speech service from the browser
The audio control that streams a file generated on the server
The audio control that streams from an in-memory buffer on the server
You may notice a small delay between selecting the control and the audio playing.
From the command palette (Ctrl+Shift+P), type "create web" and select Azure App Service: Create New Web App...Advanced. You use the advanced command to have full control over the deployment, including resource group, App Service plan, and operating system, rather than using the Linux defaults.
Respond to the prompts as follows:
Enter a globally unique name for your app, such as my-text-to-speech-app.
Select tutorial-resource-group-eastus for the resource group.
Select a runtime stack of Node and LTS.
Create a new App Service plan named my-text-to-speech-app-plan.
Select the Basic pricing tier.
Select the eastus location.
After a short time, Visual Studio Code notifies you that creation is complete. Close the notification with the X button.
With the web app in place, deploy your code from the local computer. Select the Azure icon to open the Azure App Service explorer, expand your subscription node, right-click the name of the web app you just created, and select Deploy to Web App.
If there are deployment prompts, select the root folder of the Express.js app, select your subscription account again, and then select the name of the web app, my-text-to-speech-app, created earlier.
If prompted to update your configuration to run npm install on the target Linux server, select Yes.
Once deployment is complete, select Browse Website in the prompt to view your freshly deployed web app.
(Optional) You can make changes to your code files, then use Deploy to Web App in the Azure App Service extension to update the web app.
View (tail) any output that the running app generates through calls to console.log. This output appears in the Output window in Visual Studio Code.
In the Azure App Service explorer, right-click your new app node and choose Start Streaming Logs.
When the log stream starts, the Output window displays: Starting Live Log Stream ---
Refresh the web page a few times in the browser to see additional log output.
Once you've completed this tutorial, remove the resource group, which contains the Speech resource, to make sure you aren't billed for any more usage.
In the Azure Cloud Shell, use the Azure CLI command to delete the resource group:
az group delete --name tutorial-resource-group-eastus -y
This command may take a few minutes.