Convert text to speech in an Express.js app with Azure AI Speech

01/18/2024

In this tutorial, you add Azure AI Speech to an existing Express.js app so that the app can convert text to speech with the Azure AI Speech service. Converting text to speech lets you provide audio without the cost of producing it manually.

This tutorial shows three different ways to convert text to speech with Azure AI Speech:

Client JavaScript gets the audio directly
Server JavaScript gets the audio from a file (*.MP3)
Server JavaScript gets the audio from an in-memory arrayBuffer

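Application architecture

The tutorial takes a minimal Express.js app and adds functionality by using a combination of:

A new route for the server API that converts text to speech and returns an MP3 stream
A new route for an HTML form where you enter your information
A new HTML form, with JavaScript, that can call the Speech service from the client side

The application provides three different calls that convert text to speech:

The first server call creates a file on the server and then returns it to the client. You would typically use this call for longer text, or for text that you know will be served more than once.
The second server call is for short-lived text. The audio is held in memory before it is returned to the client.
The client call demonstrates a direct call to the Speech service by using the SDK. You might choose this call if you have a client-only application without a server.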

Prerequisites

Node.js LTS, installed on your local machine.
Visual Studio Code, installed on your local machine.
The Azure App Service extension for VS Code (installed from within VS Code).
Git, used to push to GitHub, which activates a GitHub action.
Use Azure Cloud Shell with the Bash environment.
If you prefer, install the Azure CLI to run CLI reference commands.
- If you're using a local installation, sign in to the Azure CLI with the az login command. To finish the authentication process, follow the steps displayed in your terminal. For other sign-in options, see "Sign in with the Azure CLI".
- When you're prompted on first use, install the Azure CLI extensions. For more information about extensions, see "Use extensions with the Azure CLI".
- Run az version to find the installed version and its dependent libraries. To upgrade to the latest version, run az upgrade.

Download the sample Express.js repo

Use git to clone the Express.js sample repo to your local computer.
```
git clone https://github.com/Azure-Samples/js-e2e-express-server
```
Change to the new directory for the sample.
```
cd js-e2e-express-server
```
Open the project in Visual Studio Code.
```
code .
```
Open a new terminal in Visual Studio Code and install the project dependencies.
```
npm install
```

Install the Azure AI Speech SDK for JavaScript

From the Visual Studio Code terminal, install the Azure AI Speech SDK.
```
npm install microsoft-cognitiveservices-speech-sdk
```

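Create a Speech module for the Express.js app

To integrate the Speech SDK into the Express.js application, create a file named azure-cognitiveservices-speech.js in the src folder.

Add the following code to pull in the dependencies and create a function that converts text to speech.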

```
// azure-cognitiveservices-speech.js

const sdk = require('microsoft-cognitiveservices-speech-sdk');
const { Buffer } = require('buffer');
const { PassThrough } = require('stream');
const fs = require('fs');

/**
 * Node.js server code to convert text to speech
 * @param {string} key your resource key
 * @param {string} region your resource region
 * @param {string} text text to convert to audio/speech
 * @param {string} [filename] optional - best for long text - temp file for converted speech/audio
 * @returns {Promise} resolves to a readable stream of the audio
 */
const textToSpeech = async (key, region, text, filename)=> {
    
    // convert callback function to promise
    return new Promise((resolve, reject) => {
        
        const speechConfig = sdk.SpeechConfig.fromSubscription(key, region);
        speechConfig.speechSynthesisOutputFormat = 5; // mp3
        
        let audioConfig = null;
        
        if (filename) {
            audioConfig = sdk.AudioConfig.fromAudioFileOutput(filename);
        }
        
        const synthesizer = new sdk.SpeechSynthesizer(speechConfig, audioConfig);

        synthesizer.speakTextAsync(
            text,
            result => {
                
                const { audioData } = result;

                synthesizer.close();
                
                if (filename) {
                    
                    // return stream from file
                    const audioFile = fs.createReadStream(filename);
                    resolve(audioFile);
                    
                } else {
                    
                    // return stream from memory
                    const bufferStream = new PassThrough();
                    bufferStream.end(Buffer.from(audioData));
                    resolve(bufferStream);
                }
            },
            error => {
                synthesizer.close();
                reject(error);
            }); 
    });
};

module.exports = {
    textToSpeech
};
```

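Parameters - The file pulls in the dependencies for the SDK, streams, buffers, and the file system (fs). The textToSpeech function takes four arguments. If a file name with a local path is passed, the text is converted to an audio file. If no file name is passed, an in-memory audio stream is created.
Speech SDK method - The Speech SDK method synthesizer.speakTextAsync produces different output depending on the configuration it receives. The result it returns depends on what the method was asked to do:
- Create a file
- Create an in-memory stream as an array of buffers
Audio format - The audio format selected is MP3, but other formats exist, along with other audio configuration methods.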
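
The in-memory branch relies on a standard Node.js pattern: wrap the bytes in a Buffer and send them through a PassThrough stream, so that the caller can pipe them like any other readable stream. The sketch below isolates that step, using made-up bytes in place of real audio. The names bufferToStream and collect are hypothetical helpers for illustration, not part of the Speech SDK or the sample.

```javascript
const { PassThrough } = require('stream');

// Hypothetical helper: wrap an ArrayBuffer (the shape of result.audioData)
// in a readable stream.
function bufferToStream(arrayBuffer) {
    const stream = new PassThrough();
    // end() writes the final chunk and signals that no more data follows
    stream.end(Buffer.from(arrayBuffer));
    return stream;
}

// Hypothetical helper: read a stream back into a single Buffer.
function collect(stream) {
    return new Promise((resolve, reject) => {
        const chunks = [];
        stream.on('data', chunk => chunks.push(chunk));
        stream.on('end', () => resolve(Buffer.concat(chunks)));
        stream.on('error', reject);
    });
}

// Stand-in bytes; real code would pass the audio data from the synthesis result
const fakeAudio = new Uint8Array([0x49, 0x44, 0x33]).buffer;

collect(bufferToStream(fakeAudio)).then(buf => console.log(buf.length)); // 3
```

Because the whole buffer is written in a single end() call, this approach suits short audio that fits comfortably in memory.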

The local method, textToSpeech, wraps the SDK callback function and converts it into a promise.
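
To show the wrapping technique on its own, here is a minimal sketch that converts a function with separate success and error callbacks into a promise. The function fakeSpeak is a hypothetical stand-in that only imitates the callback shape of speakTextAsync; it does not call the Speech service.

```javascript
// Hypothetical stand-in for an SDK method that reports its outcome through
// separate success and error callbacks.
function fakeSpeak(text, onSuccess, onError) {
    setTimeout(() => {
        if (typeof text === 'string' && text.length > 0) {
            // one zero byte per character, just to have something to return
            onSuccess({ audioData: new Uint8Array(text.length).buffer });
        } else {
            onError(new Error('text is required'));
        }
    }, 0);
}

// Wrap the callback pair in a promise so that callers can use async/await.
function speakAsPromise(text) {
    return new Promise((resolve, reject) => {
        fakeSpeak(text, resolve, reject);
    });
}

async function demo() {
    const result = await speakAsPromise('hello');
    console.log(result.audioData.byteLength); // 5

    try {
        await speakAsPromise('');
    } catch (err) {
        console.log(err.message); // text is required
    }
}

demo();
```

Passing resolve and reject directly as the two callbacks keeps the wrapper short; a real wrapper would also release resources, as textToSpeech does with synthesizer.close().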

Create a new route for the Express.js app

Open the src/server.js file.
Add the azure-cognitiveservices-speech.js module as a dependency at the top of the file.
```
const { textToSpeech } = require('./azure-cognitiveservices-speech');
```

Add a new API route that calls the textToSpeech method created in the previous section of this tutorial. Add this code after the /api/hello route.

```
// creates a temp file on server, then streams to client
/* eslint-disable no-unused-vars */
app.get('/text-to-speech', async (req, res, next) => {
    
    const { key, region, phrase, file } = req.query;
    
    // stop here when a required parameter is missing
    if (!key || !region || !phrase) {
        return res.status(400).send('Invalid query string');
    }
    
    let fileName = null;
    
    // stream from file or memory; query string values arrive as strings
    if (file === 'true') {
        // assumes the ./temp folder already exists
        fileName = `./temp/stream-from-file-${Date.now()}.mp3`;
    }
    
    try {
        const audioStream = await textToSpeech(key, region, phrase, fileName);
        res.set({
            'Content-Type': 'audio/mpeg',
            'Transfer-Encoding': 'chunked'
        });
        audioStream.pipe(res);
    } catch (err) {
        // hand synthesis failures to the Express error handler
        next(err);
    }
});
```

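This method takes the required and optional parameters for the textToSpeech method from the query string. If a file needs to be created, a unique file name is generated. The textToSpeech method is called asynchronously, and its result is piped to the response (res) object.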
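
The parameter handling can be tried without a running server. The sketch below pulls the validation and file-name decision into a pure function, on the assumption that query values arrive as strings (the usual case for a simple query string in Express), so the flag is compared with the string 'true'. The name parseSpeechQuery and the shape of its return value are hypothetical.

```javascript
// Hypothetical helper: validate the query and choose file or memory output.
// `query` has the shape of Express's req.query.
function parseSpeechQuery(query, now = Date.now()) {
    const { key, region, phrase, file } = query;

    if (!key || !region || !phrase) {
        // 400 Bad Request: a required parameter is missing
        return { ok: false, status: 400, message: 'Invalid query string' };
    }

    // Compare against the string 'true', not the boolean true
    const fileName = file === 'true'
        ? `./temp/stream-from-file-${now}.mp3`
        : null;

    return { ok: true, key, region, phrase, fileName };
}

const withFile = parseSpeechQuery(
    { key: 'k', region: 'eastus', phrase: 'hi', file: 'true' },
    1700000000000
);
console.log(withFile.fileName); // ./temp/stream-from-file-1700000000000.mp3

console.log(parseSpeechQuery({ region: 'eastus' }).status); // 400
```

Passing the timestamp as a parameter keeps the function deterministic, which makes it easy to unit test.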

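Update the client web page with a form

Update the client HTML web page with a form that collects the required parameters. The optional parameter is passed in based on which audio control the user selects. Because this tutorial also shows how to call the Azure Speech service from the client, that JavaScript is provided as well.

Open the /public/client.html file and replace its contents with the following code.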

```
<!DOCTYPE html>
<html lang="en">

<head>
  <title>Microsoft Cognitive Services Demo</title>
  <meta charset="utf-8" />
</head>

<body>

  <div id="content" style="display:none">
    <h1 style="font-weight:500;">Microsoft Cognitive Services Speech </h1>
    <h2>npm: microsoft-cognitiveservices-speech-sdk</h2>
    <table width="100%">
      <tr>
        <td></td>
        <td>
          <a href="https://docs.microsoft.com/azure/cognitive-services/speech-service/get-started" target="_blank">Azure
            Cognitive Services Speech Documentation</a>
        </td>
      </tr>
      <tr>
        <td align="right">Your Speech Resource Key</td>
        <td>

          <input id="resourceKey" type="text" size="40" placeholder="Your resource key (32 characters)" value=""
            onblur="updateSrc()">
        </td>
      </tr>
      <tr>
        <td align="right">Your Speech Resource region</td>
        <td>
          <input id="resourceRegion" type="text" size="40" placeholder="Your resource region" value="eastus"
            onblur="updateSrc()">

        </td>
      </tr>
      <tr>
        <td align="right" valign="top">Input Text (max 255 char)</td>
        <td><textarea id="phraseDiv" style="display: inline-block;width:500px;height:50px" maxlength="255"
            onblur="updateSrc()">all good men must come to the aid</textarea></td>
      </tr>
      <tr>
        <td align="right">
          Stream directly from Azure Cognitive Services
        </td>
        <td>
          <div>
            <button id="clientAudioAzure" onclick="getSpeechFromAzure()">Get directly from Azure</button>
          </div>
        </td>
      </tr>

      <tr>
        <td align="right">
          Stream audio from file on server</td>
        <td>
          <audio id="serverAudioFile" controls preload="none" onerror="DisplayError(this.error)">
          </audio>
        </td>
      </tr>

      <tr>
        <td align="right">Stream audio from buffer on server</td>
        <td>
          <audio id="serverAudioStream" controls preload="none" onerror="DisplayError(this.error)">
          </audio>
        </td>
      </tr>
    </table>
  </div>

  <!-- Speech SDK reference sdk. -->
  <script
    src="https://cdn.jsdelivr.net/npm/microsoft-cognitiveservices-speech-sdk@latest/distrib/browser/microsoft.cognitiveservices.speech.sdk.bundle-min.js">
    </script>

  <!-- Speech SDK USAGE -->
  <script>
    // status fields and start button in UI
    var phraseDiv;
    var resultDiv;

    // subscription key and region for speech services.
    var resourceKey = null;
    var resourceRegion = "eastus";
    var authorizationToken;
    var SpeechSDK;
    var synthesizer;

    var phrase = "all good men must come to the aid";
    var queryString = null;

    var audioType = "audio/mpeg";
    var serverSrc = "/text-to-speech";

    document.getElementById('serverAudioStream').disabled = true;
    document.getElementById('serverAudioFile').disabled = true;
    document.getElementById('clientAudioAzure').disabled = true;

    // update src URL query string for Express.js server
    function updateSrc() {

      // input values
      resourceKey = document.getElementById('resourceKey').value.trim();
      resourceRegion = document.getElementById('resourceRegion').value.trim();
      phrase = document.getElementById('phraseDiv').value.trim();

      // server control - by file
      var serverAudioFileControl = document.getElementById('serverAudioFile');
      const encodedPhrase = encodeURIComponent(phrase); // escape spaces and symbols for the URL
      const fileQueryString = `file=true&region=${resourceRegion}&key=${resourceKey}&phrase=${encodedPhrase}`;
      serverAudioFileControl.src = `${serverSrc}?${fileQueryString}`;
      console.log(serverAudioFileControl.src);
      serverAudioFileControl.type = "audio/mpeg";
      serverAudioFileControl.disabled = false;

      // server control - by stream
      var serverAudioStreamControl = document.getElementById('serverAudioStream');
      const streamQueryString = `region=${resourceRegion}&key=${resourceKey}&phrase=${encodedPhrase}`;
      serverAudioStreamControl.src = `${serverSrc}?${streamQueryString}`;
      console.log(serverAudioStreamControl.src);
      serverAudioStreamControl.type = "audio/mpeg";
      serverAudioStreamControl.disabled = false;

      // client control
      var clientAudioAzureControl = document.getElementById('clientAudioAzure');
      clientAudioAzureControl.disabled = false;

    }

    function DisplayError(error) {
      window.alert(error && error.message ? error.message : JSON.stringify(error));
    }

    // Client-side request directly to Azure Cognitive Services
    function getSpeechFromAzure() {

      // authorization for Speech service
      var speechConfig = SpeechSDK.SpeechConfig.fromSubscription(resourceKey, resourceRegion);

      // new Speech object; with no audio config, the browser SDK is expected
      // to play the synthesized audio through the default speaker
      synthesizer = new SpeechSDK.SpeechSynthesizer(speechConfig);

      synthesizer.speakTextAsync(
        phrase,
        function (result) {

          // Success function
          if (result.reason === SpeechSDK.ResultReason.Canceled) {
            // display error
            DisplayError(result.errorDetails);
          }

          // clean up
          synthesizer.close();
          synthesizer = undefined;
        },
        function (err) {

          // Error function
          DisplayError(err);

          // clean up
          synthesizer.close();
          synthesizer = undefined;
        });

    }

    // Initialization
    document.addEventListener("DOMContentLoaded", function () {

      var clientAudioAzureControl = document.getElementById("clientAudioAzure");

      resourceKey = document.getElementById('resourceKey').value;
      resourceRegion = document.getElementById('resourceRegion').value;
      phrase = document.getElementById('phraseDiv').value;
      if (!!window.SpeechSDK) {
        SpeechSDK = window.SpeechSDK;
        clientAudioAzureControl.disabled = false;

        document.getElementById('content').style.display = 'block';
      }
    });

  </script>
</body>

</html>
```
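
The updateSrc function in the page assembles the query string by hand. As an alternative sketch, the standard URLSearchParams class can build and encode the query in one step, which matters when the phrase contains spaces or characters such as &. The helper name buildAudioSrc is hypothetical and not part of the sample.

```javascript
// Hypothetical helper: build the src URL for one of the audio controls.
function buildAudioSrc(base, { key, region, phrase, file = false }) {
    const params = new URLSearchParams({ region, key, phrase });
    if (file) {
        // ask the server to synthesize to a temp file first
        params.set('file', 'true');
    }
    return `${base}?${params.toString()}`;
}

console.log(buildAudioSrc('/text-to-speech', {
    key: 'abc',
    region: 'eastus',
    phrase: 'rock & roll'
}));
// /text-to-speech?region=eastus&key=abc&phrase=rock+%26+roll
```

URLSearchParams encodes a space as + and & as %26, so the phrase survives the trip as a single parameter.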

Highlighted lines in the file:

Line 74 (the script tag for the SDK): The page pulls the Azure Speech SDK client library from cdn.jsdelivr.net, which serves the NPM package.
Line 102 (updateSrc): The updateSrc method updates the src URL of the audio controls with a query string that includes the key, region, and text.
Line 137 (getSpeechFromAzure): When a user selects the Get directly from Azure button, the web page calls Azure directly from the client page and processes the result.

Create an Azure AI Speech resource

Create the Speech resource with Azure CLI commands in Azure Cloud Shell.

Sign in to Azure Cloud Shell. This requires you to authenticate in a browser with an account that has permission on a valid Azure subscription.

Create a resource group for your Speech resource.
```
az group create \
    --location eastus \
    --name tutorial-resource-group-eastus
```
Create a Speech resource in the resource group.
```
az cognitiveservices account create \
    --kind SpeechServices \
    --location eastus \
    --name tutorial-speech \
    --resource-group tutorial-resource-group-eastus \
    --sku F0
```
This command fails if you have already created your one free Speech resource.

Use the following command to get the key values for the new Speech resource.
```
az cognitiveservices account keys list \
    --name tutorial-speech \
    --resource-group tutorial-resource-group-eastus \
    --output table
```
Copy one of the keys.

You use the key by pasting it into the web form of the Express app, which authenticates to the Azure Speech service.

Run the Express.js app to convert text to speech

Start the app with the following Bash command.
```
npm start
```
Open the web app in a browser.
```
http://localhost:3000
```
Paste your Speech key into the highlighted text box.
Optionally, change the text to something new.
Select one of the three controls to begin the conversion to audio:
- Get directly from Azure - a client-side call to Azure
- Audio control for audio from file
- Audio control for audio from buffer
You may notice a short delay between selecting a control and the audio playing.

Create a new Azure App Service in Visual Studio Code

From the command palette (Ctrl+Shift+P), type "create web" and select Azure App Service: Create New Web App, and then Advanced. Instead of accepting the Linux defaults, you use the advanced command to have full control over the deployment, including the resource group, App Service plan, and operating system.
Respond to the prompts as follows:
- Select your subscription account.
- For Enter a globally unique name, enter a name such as my-text-to-speech-app.
  - Enter a name that's unique across all of Azure. Use only alphanumeric characters ('A-Z', 'a-z', and '0-9') and hyphens ('-').
- Select tutorial-resource-group-eastus for the resource group.
- Select a runtime stack version that includes Node and LTS.
- Select the Linux operating system.
- Select Create a new App Service plan, and provide a name such as my-text-to-speech-app-plan.
- Select the F1 Free pricing tier. If your subscription already has a free web app, select the Basic tier.
- Select Skip for now for the Application Insights resource.
- Select the eastus location.
After a short time, Visual Studio Code notifies you that creation is complete. Close the notification with the X button.
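
The character rule for the app name in the prompts above can be checked before you start the wizard. The sketch below tests only what is stated there (letters, digits, and hyphens). Azure may enforce further limits, such as length, that this hypothetical helper does not cover, and it cannot tell whether a name is already taken.

```javascript
// Hypothetical check based only on the stated rule:
// alphanumeric characters and hyphens, at least one character.
function looksLikeValidAppName(name) {
    return /^[A-Za-z0-9-]+$/.test(name);
}

console.log(looksLikeValidAppName('my-text-to-speech-app')); // true
console.log(looksLikeValidAppName('my_text_to_speech_app')); // false
```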

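Deploy the local Express.js app to the remote App Service in Visual Studio Code

With the web app in place, deploy your code from the local computer. Select the Azure icon to open the Azure App Service explorer, and expand your subscription node. Right-click the name of the web app you just created, and select Deploy to Web App.
If deployment prompts appear, select the root folder of the Express.js app, select your subscription account again, and then select the name of the web app created earlier, my-text-to-speech-app.
If you're prompted to run npm install when deploying to Linux, select Yes when asked to update your configuration to run npm install on the target server.
When the deployment is complete, select Browse Website in the prompt to view your newly deployed web app.
(Optional) After you change your code files, you can use Deploy to Web App in the Azure App Service extension to update the web app.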

Stream remote service logs in Visual Studio Code

View (tail) any output that the running app generates through calls to console.log. This output appears in the Output window in Visual Studio Code.

In the Azure App Service explorer, right-click your new app node and select Start Streaming Logs.
```
 Starting Live Log Stream ---
 
```
Refresh the web page a few times in the browser to see additional log output.
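
Because this app receives the Speech key in the query string, any console.log call that prints a request URL would also print the key to the streamed logs. A minimal sketch of masking it first follows. The helper name redactKey is hypothetical, and it assumes that the parameter is named key, as in the route of this tutorial.

```javascript
// Hypothetical helper: hide the value of the `key` query parameter
// before a URL is written to the log.
function redactKey(url) {
    // The base is only needed to parse a relative URL; it is not part of the output
    const parsed = new URL(url, 'http://localhost');
    if (parsed.searchParams.has('key')) {
        parsed.searchParams.set('key', 'REDACTED');
    }
    return parsed.pathname + parsed.search;
}

console.log(redactKey('/text-to-speech?region=eastus&key=abc123&phrase=hello'));
// /text-to-speech?region=eastus&key=REDACTED&phrase=hello
```

In an Express app, one way to use it would be a small middleware that calls console.log(redactKey(req.originalUrl)) before passing control to next().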

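Clean up resources by removing the resource group

Once you have completed this tutorial, remove the resource group. It contains the resources you created, so removing it makes sure that you aren't billed for any further usage.

In Azure Cloud Shell, use the Azure CLI command to delete the resource group.
```
az group delete --name tutorial-resource-group-eastus  -y
```
This command might take a few minutes.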

Next steps

Deploy an Express.js MongoDB app to App Service
