Problem synthesizing speech from Chinese text

Question

Anonymous

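In VS2022, I am trying to call Azure TTS to synthesize speech from Chinese text. When the text is pure English, it is synthesized and read aloud, but when the text contains Chinese or is entirely Chinese, the program fails and cannot synthesize speech (see the image below). What is the problem?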

[Screenshot: QQ image 20230712123337, showing the error]

Anonymous

2023-07-14T08:45:25.1933333+00:00
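Regarding the question "Problem synthesizing speech from Chinese text" submitted on July 12:

Thank you for your reply, but it did not solve the problem. See the image below.

For unknown reasons, I cannot open "https://github.com/Azure-Samples/cognitive-services-speech-sdk/issues/1603".

The code I am using comes from your published sample code (hello world.cpp). Please modify the source code for my situation, debug it successfully in VS2022, and then send me the source. (Sign of success: after a Chinese string is entered, it can be synthesized and read aloud.)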
Minxin Yu 13,506 Reputation points Microsoft External Staff

2023-07-18T01:59:37.4466667+00:00

Since this is an English forum, it is recommended that you post your questions in English.

Minxin Yu 13,506 Microsoft External Staff

Did the snippet work for you? Set your key and region in the line auto config = SpeechConfig::FromSubscription("yourkey", "eastus");

//
// Copyright (c) Microsoft. All rights reserved.
// Licensed under the MIT license. See LICENSE.md file in the project root for full license information.
//

#include "stdafx.h"
// <code>
#include <iostream>
#include <speechapi_cxx.h>
#include <Windows.h>
#include <string>

using namespace std;
using namespace Microsoft::CognitiveServices::Speech;

void synthesizeSpeech()
{
   
    // Creates an instance of a speech config with specified subscription key and service region.
    // Replace with your own subscription key and service region (e.g., "westus").
    auto config = SpeechConfig::FromSubscription("yourkey", "region");

    // Set the voice name, refer to https://aka.ms/speech/voices/neural for full list.
    config->SetSpeechSynthesisVoiceName("zh-CN-XiaoxiaoNeural");

    // Creates a speech synthesizer using the default speaker as audio output. The voice set above determines the spoken language; without a voice or language setting, the default is "en-US".
    auto synthesizer = SpeechSynthesizer::FromConfig(config);

    // Receive a text from console input and synthesize it to speaker.
    cout << "Type some text that you want to speak..." << std::endl;
    cout << "> ";
    std::wstring text;
   
    getline(std::wcin, text);
  
    wcout << L"Input Speech synthesized to speaker for text [" << text << "]" << std::endl;
    auto result = synthesizer->SpeakTextAsync(text).get();

    // Checks result.
    if (result->Reason == ResultReason::SynthesizingAudioCompleted)
    {
        wcout << L"Speech synthesized to speaker for text [" << text << "]" << std::endl;
    }
    else if (result->Reason == ResultReason::Canceled)
    {
        auto cancellation = SpeechSynthesisCancellationDetails::FromResult(result);
        cout << "CANCELED: Reason=" << (int)cancellation->Reason << std::endl;

        if (cancellation->Reason == CancellationReason::Error)
        {
            cout << "CANCELED: ErrorCode=" << (int)cancellation->ErrorCode << std::endl;
            cout << "CANCELED: ErrorDetails=[" << cancellation->ErrorDetails << "]" << std::endl;
            cout << "CANCELED: Did you update the subscription info?" << std::endl;
        }
    }

    // This is to give some time for the speaker to finish playing back the audio
    cout << "Press enter to exit..." << std::endl;
    cin.get();
}

int wmain()
{
    try
    {
        synthesizeSpeech();
    }
    catch (const exception& e)
    {
        cout << e.what();
    }
    return 0;
}
// </code>

Anonymous

Hi Yu, I put your code into my VS2022 editor and replaced the subscription key and region with mine. It works with Chinese text, but the output is not correct.

The code I changed is as follows. Please verify it with the Chinese string "I am a student, my name is 王先生".


#include "stdafx.h"
// <code>
#include <iostream>
#include <speechapi_cxx.h>
#include <Windows.h>
#include <string>

using namespace std;
using namespace Microsoft::CognitiveServices::Speech;

void synthesizeSpeech()
{

    // Creates an instance of a speech config with specified subscription key and service region.
    // Replace with your own subscription key and service region (e.g., "westus").
    auto config = SpeechConfig::FromSubscription("yourkey", "eastasia");

    // Set the voice name, refer to https://aka.ms/speech/voices/neural for full list.
    config->SetSpeechSynthesisVoiceName("zh-CN-XiaoxiaoNeural");

    // Creates a speech synthesizer using the default speaker as audio output. The voice set above determines the spoken language; without a voice or language setting, the default is "en-US".
    auto synthesizer = SpeechSynthesizer::FromConfig(config);

    // Receive a text from console input and synthesize it to speaker.
    cout << "Type some text that you want to speak..." << std::endl;
    cout << "> ";
    std::wstring text;

    getline(std::wcin, text);
    
    wcout << L"Input Speech synthesized to speaker for text [" << text << "]" << std::endl;

    
    auto result = synthesizer->SpeakTextAsync(text).get();

    // Checks result.
    if (result->Reason == ResultReason::SynthesizingAudioCompleted)
    {
        wcout << L"Speech synthesized to speaker for text [" << text << "]" << std::endl;
    }
    else if (result->Reason == ResultReason::Canceled)
    {
        auto cancellation = SpeechSynthesisCancellationDetails::FromResult(result);
        cout << "CANCELED: Reason=" << (int)cancellation->Reason << std::endl;

        if (cancellation->Reason == CancellationReason::Error)
        {
            cout << "CANCELED: ErrorCode=" << (int)cancellation->ErrorCode << std::endl;
            cout << "CANCELED: ErrorDetails=[" << cancellation->ErrorDetails << "]" << std::endl;
            cout << "CANCELED: Did you update the subscription info?" << std::endl;
        }
    }

    // This is to give some time for the speaker to finish playing back the audio
    cout << "Press enter to exit..." << std::endl;
    cin.get();
}

int wmain()
{
    try
    {
        synthesizeSpeech();
    }
    catch (const exception& e)
    {
        cout << e.what();
    }
    return 0;
}

Accepted answer

Answer 1

Hi, @Frank

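I modified the snippet so that Chinese input is stored correctly in the std::wstring.

Copy the snippet below and replace the key with your own. To suppress the confirmation message that chcp prints to the console, use system("chcp 936 > nul") instead.

The key change is to set the locale before reading the wide string:

std::wstring text;
setlocale(LC_ALL, "chs");
getline(wcin, text);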

#include "stdafx.h"
// <code>
#include <clocale>   // setlocale
#include <cstdlib>   // system
#include <iostream>
#include <speechapi_cxx.h>
#include <Windows.h>
#include <string>
#pragma execution_character_set("utf-8")
using namespace std;
using namespace Microsoft::CognitiveServices::Speech;

void synthesizeSpeech()
{
    system("chcp 936"); // Switch the console to code page 936 (GBK); use "chcp 936 > nul" to hide chcp's message.
    // Creates an instance of a speech config with specified subscription key and service region.
    // Replace with your own subscription key and service region (e.g., "westus").
    auto config = SpeechConfig::FromSubscription("key", "eastasia");

    auto language = "zh-CN"; 
    config->SetSpeechSynthesisLanguage(language); 
    // Set the voice name, refer to https://aka.ms/speech/voices/neural for full list.
    config->SetSpeechSynthesisVoiceName("zh-CN-XiaoxiaoNeural");

    // Creates a speech synthesizer using the default speaker as audio output. The language and voice set above override the default "en-US".
    auto synthesizer = SpeechSynthesizer::FromConfig(config);

    // Receive a text from console input and synthesize it to speaker.
    cout << "Type some text that you want to speak..." << std::endl;
    cout << "> ";
    std::wstring text;
    setlocale(LC_ALL, "chs"); // Chinese (Simplified) locale so wcin decodes the console input correctly.
    getline(wcin, text);
   
    wcout << L"Input Speech synthesized to speaker for text [" << text << "]" << std::endl;


    auto result = synthesizer->SpeakTextAsync(text).get();

    // Checks result.
    if (result->Reason == ResultReason::SynthesizingAudioCompleted)
    {
        wcout << L"Speech synthesized to speaker for text [" << text << "]" << std::endl;
    }
    else if (result->Reason == ResultReason::Canceled)
    {
        auto cancellation = SpeechSynthesisCancellationDetails::FromResult(result);
        cout << "CANCELED: Reason=" << (int)cancellation->Reason << std::endl;

        if (cancellation->Reason == CancellationReason::Error)
        {
            cout << "CANCELED: ErrorCode=" << (int)cancellation->ErrorCode << std::endl;
            cout << "CANCELED: ErrorDetails=[" << cancellation->ErrorDetails << "]" << std::endl;
            cout << "CANCELED: Did you update the subscription info?" << std::endl;
        }
    }

    // This is to give some time for the speaker to finish playing back the audio
    cout << "Press enter to exit..." << std::endl;
    cin.get();
}

int wmain()
{
    try
    {
        synthesizeSpeech();
    }
    catch (const exception& e)
    {
        cout << e.what();
    }
    return 0;
}

Best regards,

Minxin Yu


Anonymous

2023-07-20T03:00:30.2266667+00:00

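Thank you very much. It can now synthesize Chinese speech.

However, when I add the line text = "I am a student, 我是晓白", VS2022 reports "no operator '='". How can I assign a Chinese string to the text variable?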
Minxin Yu 13,506 Reputation points Microsoft External Staff

2023-07-20T03:27:23.2466667+00:00

text = L"I am a student, 我是晓白";

Answer 2

Dillon Silzer 57,831 Volunteer Moderator

Hello Frank,

Please try looking at the following:

CPP Text-To-Speech quickstart doesnot work on Chinese

https://github.com/Azure-Samples/cognitive-services-speech-sdk/issues/1603

Here may be a solution:

First off, the root cause of the problem is that the string coming from the console isn't capturing the non-ASCII characters properly. The quickest solution is to use the std::wstring overloads of SpeakTextAsync to ensure properly encoded strings are passed in. (Alternatively, the std::string overload will work with a UTF-8 encoded string.) Later on, the Speech SDK internally hits an error because the string encoding is not what it expected, and it doesn't return a particularly useful error. I've opened a bug in our internal tracking system to address that.

Anonymous

2023-07-18T03:18:47.4+00:00
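Thank you for your reply, but it did not solve the problem. See the image below.

For unknown reasons, I cannot open "https://github.com/Azure-Samples/cognitive-services-speech-sdk/issues/1603".

The code I am using comes from your published sample code (hello world.cpp). Please modify the source code for my situation, debug it successfully in VS2022, and then send me the source. (Sign of success: after a Chinese string is entered, it can be synthesized and read aloud.)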
Anonymous

2023-07-18T04:14:27.4766667+00:00
The following is further feedback in English.

Thank you for your reply, but it doesn't solve the problem. After I changed "string" to "wstring", the code on the next line was underlined in red, indicating a further error. See the screenshot below.

For unknown reasons, I can't open https://github.com/Azure-Samples/cognitive-services-speech-sdk/issues/1603. The reason may be China's Internet security policy.

The code I used is from the sample code published by Windows Azure (see the attached hello world.cpp.txt). Please modify the source code for my situation and send it to me after debugging it successfully in VS2022. (Sign of success: after a Chinese string is entered, it can be synthesized and read aloud.)

Thank you for your consideration.
