Language identification is used to identify the language spoken in audio by comparing it against a list of supported languages.
Language identification (LID) use cases include:
- Speech to text recognition, when you need to identify the language in an audio source and then transcribe it to text.
- Speech translation, when you need to identify the language in an audio source and then translate it to another language.
For speech recognition, the initial latency is higher with language identification. Include this optional feature only when you need it.
Configuration options
Whether you use language identification with speech to text or with speech translation, there are some common concepts and configuration options.
Then you make a recognize-once or continuous recognition request to the Speech service.
This article provides code snippets to describe the concepts. Complete samples for each use case are provided later.
Candidate languages
You provide candidate languages with the AutoDetectSourceLanguageConfig object. You expect that at least one of the candidate languages is in the audio. You can include up to four languages for at-start LID, or up to 10 languages for continuous LID. The Speech service returns one of the candidate languages provided, even if those languages aren't in the audio. For example, if fr-FR (French) and en-US (English) are provided as candidates but German is spoken, the service returns either fr-FR or en-US.
You must provide the full locale with a dash (-) separator, but language identification uses only one locale per base language. Don't include multiple locales for the same language, for example en-US and en-GB.
var autoDetectSourceLanguageConfig =
AutoDetectSourceLanguageConfig.FromLanguages(new string[] { "en-US", "de-DE", "zh-CN" });
auto autoDetectSourceLanguageConfig =
AutoDetectSourceLanguageConfig::FromLanguages({ "en-US", "de-DE", "zh-CN" });
auto_detect_source_language_config = \
speechsdk.languageconfig.AutoDetectSourceLanguageConfig(languages=["en-US", "de-DE", "zh-CN"])
AutoDetectSourceLanguageConfig autoDetectSourceLanguageConfig =
AutoDetectSourceLanguageConfig.fromLanguages(Arrays.asList("en-US", "de-DE", "zh-CN"));
var autoDetectSourceLanguageConfig = SpeechSDK.AutoDetectSourceLanguageConfig.fromLanguages(["en-US", "de-DE", "zh-CN"]);
NSArray *languages = @[@"en-US", @"de-DE", @"zh-CN"];
SPXAutoDetectSourceLanguageConfiguration* autoDetectSourceLanguageConfig = \
[[SPXAutoDetectSourceLanguageConfiguration alloc]init:languages];
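The candidate-language rules above (full locales with a dash, one locale per base language, and at most four candidates for at-start LID or 10 for continuous LID) can be checked before you call the SDK. Here's a minimal sketch in Python; the helper name and error messages are our own for illustration and aren't part of the Speech SDK:

```python
def validate_candidates(languages, continuous=False):
    """Check a candidate list against the language identification limits:
    full locales with a dash, one locale per base language, and at most
    4 candidates for at-start LID or 10 for continuous LID."""
    limit = 10 if continuous else 4
    if len(languages) > limit:
        raise ValueError(f"At most {limit} candidate languages are allowed")
    bases = set()
    for locale in languages:
        if "-" not in locale:
            raise ValueError(f"Use a full locale with a dash, such as en-US, not {locale!r}")
        base = locale.split("-")[0]
        if base in bases:
            raise ValueError(f"Only one locale per base language: duplicate {base!r}")
        bases.add(base)
    return languages

# Valid for at-start LID (3 candidates, distinct base languages):
validate_candidates(["en-US", "de-DE", "zh-CN"])
```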
For more information, see Supported languages.
At-start and continuous language identification
Speech supports both at-start and continuous language identification (LID).
Note
Continuous language identification is supported only with the Speech SDK in C#, C++, Java (for speech to text only), JavaScript (for speech to text only), and Python.
- At-start LID identifies the language once within the first few seconds of audio. Use at-start LID if the language in the audio doesn't change. With at-start LID, a single language is detected and returned in less than 5 seconds.
- Continuous LID can identify multiple languages during the audio. Use continuous LID if the language in the audio could change. Continuous LID doesn't support changing languages within the same sentence. For example, if you're primarily speaking Spanish and insert some English words, it doesn't detect the language change per word.
You implement at-start LID or continuous LID by calling methods to recognize once or continuously. Continuous LID is supported only with continuous recognition.
Recognize once or continuous
Language identification is completed with recognition objects and operations. You make a request to the Speech service to recognize the audio.
Note
Don't confuse recognition with identification. Recognition can be used with or without language identification.
Either call the recognize-once method, or the start and stop continuous recognition methods. You choose from:
- Recognize once with at-start LID. Continuous LID isn't supported for recognize once.
- Continuous recognition with at-start LID.
- Continuous recognition with continuous LID.
The SpeechServiceConnection_LanguageIdMode property is only required for continuous LID. Without it, the Speech service defaults to at-start LID. The supported values are AtStart for at-start LID and Continuous for continuous LID.
// Recognize once with At-start LID. Continuous LID isn't supported for recognize once.
var result = await recognizer.RecognizeOnceAsync();
// Start and stop continuous recognition with At-start LID
await recognizer.StartContinuousRecognitionAsync();
await recognizer.StopContinuousRecognitionAsync();
// Start and stop continuous recognition with Continuous LID
speechConfig.SetProperty(PropertyId.SpeechServiceConnection_LanguageIdMode, "Continuous");
await recognizer.StartContinuousRecognitionAsync();
await recognizer.StopContinuousRecognitionAsync();
// Recognize once with At-start LID. Continuous LID isn't supported for recognize once.
auto result = recognizer->RecognizeOnceAsync().get();
// Start and stop continuous recognition with At-start LID
recognizer->StartContinuousRecognitionAsync().get();
recognizer->StopContinuousRecognitionAsync().get();
// Start and stop continuous recognition with Continuous LID
speechConfig->SetProperty(PropertyId::SpeechServiceConnection_LanguageIdMode, "Continuous");
recognizer->StartContinuousRecognitionAsync().get();
recognizer->StopContinuousRecognitionAsync().get();
// Recognize once with At-start LID. Continuous LID isn't supported for recognize once.
SpeechRecognitionResult result = recognizer.recognizeOnceAsync().get();
// Start and stop continuous recognition with At-start LID
recognizer.startContinuousRecognitionAsync().get();
recognizer.stopContinuousRecognitionAsync().get();
// Start and stop continuous recognition with Continuous LID
speechConfig.setProperty(PropertyId.SpeechServiceConnection_LanguageIdMode, "Continuous");
recognizer.startContinuousRecognitionAsync().get();
recognizer.stopContinuousRecognitionAsync().get();
# Recognize once with At-start LID. Continuous LID isn't supported for recognize once.
result = recognizer.recognize_once()
# Start and stop continuous recognition with At-start LID
recognizer.start_continuous_recognition()
recognizer.stop_continuous_recognition()
# Start and stop continuous recognition with Continuous LID
speech_config.set_property(property_id=speechsdk.PropertyId.SpeechServiceConnection_LanguageIdMode, value='Continuous')
recognizer.start_continuous_recognition()
recognizer.stop_continuous_recognition()
Use speech to text
Use speech to text recognition when you need to identify the language in an audio source and then transcribe it to text. For more information, see the speech to text overview.
Note
Speech to text recognition with at-start language identification is supported with the Speech SDK in C#, C++, Python, Java, JavaScript, and Objective-C. Speech to text recognition with continuous language identification is supported only with the Speech SDK in C#, C++, Java, JavaScript, and Python.
Currently, for speech to text recognition with continuous language identification, you must create a SpeechConfig from an endpoint, as shown in the code examples.
See more speech to text recognition with language identification samples on GitHub.
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;
var speechConfig = SpeechConfig.FromEndpoint(new Uri("YourSpeechEndpoint"), "YourSpeechKey");
var autoDetectSourceLanguageConfig =
AutoDetectSourceLanguageConfig.FromLanguages(
new string[] { "en-US", "de-DE", "zh-CN" });
using var audioConfig = AudioConfig.FromDefaultMicrophoneInput();
using (var recognizer = new SpeechRecognizer(
speechConfig,
autoDetectSourceLanguageConfig,
audioConfig))
{
var speechRecognitionResult = await recognizer.RecognizeOnceAsync();
var autoDetectSourceLanguageResult =
AutoDetectSourceLanguageResult.FromResult(speechRecognitionResult);
var detectedLanguage = autoDetectSourceLanguageResult.Language;
}
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;
var config = SpeechConfig.FromEndpoint(new Uri("YourSpeechEndpoint"), "YourSpeechKey");
// Set the LanguageIdMode (optional; either "Continuous" or "AtStart" is accepted; the default is "AtStart")
config.SetProperty(PropertyId.SpeechServiceConnection_LanguageIdMode, "Continuous");
var autoDetectSourceLanguageConfig = AutoDetectSourceLanguageConfig.FromLanguages(new string[] { "en-US", "de-DE", "zh-CN" });
var stopRecognition = new TaskCompletionSource<int>();
using (var audioInput = AudioConfig.FromWavFileInput(@"en-us_zh-cn.wav"))
{
using (var recognizer = new SpeechRecognizer(config, autoDetectSourceLanguageConfig, audioInput))
{
// Subscribes to events.
recognizer.Recognizing += (s, e) =>
{
if (e.Result.Reason == ResultReason.RecognizingSpeech)
{
Console.WriteLine($"RECOGNIZING: Text={e.Result.Text}");
var autoDetectSourceLanguageResult = AutoDetectSourceLanguageResult.FromResult(e.Result);
Console.WriteLine($"DETECTED: Language={autoDetectSourceLanguageResult.Language}");
}
};
recognizer.Recognized += (s, e) =>
{
if (e.Result.Reason == ResultReason.RecognizedSpeech)
{
Console.WriteLine($"RECOGNIZED: Text={e.Result.Text}");
var autoDetectSourceLanguageResult = AutoDetectSourceLanguageResult.FromResult(e.Result);
Console.WriteLine($"DETECTED: Language={autoDetectSourceLanguageResult.Language}");
}
else if (e.Result.Reason == ResultReason.NoMatch)
{
Console.WriteLine($"NOMATCH: Speech could not be recognized.");
}
};
recognizer.Canceled += (s, e) =>
{
Console.WriteLine($"CANCELED: Reason={e.Reason}");
if (e.Reason == CancellationReason.Error)
{
Console.WriteLine($"CANCELED: ErrorCode={e.ErrorCode}");
Console.WriteLine($"CANCELED: ErrorDetails={e.ErrorDetails}");
Console.WriteLine($"CANCELED: Did you set the speech resource key and endpoint values?");
}
stopRecognition.TrySetResult(0);
};
recognizer.SessionStarted += (s, e) =>
{
Console.WriteLine("\n Session started event.");
};
recognizer.SessionStopped += (s, e) =>
{
Console.WriteLine("\n Session stopped event.");
Console.WriteLine("\nStop recognition.");
stopRecognition.TrySetResult(0);
};
// Starts continuous recognition. Uses StopContinuousRecognitionAsync() to stop recognition.
await recognizer.StartContinuousRecognitionAsync().ConfigureAwait(false);
// Waits for completion.
// Use Task.WaitAny to keep the task rooted.
Task.WaitAny(new[] { stopRecognition.Task });
// Stops recognition.
await recognizer.StopContinuousRecognitionAsync().ConfigureAwait(false);
}
}
See more speech to text recognition with language identification samples on GitHub.
using namespace std;
using namespace Microsoft::CognitiveServices::Speech;
using namespace Microsoft::CognitiveServices::Speech::Audio;
auto speechConfig = SpeechConfig::FromEndpoint("YourServiceEndpoint", "YourSpeechResourceKey");
auto autoDetectSourceLanguageConfig =
AutoDetectSourceLanguageConfig::FromLanguages({ "en-US", "de-DE", "zh-CN" });
auto recognizer = SpeechRecognizer::FromConfig(
speechConfig,
autoDetectSourceLanguageConfig
);
auto speechRecognitionResult = recognizer->RecognizeOnceAsync().get();
auto autoDetectSourceLanguageResult =
AutoDetectSourceLanguageResult::FromResult(speechRecognitionResult);
auto detectedLanguage = autoDetectSourceLanguageResult->Language;
// Creates an instance of a speech config with specified subscription key and service region.
// Note: Multi-lingual speech recognition with language ID works only with the speech v2 endpoint,
// so you must use the FromEndpoint API to connect to that endpoint.
// Replace YourServiceRegion with your region, for example "westus", and
// replace YourSubscriptionKey with your own speech key.
string speechv2Endpoint = "wss://YourServiceRegion.stt.speech.microsoft.com/speech/universal/v2";
auto speechConfig = SpeechConfig::FromEndpoint(speechv2Endpoint, "YourSubscriptionKey");
// Set the mode of input language detection to either "AtStart" (the default) or "Continuous".
// Please refer to the documentation of Language ID for more information.
// https://aka.ms/speech/lid?pivots=programming-language-cpp
speechConfig->SetProperty(PropertyId::SpeechServiceConnection_LanguageIdMode, "Continuous");
// Define the set of languages to detect
auto autoDetectSourceLanguageConfig = AutoDetectSourceLanguageConfig::FromLanguages({ "en-US", "zh-CN" });
// Creates a speech recognizer using file as audio input.
// Replace with your own audio file name.
auto audioInput = AudioConfig::FromWavFileInput("en-us_zh-cn.wav");
auto recognizer = SpeechRecognizer::FromConfig(speechConfig, autoDetectSourceLanguageConfig, audioInput);
// promise for synchronization of recognition end.
promise<void> recognitionEnd;
// Subscribes to events.
recognizer->Recognizing.Connect([](const SpeechRecognitionEventArgs& e)
{
auto lidResult = AutoDetectSourceLanguageResult::FromResult(e.Result);
cout << "Recognizing in " << lidResult->Language << ": Text =" << e.Result->Text << std::endl;
});
recognizer->Recognized.Connect([](const SpeechRecognitionEventArgs& e)
{
if (e.Result->Reason == ResultReason::RecognizedSpeech)
{
auto lidResult = AutoDetectSourceLanguageResult::FromResult(e.Result);
cout << "RECOGNIZED in " << lidResult->Language << ": Text=" << e.Result->Text << "\n"
<< " Offset=" << e.Result->Offset() << "\n"
<< " Duration=" << e.Result->Duration() << std::endl;
}
else if (e.Result->Reason == ResultReason::NoMatch)
{
cout << "NOMATCH: Speech could not be recognized." << std::endl;
}
});
recognizer->Canceled.Connect([&recognitionEnd](const SpeechRecognitionCanceledEventArgs& e)
{
cout << "CANCELED: Reason=" << (int)e.Reason << std::endl;
if (e.Reason == CancellationReason::Error)
{
cout << "CANCELED: ErrorCode=" << (int)e.ErrorCode << "\n"
<< "CANCELED: ErrorDetails=" << e.ErrorDetails << "\n"
<< "CANCELED: Did you update the subscription info?" << std::endl;
recognitionEnd.set_value(); // Notify to stop recognition.
}
});
recognizer->SessionStopped.Connect([&recognitionEnd](const SessionEventArgs& e)
{
cout << "Session stopped.";
recognitionEnd.set_value(); // Notify to stop recognition.
});
// Starts continuous recognition. Uses StopContinuousRecognitionAsync() to stop recognition.
recognizer->StartContinuousRecognitionAsync().get();
// Waits for recognition end.
recognitionEnd.get_future().get();
// Stops recognition.
recognizer->StopContinuousRecognitionAsync().get();
See more speech to text recognition with language identification samples on GitHub.
AutoDetectSourceLanguageConfig autoDetectSourceLanguageConfig =
AutoDetectSourceLanguageConfig.fromLanguages(Arrays.asList("en-US", "de-DE"));
SpeechRecognizer recognizer = new SpeechRecognizer(
speechConfig,
autoDetectSourceLanguageConfig,
audioConfig);
Future<SpeechRecognitionResult> future = recognizer.recognizeOnceAsync();
SpeechRecognitionResult result = future.get(30, TimeUnit.SECONDS);
AutoDetectSourceLanguageResult autoDetectSourceLanguageResult =
AutoDetectSourceLanguageResult.fromResult(result);
String detectedLanguage = autoDetectSourceLanguageResult.getLanguage();
recognizer.close();
speechConfig.close();
autoDetectSourceLanguageConfig.close();
audioConfig.close();
result.close();
// Shows how to do continuous speech recognition on a multilingual audio file with continuous language detection. Here, we assume the
// spoken language in the file can alternate between English (US), Spanish (Mexico) and German.
// If specified, speech recognition will use the custom model associated with the detected language.
public static void continuousRecognitionFromFileWithContinuousLanguageDetectionWithCustomModels() throws InterruptedException, ExecutionException, IOException, URISyntaxException
{
// Creates an instance of a speech config with specified
// subscription key and endpoint URL. Replace with your own subscription key
// and endpoint URL.
SpeechConfig speechConfig = SpeechConfig.fromEndpoint(new URI("YourEndpointUrl"), "YourSubscriptionKey");
// Change the default from at-start language detection to continuous language detection, since the spoken language in the audio
// may change.
speechConfig.setProperty(PropertyId.SpeechServiceConnection_LanguageIdMode, "Continuous");
// Define a set of expected spoken languages in the audio, with an optional custom model endpoint ID associated with each.
// Update the below with your own languages. Please see https://docs.microsoft.com/azure/cognitive-services/speech-service/language-support
// for all supported languages.
// Update the below with your own custom model endpoint IDs, or omit it if you want to use the standard model.
List<SourceLanguageConfig> sourceLanguageConfigs = new ArrayList<SourceLanguageConfig>();
sourceLanguageConfigs.add(SourceLanguageConfig.fromLanguage("en-US", "YourEnUsCustomModelID"));
sourceLanguageConfigs.add(SourceLanguageConfig.fromLanguage("es-MX", "YourEsMxCustomModelID"));
sourceLanguageConfigs.add(SourceLanguageConfig.fromLanguage("de-DE"));
// Creates an instance of AutoDetectSourceLanguageConfig with the above 3 source language configurations.
AutoDetectSourceLanguageConfig autoDetectSourceLanguageConfig = AutoDetectSourceLanguageConfig.fromSourceLanguageConfigs(sourceLanguageConfigs);
// We provide a WAV file with English and Spanish utterances as an example. Replace with your own multilingual audio file name.
AudioConfig audioConfig = AudioConfig.fromWavFileInput( "es-mx_en-us.wav");
// Creates a speech recognizer using file as audio input and the AutoDetectSourceLanguageConfig
SpeechRecognizer speechRecognizer = new SpeechRecognizer(speechConfig, autoDetectSourceLanguageConfig, audioConfig);
// Semaphore used to signal the call to stop continuous recognition (following either a session ended or a cancelled event)
final Semaphore doneSemaphone = new Semaphore(0);
// Subscribes to events.
/* Uncomment this to see intermediate recognition results. Since this is verbose and the WAV file is long, it is commented out by default in this sample.
speechRecognizer.recognizing.addEventListener((s, e) -> {
AutoDetectSourceLanguageResult autoDetectSourceLanguageResult = AutoDetectSourceLanguageResult.fromResult(e.getResult());
String language = autoDetectSourceLanguageResult.getLanguage();
System.out.println(" RECOGNIZING: Text = " + e.getResult().getText());
System.out.println(" RECOGNIZING: Language = " + language);
});
*/
speechRecognizer.recognized.addEventListener((s, e) -> {
AutoDetectSourceLanguageResult autoDetectSourceLanguageResult = AutoDetectSourceLanguageResult.fromResult(e.getResult());
String language = autoDetectSourceLanguageResult.getLanguage();
if (e.getResult().getReason() == ResultReason.RecognizedSpeech) {
System.out.println(" RECOGNIZED: Text = " + e.getResult().getText());
System.out.println(" RECOGNIZED: Language = " + language);
}
else if (e.getResult().getReason() == ResultReason.NoMatch) {
if (language == null || language.isEmpty() || language.toLowerCase().equals("unknown")) {
System.out.println(" NOMATCH: Speech Language could not be detected.");
}
else {
System.out.println(" NOMATCH: Speech could not be recognized.");
}
}
});
speechRecognizer.canceled.addEventListener((s, e) -> {
System.out.println(" CANCELED: Reason = " + e.getReason());
if (e.getReason() == CancellationReason.Error) {
System.out.println(" CANCELED: ErrorCode = " + e.getErrorCode());
System.out.println(" CANCELED: ErrorDetails = " + e.getErrorDetails());
System.out.println(" CANCELED: Did you update the subscription info?");
}
doneSemaphone.release();
});
speechRecognizer.sessionStarted.addEventListener((s, e) -> {
System.out.println("\n Session started event.");
});
speechRecognizer.sessionStopped.addEventListener((s, e) -> {
System.out.println("\n Session stopped event.");
doneSemaphone.release();
});
// Starts continuous recognition and wait for processing to end
System.out.println(" Recognizing from WAV file... please wait");
speechRecognizer.startContinuousRecognitionAsync().get();
doneSemaphone.tryAcquire(30, TimeUnit.SECONDS);
// Stop continuous recognition
speechRecognizer.stopContinuousRecognitionAsync().get();
// These objects must be closed in order to dispose underlying native resources
speechRecognizer.close();
speechConfig.close();
audioConfig.close();
for (SourceLanguageConfig sourceLanguageConfig : sourceLanguageConfigs)
{
sourceLanguageConfig.close();
}
autoDetectSourceLanguageConfig.close();
}
See more speech to text recognition with language identification samples on GitHub.
auto_detect_source_language_config = \
speechsdk.languageconfig.AutoDetectSourceLanguageConfig(languages=["en-US", "de-DE"])
speech_recognizer = speechsdk.SpeechRecognizer(
speech_config=speech_config,
auto_detect_source_language_config=auto_detect_source_language_config,
audio_config=audio_config)
result = speech_recognizer.recognize_once()
auto_detect_source_language_result = speechsdk.AutoDetectSourceLanguageResult(result)
detected_language = auto_detect_source_language_result.language
import azure.cognitiveservices.speech as speechsdk
import time
import json
speech_key, endpoint_string = "YourSpeechResourceKey", "YourServiceEndpoint"
weatherfilename="en-us_zh-cn.wav"
speech_config = speechsdk.SpeechConfig(subscription=speech_key, endpoint=endpoint_string)
audio_config = speechsdk.audio.AudioConfig(filename=weatherfilename)
# Set the LanguageIdMode (optional; either "Continuous" or "AtStart" is accepted; the default is "AtStart")
speech_config.set_property(property_id=speechsdk.PropertyId.SpeechServiceConnection_LanguageIdMode, value='Continuous')
auto_detect_source_language_config = speechsdk.languageconfig.AutoDetectSourceLanguageConfig(
languages=["en-US", "de-DE", "zh-CN"])
speech_recognizer = speechsdk.SpeechRecognizer(
speech_config=speech_config,
auto_detect_source_language_config=auto_detect_source_language_config,
audio_config=audio_config)
done = False
def stop_cb(evt):
"""callback that signals to stop continuous recognition upon receiving an event `evt`"""
print('CLOSING on {}'.format(evt))
nonlocal done
done = True
# Connect callbacks to the events fired by the speech recognizer
speech_recognizer.recognizing.connect(lambda evt: print('RECOGNIZING: {}'.format(evt)))
speech_recognizer.recognized.connect(lambda evt: print('RECOGNIZED: {}'.format(evt)))
speech_recognizer.session_started.connect(lambda evt: print('SESSION STARTED: {}'.format(evt)))
speech_recognizer.session_stopped.connect(lambda evt: print('SESSION STOPPED {}'.format(evt)))
speech_recognizer.canceled.connect(lambda evt: print('CANCELED {}'.format(evt)))
# stop continuous recognition on either session stopped or canceled events
speech_recognizer.session_stopped.connect(stop_cb)
speech_recognizer.canceled.connect(stop_cb)
# Start continuous speech recognition
speech_recognizer.start_continuous_recognition()
while not done:
time.sleep(.5)
speech_recognizer.stop_continuous_recognition()
NSArray *languages = @[@"en-US", @"de-DE", @"zh-CN"];
SPXAutoDetectSourceLanguageConfiguration* autoDetectSourceLanguageConfig = \
[[SPXAutoDetectSourceLanguageConfiguration alloc]init:languages];
SPXSpeechRecognizer* speechRecognizer = \
[[SPXSpeechRecognizer alloc] initWithSpeechConfiguration:speechConfig
autoDetectSourceLanguageConfiguration:autoDetectSourceLanguageConfig
audioConfiguration:audioConfig];
SPXSpeechRecognitionResult *result = [speechRecognizer recognizeOnce];
SPXAutoDetectSourceLanguageResult *languageDetectionResult = [[SPXAutoDetectSourceLanguageResult alloc] init:result];
NSString *detectedLanguage = [languageDetectionResult language];
var autoDetectSourceLanguageConfig = SpeechSDK.AutoDetectSourceLanguageConfig.fromLanguages(["en-US", "de-DE"]);
var speechRecognizer = SpeechSDK.SpeechRecognizer.FromConfig(speechConfig, autoDetectSourceLanguageConfig, audioConfig);
speechRecognizer.recognizeOnceAsync((result: SpeechSDK.SpeechRecognitionResult) => {
var languageDetectionResult = SpeechSDK.AutoDetectSourceLanguageResult.fromResult(result);
var detectedLanguage = languageDetectionResult.language;
},
{});
Speech to text custom models
Note
Language detection with custom models can only be used with real-time speech to text and speech translation. Batch transcription supports language detection only for default base models.
This sample shows how to use language detection with a custom endpoint. If the detected language is en-US, the example uses the default model. If the detected language is fr-FR, the example uses the custom model endpoint. For more information, see Deploy a custom speech model.
var sourceLanguageConfigs = new SourceLanguageConfig[]
{
SourceLanguageConfig.FromLanguage("en-US"),
SourceLanguageConfig.FromLanguage("fr-FR", "The Endpoint Id for custom model of fr-FR")
};
var autoDetectSourceLanguageConfig =
AutoDetectSourceLanguageConfig.FromSourceLanguageConfigs(
sourceLanguageConfigs);
This sample shows how to use language detection with a custom endpoint. If the detected language is en-US, the example uses the default model. If the detected language is fr-FR, the example uses the custom model endpoint. For more information, see Deploy a custom speech model.
std::vector<std::shared_ptr<SourceLanguageConfig>> sourceLanguageConfigs;
sourceLanguageConfigs.push_back(
SourceLanguageConfig::FromLanguage("en-US"));
sourceLanguageConfigs.push_back(
SourceLanguageConfig::FromLanguage("fr-FR", "The Endpoint Id for custom model of fr-FR"));
auto autoDetectSourceLanguageConfig =
AutoDetectSourceLanguageConfig::FromSourceLanguageConfigs(
sourceLanguageConfigs);
This sample shows how to use language detection with a custom endpoint. If the detected language is en-US, the example uses the default model. If the detected language is fr-FR, the example uses the custom model endpoint. For more information, see Deploy a custom speech model.
List<SourceLanguageConfig> sourceLanguageConfigs = new ArrayList<SourceLanguageConfig>();
sourceLanguageConfigs.add(
SourceLanguageConfig.fromLanguage("en-US"));
sourceLanguageConfigs.add(
SourceLanguageConfig.fromLanguage("fr-FR", "The Endpoint Id for custom model of fr-FR"));
AutoDetectSourceLanguageConfig autoDetectSourceLanguageConfig =
AutoDetectSourceLanguageConfig.fromSourceLanguageConfigs(
sourceLanguageConfigs);
This sample shows how to use language detection with a custom endpoint. If the detected language is en-US, the example uses the default model. If the detected language is fr-FR, the example uses the custom model endpoint. For more information, see Deploy a custom speech model.
en_language_config = speechsdk.languageconfig.SourceLanguageConfig("en-US")
fr_language_config = speechsdk.languageconfig.SourceLanguageConfig("fr-FR", "The Endpoint Id for custom model of fr-FR")
auto_detect_source_language_config = speechsdk.languageconfig.AutoDetectSourceLanguageConfig(
sourceLanguageConfigs=[en_language_config, fr_language_config])
This sample shows how to use language detection with a custom endpoint. If the detected language is en-US, the example uses the default model. If the detected language is fr-FR, the example uses the custom model endpoint. For more information, see Deploy a custom speech model.
SPXSourceLanguageConfiguration* enLanguageConfig = [[SPXSourceLanguageConfiguration alloc]init:@"en-US"];
SPXSourceLanguageConfiguration* frLanguageConfig = \
[[SPXSourceLanguageConfiguration alloc]initWithLanguage:@"fr-FR"
endpointId:@"The Endpoint Id for custom model of fr-FR"];
NSArray *languageConfigs = @[enLanguageConfig, frLanguageConfig];
SPXAutoDetectSourceLanguageConfiguration* autoDetectSourceLanguageConfig = \
[[SPXAutoDetectSourceLanguageConfiguration alloc]initWithSourceLanguageConfigurations:languageConfigs];
var enLanguageConfig = SpeechSDK.SourceLanguageConfig.fromLanguage("en-US");
var frLanguageConfig = SpeechSDK.SourceLanguageConfig.fromLanguage("fr-FR", "The Endpoint Id for custom model of fr-FR");
var autoDetectSourceLanguageConfig = SpeechSDK.AutoDetectSourceLanguageConfig.fromSourceLanguageConfigs([enLanguageConfig, frLanguageConfig]);
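Conceptually, the per-language custom model configuration above is a lookup from detected locale to an optional endpoint ID: a language with no endpoint ID falls back to the default base model. A plain-Python sketch of that dispatch follows; the mapping and helper function are illustrative only and aren't Speech SDK API:

```python
# Illustrative mapping from candidate locale to custom model endpoint ID.
# A value of None means the default base model serves that language.
CUSTOM_ENDPOINTS = {
    "en-US": None,  # default base model
    "fr-FR": "The Endpoint Id for custom model of fr-FR",
}

def model_for(detected_language):
    """Return which model serves a detected language: the custom endpoint
    ID if one was configured, otherwise the default base model."""
    endpoint_id = CUSTOM_ENDPOINTS.get(detected_language)
    return endpoint_id if endpoint_id else "default base model"

print(model_for("en-US"))  # default base model
print(model_for("fr-FR"))
```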
Run speech translation
Use speech translation when you need to identify the language in an audio source and then translate it to another language. For more information, see the speech translation overview.
Note
Speech translation with language identification is supported only with the Speech SDK in C#, C++, JavaScript, and Python.
See more speech translation with language identification samples on GitHub.
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;
using Microsoft.CognitiveServices.Speech.Translation;
public static async Task RecognizeOnceSpeechTranslationAsync()
{
var endpointUrl = new Uri("YourSpeechResourceEndpoint");
var speechTranslationConfig = SpeechTranslationConfig.FromEndpoint(endpointUrl, "YourSpeechResourceKey");
// Source language is required, but currently ignored.
string fromLanguage = "en-US";
speechTranslationConfig.SpeechRecognitionLanguage = fromLanguage;
speechTranslationConfig.AddTargetLanguage("de");
speechTranslationConfig.AddTargetLanguage("fr");
var autoDetectSourceLanguageConfig = AutoDetectSourceLanguageConfig.FromLanguages(new string[] { "en-US", "de-DE", "zh-CN" });
using var audioConfig = AudioConfig.FromDefaultMicrophoneInput();
using (var recognizer = new TranslationRecognizer(
speechTranslationConfig,
autoDetectSourceLanguageConfig,
audioConfig))
{
Console.WriteLine("Say something or read from file...");
var result = await recognizer.RecognizeOnceAsync().ConfigureAwait(false);
if (result.Reason == ResultReason.TranslatedSpeech)
{
var lidResult = result.Properties.GetProperty(PropertyId.SpeechServiceConnection_AutoDetectSourceLanguageResult);
Console.WriteLine($"RECOGNIZED in '{lidResult}': Text={result.Text}");
foreach (var element in result.Translations)
{
Console.WriteLine($" TRANSLATED into '{element.Key}': {element.Value}");
}
}
}
}
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;
using Microsoft.CognitiveServices.Speech.Translation;
public static async Task MultiLingualTranslation()
{
var endpointUrl = new Uri("YourSpeechResourceEndpoint");
var config = SpeechTranslationConfig.FromEndpoint(endpointUrl, "YourSpeechResourceKey");
// Source language is required, but currently ignored.
string fromLanguage = "en-US";
config.SpeechRecognitionLanguage = fromLanguage;
config.AddTargetLanguage("de");
config.AddTargetLanguage("fr");
// Set the LanguageIdMode (optional; either "Continuous" or "AtStart" is accepted; the default is "AtStart")
config.SetProperty(PropertyId.SpeechServiceConnection_LanguageIdMode, "Continuous");
var autoDetectSourceLanguageConfig = AutoDetectSourceLanguageConfig.FromLanguages(new string[] { "en-US", "de-DE", "zh-CN" });
var stopTranslation = new TaskCompletionSource<int>();
using (var audioInput = AudioConfig.FromWavFileInput(@"en-us_zh-cn.wav"))
{
using (var recognizer = new TranslationRecognizer(config, autoDetectSourceLanguageConfig, audioInput))
{
recognizer.Recognizing += (s, e) =>
{
var lidResult = e.Result.Properties.GetProperty(PropertyId.SpeechServiceConnection_AutoDetectSourceLanguageResult);
Console.WriteLine($"RECOGNIZING in '{lidResult}': Text={e.Result.Text}");
foreach (var element in e.Result.Translations)
{
Console.WriteLine($" TRANSLATING into '{element.Key}': {element.Value}");
}
};
recognizer.Recognized += (s, e) => {
if (e.Result.Reason == ResultReason.TranslatedSpeech)
{
var lidResult = e.Result.Properties.GetProperty(PropertyId.SpeechServiceConnection_AutoDetectSourceLanguageResult);
Console.WriteLine($"RECOGNIZED in '{lidResult}': Text={e.Result.Text}");
foreach (var element in e.Result.Translations)
{
Console.WriteLine($" TRANSLATED into '{element.Key}': {element.Value}");
}
}
else if (e.Result.Reason == ResultReason.RecognizedSpeech)
{
Console.WriteLine($"RECOGNIZED: Text={e.Result.Text}");
Console.WriteLine($" Speech not translated.");
}
else if (e.Result.Reason == ResultReason.NoMatch)
{
Console.WriteLine($"NOMATCH: Speech could not be recognized.");
}
};
recognizer.Canceled += (s, e) =>
{
Console.WriteLine($"CANCELED: Reason={e.Reason}");
if (e.Reason == CancellationReason.Error)
{
Console.WriteLine($"CANCELED: ErrorCode={e.ErrorCode}");
Console.WriteLine($"CANCELED: ErrorDetails={e.ErrorDetails}");
Console.WriteLine($"CANCELED: Did you set the speech resource key and endpoint values?");
}
stopTranslation.TrySetResult(0);
};
recognizer.SpeechStartDetected += (s, e) => {
Console.WriteLine("\nSpeech start detected event.");
};
recognizer.SpeechEndDetected += (s, e) => {
Console.WriteLine("\nSpeech end detected event.");
};
recognizer.SessionStarted += (s, e) => {
Console.WriteLine("\nSession started event.");
};
recognizer.SessionStopped += (s, e) => {
Console.WriteLine("\nSession stopped event.");
Console.WriteLine($"\nStop translation.");
stopTranslation.TrySetResult(0);
};
// Starts continuous recognition. Uses StopContinuousRecognitionAsync() to stop recognition.
Console.WriteLine("Start translation...");
await recognizer.StartContinuousRecognitionAsync().ConfigureAwait(false);
Task.WaitAny(new[] { stopTranslation.Task });
await recognizer.StopContinuousRecognitionAsync().ConfigureAwait(false);
}
}
}
See more speech translation with language identification samples on GitHub.
auto endpointString = "YourSpeechResourceEndpoint";
auto config = SpeechTranslationConfig::FromEndpoint(endpointString, "YourSpeechResourceKey");
auto autoDetectSourceLanguageConfig = AutoDetectSourceLanguageConfig::FromLanguages({ "en-US", "de-DE" });
// Sets source and target languages
// The source language will be detected by the language detection feature.
// However, SpeechRecognitionLanguage still needs to be set to a locale string, but it will not be used as the source language.
// This will be fixed in a future version of Speech SDK.
auto fromLanguage = "en-US";
config->SetSpeechRecognitionLanguage(fromLanguage);
config->AddTargetLanguage("de");
config->AddTargetLanguage("fr");
// Creates a translation recognizer using microphone as audio input.
auto recognizer = TranslationRecognizer::FromConfig(config, autoDetectSourceLanguageConfig);
cout << "Say something...\n";
// Starts translation, and returns after a single utterance is recognized. The end of a
// single utterance is determined by listening for silence at the end or until a maximum of 15
// seconds of audio is processed. The task returns the recognized text as well as the translation.
// Note: Since RecognizeOnceAsync() returns only a single utterance, it is suitable only for single
// shot recognition like command or query.
// For long-running multi-utterance recognition, use StartContinuousRecognitionAsync() instead.
auto result = recognizer->RecognizeOnceAsync().get();
// Checks result.
if (result->Reason == ResultReason::TranslatedSpeech)
{
cout << "RECOGNIZED: Text=" << result->Text << std::endl;
for (const auto& it : result->Translations)
{
cout << "TRANSLATED into '" << it.first.c_str() << "': " << it.second.c_str() << std::endl;
}
}
else if (result->Reason == ResultReason::RecognizedSpeech)
{
cout << "RECOGNIZED: Text=" << result->Text << " (text could not be translated)" << std::endl;
}
else if (result->Reason == ResultReason::NoMatch)
{
cout << "NOMATCH: Speech could not be recognized." << std::endl;
}
else if (result->Reason == ResultReason::Canceled)
{
auto cancellation = CancellationDetails::FromResult(result);
cout << "CANCELED: Reason=" << (int)cancellation->Reason << std::endl;
if (cancellation->Reason == CancellationReason::Error)
{
cout << "CANCELED: ErrorCode=" << (int)cancellation->ErrorCode << std::endl;
cout << "CANCELED: ErrorDetails=" << cancellation->ErrorDetails << std::endl;
cout << "CANCELED: Did you set the speech resource key and endpoint values?" << std::endl;
}
}
using namespace std;
using namespace Microsoft::CognitiveServices::Speech;
using namespace Microsoft::CognitiveServices::Speech::Audio;
using namespace Microsoft::CognitiveServices::Speech::Translation;
void MultiLingualTranslation()
{
auto config = SpeechTranslationConfig::FromEndpoint("YourSpeechResourceEndpoint", "YourSpeechResourceKey");
// Set the LanguageIdMode (optional; either "Continuous" or "AtStart" is accepted; the default is "AtStart")
config->SetProperty(PropertyId::SpeechServiceConnection_LanguageIdMode, "Continuous");
auto autoDetectSourceLanguageConfig = AutoDetectSourceLanguageConfig::FromLanguages({ "en-US", "de-DE", "zh-CN" });
promise<void> recognitionEnd;
// Source language is required, but currently ignored.
auto fromLanguage = "en-US";
config->SetSpeechRecognitionLanguage(fromLanguage);
config->AddTargetLanguage("de");
config->AddTargetLanguage("fr");
auto audioInput = AudioConfig::FromWavFileInput("whatstheweatherlike.wav");
auto recognizer = TranslationRecognizer::FromConfig(config, autoDetectSourceLanguageConfig, audioInput);
recognizer->Recognizing.Connect([](const TranslationRecognitionEventArgs& e)
{
std::string lidResult = e.Result->Properties.GetProperty(PropertyId::SpeechServiceConnection_AutoDetectSourceLanguageResult);
cout << "Recognizing in Language = "<< lidResult << ":" << e.Result->Text << std::endl;
for (const auto& it : e.Result->Translations)
{
cout << " Translated into '" << it.first.c_str() << "': " << it.second.c_str() << std::endl;
}
});
recognizer->Recognized.Connect([](const TranslationRecognitionEventArgs& e)
{
if (e.Result->Reason == ResultReason::TranslatedSpeech)
{
std::string lidResult = e.Result->Properties.GetProperty(PropertyId::SpeechServiceConnection_AutoDetectSourceLanguageResult);
cout << "RECOGNIZED in Language = " << lidResult << ": Text=" << e.Result->Text << std::endl;
}
else if (e.Result->Reason == ResultReason::RecognizedSpeech)
{
cout << "RECOGNIZED: Text=" << e.Result->Text << " (text could not be translated)" << std::endl;
}
else if (e.Result->Reason == ResultReason::NoMatch)
{
cout << "NOMATCH: Speech could not be recognized." << std::endl;
}
for (const auto& it : e.Result->Translations)
{
cout << " Translated into '" << it.first.c_str() << "': " << it.second.c_str() << std::endl;
}
});
recognizer->Canceled.Connect([&recognitionEnd](const TranslationRecognitionCanceledEventArgs& e)
{
cout << "CANCELED: Reason=" << (int)e.Reason << std::endl;
if (e.Reason == CancellationReason::Error)
{
cout << "CANCELED: ErrorCode=" << (int)e.ErrorCode << std::endl;
cout << "CANCELED: ErrorDetails=" << e.ErrorDetails << std::endl;
cout << "CANCELED: Did you set the speech resource key and endpoint values?" << std::endl;
recognitionEnd.set_value();
}
});
recognizer->Synthesizing.Connect([](const TranslationSynthesisEventArgs& e)
{
auto size = e.Result->Audio.size();
cout << "Translation synthesis result: size of audio data: " << size
<< (size == 0 ? "(END)" : "");
});
recognizer->SessionStopped.Connect([&recognitionEnd](const SessionEventArgs& e)
{
cout << "Session stopped.";
recognitionEnd.set_value();
});
// Starts continuous recognition. Use StopContinuousRecognitionAsync() to stop recognition.
recognizer->StartContinuousRecognitionAsync().get();
recognitionEnd.get_future().get();
recognizer->StopContinuousRecognitionAsync().get();
}
See more speech translation with language identification samples on GitHub.
import azure.cognitiveservices.speech as speechsdk
import time
import json
speech_key, service_endpoint = "YourSpeechResourceKey", "YourServiceEndpoint"
weatherfilename="en-us_zh-cn.wav"
# set up translation parameters: source language and target languages
translation_config = speechsdk.translation.SpeechTranslationConfig(
subscription=speech_key,
endpoint=service_endpoint,
speech_recognition_language='en-US',
target_languages=('de', 'fr'))
audio_config = speechsdk.audio.AudioConfig(filename=weatherfilename)
# Specify the AutoDetectSourceLanguageConfig, which defines the number of possible languages
auto_detect_source_language_config = speechsdk.languageconfig.AutoDetectSourceLanguageConfig(languages=["en-US", "de-DE", "zh-CN"])
# Creates a translation recognizer using an audio file as input.
recognizer = speechsdk.translation.TranslationRecognizer(
translation_config=translation_config,
audio_config=audio_config,
auto_detect_source_language_config=auto_detect_source_language_config)
# Starts translation, and returns after a single utterance is recognized. The end of a
# single utterance is determined by listening for silence at the end or until a maximum of 15
# seconds of audio is processed. The task returns the recognized text as the result.
# Note: Since recognize_once() returns only a single utterance, it is suitable only for single
# shot recognition like command or query.
# For long-running multi-utterance recognition, use start_continuous_recognition() instead.
result = recognizer.recognize_once()
# Check the result
if result.reason == speechsdk.ResultReason.TranslatedSpeech:
print("""Recognized: {}
German translation: {}
French translation: {}""".format(
result.text, result.translations['de'], result.translations['fr']))
elif result.reason == speechsdk.ResultReason.RecognizedSpeech:
print("Recognized: {}".format(result.text))
detectedSrcLang = result.properties[speechsdk.PropertyId.SpeechServiceConnection_AutoDetectSourceLanguageResult]
print("Detected Language: {}".format(detectedSrcLang))
elif result.reason == speechsdk.ResultReason.NoMatch:
print("No speech could be recognized: {}".format(result.no_match_details))
elif result.reason == speechsdk.ResultReason.Canceled:
print("Translation canceled: {}".format(result.cancellation_details.reason))
if result.cancellation_details.reason == speechsdk.CancellationReason.Error:
print("Error details: {}".format(result.cancellation_details.error_details))
import azure.cognitiveservices.speech as speechsdk
import time
import json
speech_key, service_endpoint = "YourSpeechResourceKey", "YourServiceEndpoint"
weatherfilename="en-us_zh-cn.wav"
# Currently the v2 endpoint is required. In a future SDK release you won't need to set it.
translation_config = speechsdk.translation.SpeechTranslationConfig(
subscription=speech_key,
endpoint=service_endpoint,
speech_recognition_language='en-US',
target_languages=('de', 'fr'))
audio_config = speechsdk.audio.AudioConfig(filename=weatherfilename)
# Set the LanguageIdMode (Optional; Either Continuous or AtStart are accepted; Default AtStart)
translation_config.set_property(property_id=speechsdk.PropertyId.SpeechServiceConnection_LanguageIdMode, value='Continuous')
# Specify the AutoDetectSourceLanguageConfig, which defines the number of possible languages
auto_detect_source_language_config = speechsdk.languageconfig.AutoDetectSourceLanguageConfig(languages=["en-US", "de-DE", "zh-CN"])
# Creates a translation recognizer using an audio file as input.
recognizer = speechsdk.translation.TranslationRecognizer(
translation_config=translation_config,
audio_config=audio_config,
auto_detect_source_language_config=auto_detect_source_language_config)
def result_callback(event_type, evt):
"""callback to display a translation result"""
print("{}: {}\n\tTranslations: {}\n\tResult Json: {}".format(
event_type, evt, evt.result.translations.items(), evt.result.json))
done = False
def stop_cb(evt):
"""callback that signals to stop continuous recognition upon receiving an event `evt`"""
print('CLOSING on {}'.format(evt))
global done
done = True
# connect callback functions to the events fired by the recognizer
recognizer.session_started.connect(lambda evt: print('SESSION STARTED: {}'.format(evt)))
recognizer.session_stopped.connect(lambda evt: print('SESSION STOPPED {}'.format(evt)))
# event for intermediate results
recognizer.recognizing.connect(lambda evt: result_callback('RECOGNIZING', evt))
# event for final result
recognizer.recognized.connect(lambda evt: result_callback('RECOGNIZED', evt))
# cancellation event
recognizer.canceled.connect(lambda evt: print('CANCELED: {} ({})'.format(evt, evt.reason)))
# stop continuous recognition on either session stopped or canceled events
recognizer.session_stopped.connect(stop_cb)
recognizer.canceled.connect(stop_cb)
def synthesis_callback(evt):
"""
callback for the synthesis event
"""
print('SYNTHESIZING {}\n\treceived {} bytes of audio. Reason: {}'.format(
evt, len(evt.result.audio), evt.result.reason))
if evt.result.reason == speechsdk.ResultReason.RecognizedSpeech:
print("RECOGNIZED: {}".format(evt.result.properties))
if evt.result.properties.get(speechsdk.PropertyId.SpeechServiceConnection_AutoDetectSourceLanguageResult) is None:
print("Unable to detect any language")
else:
detectedSrcLang = evt.result.properties[speechsdk.PropertyId.SpeechServiceConnection_AutoDetectSourceLanguageResult]
jsonResult = evt.result.properties[speechsdk.PropertyId.SpeechServiceResponse_JsonResult]
detailResult = json.loads(jsonResult)
startOffset = detailResult['Offset']
duration = detailResult['Duration']
if duration >= 0:
endOffset = duration + startOffset
else:
endOffset = 0
print("Detected language = " + detectedSrcLang + ", startOffset = " + str(startOffset) + " nanoseconds, endOffset = " + str(endOffset) + " nanoseconds, Duration = " + str(duration) + " nanoseconds.")
global language_detected
language_detected = True
# connect callback to the synthesis event
recognizer.synthesizing.connect(synthesis_callback)
# start translation
recognizer.start_continuous_recognition()
while not done:
time.sleep(.5)
recognizer.stop_continuous_recognition()
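The offset arithmetic inside the synthesis callback above can be factored into a small pure helper, which makes the fallback behavior easier to see and test. This is a sketch: the Offset and Duration field names follow the JSON detail result shown in the sample, and the units are whatever the service reports (the sample prints them as nanoseconds).

```python
def compute_offsets(detail_result: dict) -> tuple:
    """Return (start_offset, end_offset) from a detailed JSON result.

    Mirrors the callback logic above: the end offset is start + duration
    when the duration is non-negative, otherwise it falls back to 0.
    """
    start_offset = detail_result["Offset"]
    duration = detail_result["Duration"]
    end_offset = start_offset + duration if duration >= 0 else 0
    return start_offset, end_offset

# Example with hypothetical values:
start, end = compute_offsets({"Offset": 1000, "Duration": 2500})
print(start, end)  # 1000 3500
```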
Run and use a container
Speech containers provide websocket-based query endpoint APIs that you access through the Speech SDK and Speech CLI. By default, the Speech SDK and Speech CLI use the public Speech service. To use the container, you need to change the initialization method. Use the container host URL instead of a key and endpoint.
When you run language identification in a container, use the SourceLanguageRecognizer object instead of SpeechRecognizer or TranslationRecognizer.
For more information about containers, see the language identification speech containers how-to guide.
Implement speech to text batch transcription
To identify languages with the Batch transcription REST API, use the languageIdentification property in the body of the Transcriptions - Submit request.
Warning
Batch transcription only supports language identification for base models. If both language identification and a custom model are specified in the transcription request, the service falls back to using the base models for the specified candidate languages. This might result in unexpected recognition results.
If your speech to text scenario requires both language identification and custom models, use real-time speech to text instead of batch transcription.
The following example shows the usage of the languageIdentification property with four candidate languages. For more information about request properties, see Create a batch transcription.
{
<...>
"properties": {
<...>
"languageIdentification": {
"candidateLocales": [
"en-US",
"ja-JP",
"zh-CN",
"hi-IN"
]
},
<...>
}
}
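As an illustrative sketch, the properties fragment above can also be generated programmatically. The snippet below builds the languageIdentification block and applies the candidate-locale rules described earlier (full locales containing a dash, and more than one candidate); treat the exact service-side limits as an assumption and confirm them in the batch transcription reference.

```python
import json

def build_language_id_properties(candidate_locales):
    """Build the languageIdentification fragment for a batch transcription request body.

    The checks are illustrative only; see the batch transcription reference
    for the authoritative limits on candidate locales.
    """
    for locale in candidate_locales:
        if "-" not in locale:
            raise ValueError(f"Use a full locale with a dash, for example en-US: {locale}")
    if len(candidate_locales) < 2:
        raise ValueError("Provide at least two candidate locales.")
    return {"languageIdentification": {"candidateLocales": list(candidate_locales)}}

# Build the same fragment as the JSON example above:
body = {"properties": build_language_id_properties(["en-US", "ja-JP", "zh-CN", "hi-IN"])}
print(json.dumps(body, indent=2))
```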
Related content