speech Package

Microsoft Speech SDK for Python

Modules

audio

Classes that are concerned with the handling of audio input to the various recognizers, and audio output from the speech synthesizer.

dialog

Classes related to dialog service connector.

enums

intent

Classes related to intent recognition from speech.

interop

languageconfig

Classes that are concerned with the handling of language configurations.

properties

speech

Classes related to recognizing text from speech, synthesizing speech from text, and general classes used in the various recognizers.

transcription

Classes related to conversation transcription.

translation

Classes related to translation of speech to other languages.

version

Classes

AudioDataStream

Represents an audio data stream, used for operating on audio data as a stream.

Generates an audio data stream from a speech synthesis result (type SpeechSynthesisResult) or a keyword recognition result (type KeywordRecognitionResult).

AutoDetectSourceLanguageResult

Represents the result of automatic source language detection.

The result can be initialized from a speech recognition result.

CancellationDetails

Connection

Proxy class for managing the connection to the speech service of the specified Recognizer.

By default, a Recognizer autonomously manages its connection to the service when needed. The Connection class provides additional methods for users to explicitly open or close a connection and to subscribe to connection status changes. Using Connection is optional; it is intended for scenarios where application behavior needs fine-tuning based on connection status.

Users can optionally call open to manually initiate a service connection before starting recognition on the Recognizer associated with this Connection. After recognition has started, calling open or close might fail; this does not affect the Recognizer or the ongoing recognition. The connection might drop for various reasons, and the Recognizer always tries to reestablish it as required to guarantee ongoing operations. In all these cases, connected/disconnected events indicate the change in connection status.

Note

Updated in version 1.17.0.

Constructor for internal use.
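The explicit connection handling described above can be sketched as follows. This is a minimal illustration, not the SDK's prescribed pattern: attach_connection_logging and open_connection_early are hypothetical helper names, the log list is an assumption made for the example, and the SDK import is deferred so that the first helper can be exercised without the package installed.

```python
def attach_connection_logging(connection, log):
    """Append an entry to `log` whenever the connection status changes.

    `connection` is expected to expose `connected` and `disconnected`
    event signals, as a Connection object does.
    """
    connection.connected.connect(
        lambda evt: log.append(("connected", evt.session_id)))
    connection.disconnected.connect(
        lambda evt: log.append(("disconnected", evt.session_id)))


def open_connection_early(recognizer, log, continuous=False):
    """Open the service connection before recognition starts.

    Hypothetical helper; requires the azure-cognitiveservices-speech package.
    """
    import azure.cognitiveservices.speech as speechsdk  # imported lazily

    connection = speechsdk.Connection.from_recognizer(recognizer)
    attach_connection_logging(connection, log)
    # The flag tells the service whether continuous recognition will follow.
    connection.open(continuous)
    return connection
```

Keeping the returned Connection object referenced for as long as the callbacks are needed is a cautious choice in this sketch.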

ConnectionEventArgs

Provides data for the ConnectionEvent.

Note

Added in version 1.2.0.

Constructor for internal use.

EventSignal

Clients can connect to the event signal to receive events, or disconnect from the event signal to stop receiving events.

Constructor for internal use.

KeywordRecognitionEventArgs

Class for keyword recognition event arguments.

Constructor for internal use.

KeywordRecognitionModel

Represents a keyword recognition model.

KeywordRecognitionResult

Result of a keyword recognition operation.

Constructor for internal use.

KeywordRecognizer

A keyword recognizer.

NoMatchDetails

PhraseListGrammar

Class that allows runtime addition of phrase hints to aid in speech recognition.

Phrases added to the recognizer are effective at the start of the next recognition, or the next time the speech recognizer must reconnect to the speech service.

Note

Added in version 1.5.0.

Constructor for internal use.
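As a rough sketch of adding phrase hints at runtime: add_phrases and boost_phrases are hypothetical helper names, and skipping blank entries is an assumption of this example rather than SDK behavior. The SDK import is deferred so the first helper can be tried with any object that offers an addPhrase method.

```python
def add_phrases(phrase_list, phrases):
    """Add each non-blank phrase to `phrase_list`; return how many were added."""
    added = 0
    for phrase in phrases:
        phrase = phrase.strip()
        if phrase:
            phrase_list.addPhrase(phrase)
            added += 1
    return added


def boost_phrases(recognizer, phrases):
    """Attach phrase hints to an existing recognizer.

    Hypothetical helper; requires the azure-cognitiveservices-speech package.
    The hints take effect at the start of the next recognition.
    """
    import azure.cognitiveservices.speech as speechsdk  # imported lazily

    grammar = speechsdk.PhraseListGrammar.from_recognizer(recognizer)
    return add_phrases(grammar, phrases)
```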

PronunciationAssessmentConfig

Represents pronunciation assessment configuration.

Note

Added in version 1.14.0.

The configuration can be initialized in two ways:

  • from parameters: pass the reference text, grading system, granularity, enable-miscue flag, and scenario ID.

  • from JSON: pass a JSON string.

For parameter details, see https://docs.microsoft.com/azure/cognitive-services/speech-service/rest-speech-to-text#pronunciation-assessment-parameters
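The two initialization modes can be kept apart with a small pre-check like the one below. This is only a sketch: pronunciation_config_kwargs and make_pronunciation_config are hypothetical helpers, the rule that the two modes must not be mixed is an assumption of this example, and the scenario ID is left out. The keyword names are intended to match the constructor's parameters.

```python
import json


def pronunciation_config_kwargs(reference_text=None, grading_system=None,
                                granularity=None, enable_miscue=False,
                                json_string=None):
    """Return constructor keyword arguments for exactly one of the two modes."""
    if json_string is not None:
        if (reference_text is not None or grading_system is not None
                or granularity is not None or enable_miscue):
            raise ValueError("pass either a JSON string or parameters, not both")
        # json.loads raises a ValueError subclass on malformed input.
        if not isinstance(json.loads(json_string), dict):
            raise ValueError("json_string must encode a JSON object")
        return {"json_string": json_string}
    if reference_text is None:
        raise ValueError("pass reference_text or json_string")
    kwargs = {"reference_text": reference_text, "enable_miscue": enable_miscue}
    # Leave out unset options so the SDK defaults apply.
    if grading_system is not None:
        kwargs["grading_system"] = grading_system
    if granularity is not None:
        kwargs["granularity"] = granularity
    return kwargs


def make_pronunciation_config(**options):
    """Hypothetical helper; requires the azure-cognitiveservices-speech package."""
    import azure.cognitiveservices.speech as speechsdk  # imported lazily

    return speechsdk.PronunciationAssessmentConfig(
        **pronunciation_config_kwargs(**options))
```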

PronunciationAssessmentPhonemeResult

Contains phoneme-level pronunciation assessment result.

Note

Added in version 1.14.0.

PronunciationAssessmentResult

Represents pronunciation assessment result.

Note

Added in version 1.14.0.

The result can be initialized from a speech recognition result.

PronunciationAssessmentWordResult

Contains word-level pronunciation assessment result.

Note

Added in version 1.14.0.

PropertyCollection

Class to retrieve or set a property value from a property collection.

RecognitionEventArgs

Provides data for the RecognitionEvent.

Constructor for internal use.

RecognitionResult

Detailed information about the result of a recognition operation.

Constructor for internal use.

Recognizer

Base class for different recognizers.

ResultFuture

The result of an asynchronous operation.

Private constructor.

SessionEventArgs

Base class for session event arguments.

Constructor for internal use.

SourceLanguageRecognizer

A standalone source language recognizer that can be used for single-language or continuous language detection.

Note

Added in version 1.18.0.

SpeechConfig

Class that defines configurations for speech / intent recognition and speech synthesis.

The configuration can be initialized in different ways:

  • from subscription: pass a subscription key and a region

  • from endpoint: pass an endpoint. A subscription key or authorization token is optional.

  • from host: pass a host address. A subscription key or authorization token is optional.

  • from authorization token: pass an authorization token and a region
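The four initialization modes listed above can be sketched as a pre-check in front of the constructor. speech_config_kwargs and make_speech_config are hypothetical helpers, and the check is an assumption of this example that is deliberately stricter than strictly necessary; the keyword names are intended to match the SpeechConfig constructor. The SDK import is deferred so the validation can be tried without the package installed.

```python
def speech_config_kwargs(*, subscription=None, region=None, endpoint=None,
                         host=None, auth_token=None):
    """Validate the combination and return only the supplied arguments."""
    targets = [name for name, value in
               (("region", region), ("endpoint", endpoint), ("host", host))
               if value is not None]
    if len(targets) != 1:
        raise ValueError("pass exactly one of region, endpoint, or host")
    # A region needs exactly one credential: a key or a token.
    if region is not None and (subscription is None) == (auth_token is None):
        raise ValueError(
            "with a region, pass exactly one of subscription or auth_token")
    supplied = {
        "subscription": subscription,
        "region": region,
        "endpoint": endpoint,
        "host": host,
        "auth_token": auth_token,
    }
    return {name: value for name, value in supplied.items()
            if value is not None}


def make_speech_config(**options):
    """Hypothetical helper; requires the azure-cognitiveservices-speech package."""
    import azure.cognitiveservices.speech as speechsdk  # imported lazily

    return speechsdk.SpeechConfig(**speech_config_kwargs(**options))
```

For example, make_speech_config(subscription="<key>", region="<region>") would correspond to the "from subscription" mode, with both placeholder values supplied by the caller.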

SpeechRecognitionCanceledEventArgs

Class for speech recognition canceled event arguments.

Constructor for internal use.

SpeechRecognitionEventArgs

Class for speech recognition event arguments.

Constructor for internal use.

SpeechRecognitionResult

Base class for speech recognition results.

Constructor for internal use.

SpeechRecognizer

A speech recognizer. To specify source language information, pass only one of these three parameters: language, source_language_config, or auto_detect_source_language_config.
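The one-of-three rule for source language parameters can be enforced before the recognizer is constructed. The sketch below is illustrative only: source_language_kwargs and make_recognizer are hypothetical helpers, and the SDK import is deferred so the check itself runs without the package installed.

```python
def source_language_kwargs(language=None, source_language_config=None,
                           auto_detect_source_language_config=None):
    """Return the single source language argument that was given, if any."""
    supplied = {
        "language": language,
        "source_language_config": source_language_config,
        "auto_detect_source_language_config":
            auto_detect_source_language_config,
    }
    chosen = {name: value for name, value in supplied.items()
              if value is not None}
    if len(chosen) > 1:
        raise ValueError("specify only one of: " + ", ".join(sorted(chosen)))
    return chosen


def make_recognizer(speech_config, audio_config=None, **language_options):
    """Hypothetical helper; requires the azure-cognitiveservices-speech package."""
    import azure.cognitiveservices.speech as speechsdk  # imported lazily

    return speechsdk.SpeechRecognizer(
        speech_config=speech_config,
        audio_config=audio_config,
        **source_language_kwargs(**language_options))
```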

SpeechSynthesisBookmarkEventArgs

Class for speech synthesis bookmark event arguments.

Note

Added in version 1.16.0.

Constructor for internal use.

SpeechSynthesisCancellationDetails

Contains detailed information about why a result was canceled.

SpeechSynthesisEventArgs

Class for speech synthesis event arguments.

Constructor for internal use.

SpeechSynthesisResult

Result of a speech synthesis operation.

Constructor for internal use.

SpeechSynthesisVisemeEventArgs

Class for speech synthesis viseme event arguments.

Note

Added in version 1.16.0.

Constructor for internal use.

SpeechSynthesisWordBoundaryEventArgs

Class for speech synthesis word boundary event arguments.

Note

Updated in version 1.21.0.

Constructor for internal use.

SpeechSynthesizer

A speech synthesizer.

SyllableLevelTimingResult

Contains syllable-level timing result.

Note

Added in version 1.20.0.

SynthesisVoicesResult

Contains detailed information about the retrieved synthesis voices list.

Note

Added in version 1.16.0.

Constructor for internal use.

VoiceInfo

Contains detailed information about a synthesis voice.

Note

Updated in version 1.17.0.

Constructor for internal use.

Enums

AudioStreamContainerFormat

Defines supported audio stream container format.

AudioStreamWaveFormat

Represents the format specified inside WAV container.

CancellationErrorCode

Defines the error code when CancellationReason is Error.

CancellationReason

Defines the possible reasons a recognition result might be canceled.

NoMatchReason

Defines the possible reasons a recognition result might not be recognized.

OutputFormat

Output format.

ProfanityOption

Removes profanity (swearing), or replaces letters of profane words with stars.

PronunciationAssessmentGradingSystem

Defines the point system for pronunciation score calibration; default value is FivePoint.

PronunciationAssessmentGranularity

Defines the pronunciation evaluation granularity; default value is Phoneme.

PropertyId

Defines speech property ids.

ResultReason

Specifies the possible reasons a recognition result might be generated.
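A result is typically inspected by branching on its reason. The sketch below is one possible shape, not the SDK's own helper: summarize_result is a hypothetical name, and it compares enum member names (rather than the enum members themselves) purely so that it can be exercised with stand-in objects; application code would more commonly compare against ResultReason members directly.

```python
def summarize_result(result):
    """Return a one-line description of a recognition result."""
    reason = result.reason.name
    if reason == "RecognizedSpeech":
        return "recognized: " + result.text
    if reason == "NoMatch":
        # NoMatchReason explains why nothing was recognized.
        return "no match: " + result.no_match_details.reason.name
    if reason == "Canceled":
        details = result.cancellation_details
        summary = "canceled: " + details.reason.name
        # Error details are only meaningful when the reason is Error.
        if details.reason.name == "Error":
            summary += " (" + details.error_details + ")"
        return summary
    return "other: " + reason
```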

ServicePropertyChannel

Defines channels used to pass property settings to service.

SpeechSynthesisOutputFormat

Defines the possible speech synthesis output audio formats.

StreamStatus

Defines the possible status of audio data stream.

SynthesisVoiceGender

Defines the gender of synthesis voices.

SynthesisVoiceType

Defines the type of synthesis voices.