

Please see Azure Cognitive Services for Speech documentation for the latest supported speech solutions.

Microsoft Speech Platform


ISpVoice::Speak speaks the contents of a text string or file.

```cpp
HRESULT Speak(
  LPCWSTR pwcs,
  DWORD dwFlags,
  ULONG *pulStreamNumber
);
```

Parameters

  • pwcs
    [in, string] Pointer to the null-terminated text string (possibly containing XML markup) to be synthesized. This value can be NULL when dwFlags is set to SPF_PURGEBEFORESPEAK, indicating that any remaining data to be synthesized should be discarded. If dwFlags is set to SPF_IS_FILENAME, this value should point to a null-terminated, fully qualified path to a file.
  • dwFlags
    [in] Flags used to control the rendering process for this call. The flag values are contained in the SPEAKFLAGS enumeration.
  • pulStreamNumber
    [out] Pointer to a ULONG that receives the current input stream number associated with this Speak request. Each time a string is spoken, an associated stream number is returned, and events queued back to the application for that string carry this number. If NULL, no value is passed back.
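Because dwFlags is a bit mask, the documented rule for pwcs (it may be NULL when SPF_PURGEBEFORESPEAK is set) comes down to a bitwise test. The following is a minimal, hypothetical sketch of such a pre-check; the helper name is illustrative and not part of the Speech Platform API, and the flag value is assumed to match sapi.h (real code should include sapi.h and use SPF_PURGEBEFORESPEAK directly).

```cpp
#include <cstdint>

// Illustrative mirror of one SPEAKFLAGS bit. The value is assumed to match
// sapi.h; real code should use SPF_PURGEBEFORESPEAK from sapi.h instead.
constexpr std::uint32_t kPurgeBeforeSpeak = 1u << 1;

// Hypothetical helper (not a Speech Platform API): returns true if pwcs
// satisfies the documented rule that it may be null only when the purge
// flag is set in dwFlags. Any non-null string passes this check.
bool PwcsAllowedForFlags(const wchar_t* pwcs, std::uint32_t dwFlags) {
    if (pwcs == nullptr) {
        // Test the purge bit; other bits may be combined with it via |.
        return (dwFlags & kPurgeBeforeSpeak) != 0;
    }
    return true;
}
```

The check deliberately ignores the other bits, so a caller can combine the purge flag with other SPEAKFLAGS values using bitwise OR.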

Return Values

| Value | Description |
|---|---|
| S_OK | Function completed successfully. |
| E_INVALIDARG | One or more parameters are invalid. |
| E_POINTER | Invalid pointer. |
| E_OUTOFMEMORY | Exceeded available memory. |
| SPERR_INVALID_FLAGS | Invalid flags specified for this operation. |
| SPERR_DEVICE_BUSY | Timeout occurred on synchronous call. |


Remarks

Normally, the value returned in pulStreamNumber is 1. However, if several asynchronous Speak (or SpeakStream) calls are received and must be queued, the stream number is incremented for each call.
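When several asynchronous calls are queued, each receives its own stream number, so an application may want to map a stream number in an incoming event back to the text it submitted. The following is a minimal, hypothetical sketch of that bookkeeping; the class and method names are illustrative assumptions, not Speech Platform APIs, and in a real program the numbers passed to Record would come from pulStreamNumber.

```cpp
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical bookkeeping class (not part of the Speech Platform API).
// After each Speak call, record the stream number written to pulStreamNumber
// together with the submitted text; when an event carrying a stream number
// arrives, look the text up again.
class StreamTextLog {
public:
    void Record(unsigned long streamNumber, std::wstring text) {
        texts_[streamNumber] = std::move(text);
    }

    // Returns the text recorded for streamNumber, or std::nullopt if none.
    std::optional<std::wstring> Lookup(unsigned long streamNumber) const {
        auto it = texts_.find(streamNumber);
        if (it == texts_.end()) {
            return std::nullopt;
        }
        return it->second;
    }

private:
    std::map<unsigned long, std::wstring> texts_;
};
```

A std::map keeps the entries ordered by stream number, which matches the order in which queued calls are numbered; an application that never needs ordering could use std::unordered_map instead.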

If you pass Speak SSML markup that omits the closing tag for an element that requires one, such as <prosody> or <emphasis>, the Speech Platform does not return an error code. For example, the following snippet is missing the closing </prosody> tag but still returns S_OK.

```cpp
// Speak a string directly. The markup omits the closing </prosody> tag,
// yet Speak still returns S_OK.
if (SUCCEEDED(hr))
{
    hr = cpVoice->Speak(L"<prosody volume=\"x-loud\">Do it now", SPF_IS_XML, NULL);
}
```
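Since Speak does not report unclosed elements, an application that builds markup dynamically might check tag balance itself before calling Speak. The following is a minimal, hypothetical sketch; TagsBalanced is an illustrative name, not a Speech Platform function, and it is a rough scanner rather than an XML parser (it does not handle comments, CDATA sections, or a '>' inside attribute values).

```cpp
#include <cstddef>
#include <cwctype>
#include <string>
#include <vector>

// Hypothetical pre-check (not part of the Speech Platform API). Returns true
// if every start tag in markup has a matching end tag in the right order.
// Self-closing tags such as <break/> are accepted without an end tag.
bool TagsBalanced(const std::wstring& markup) {
    std::vector<std::wstring> open;  // stack of currently open element names
    std::size_t pos = 0;
    while ((pos = markup.find(L'<', pos)) != std::wstring::npos) {
        std::size_t end = markup.find(L'>', pos);
        if (end == std::wstring::npos) {
            return false;  // tag never terminated
        }
        std::wstring tag = markup.substr(pos + 1, end - pos - 1);
        pos = end + 1;
        if (!tag.empty() && (tag[0] == L'?' || tag[0] == L'!')) {
            continue;  // skip XML declarations and DOCTYPE-style tags
        }
        if (tag.empty()) {
            return false;  // "<>" is malformed
        }
        bool closing = tag[0] == L'/';
        bool selfClosing = tag.back() == L'/';
        // The element name runs until whitespace or '/'.
        std::size_t nameStart = closing ? 1 : 0;
        std::size_t nameEnd = nameStart;
        while (nameEnd < tag.size() && !std::iswspace(tag[nameEnd]) &&
               tag[nameEnd] != L'/') {
            ++nameEnd;
        }
        std::wstring name = tag.substr(nameStart, nameEnd - nameStart);
        if (name.empty()) {
            return false;  // malformed tag such as "</>"
        }
        if (closing) {
            if (open.empty() || open.back() != name) {
                return false;  // end tag without a matching open element
            }
            open.pop_back();
        } else if (!selfClosing) {
            open.push_back(name);
        }
    }
    return open.empty();  // anything left open is unclosed
}
```

Applied to the string in the snippet above, this check returns false because <prosody> is never closed; appending </prosody> makes it return true.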