SpeechRecoContext State Property (SAPI 5.3)

Microsoft Speech API 5.3

Chapter 5



TTS! You're finally going to use text-to-speech. Up until now, the Coffee examples have limited themselves to simply accepting speech. You could talk to your CoffeeS3 minions and expect to get drinks. Now you can add the second of the two major components of SAPI, that of text-to-speech.

With all the excitement from the first two examples, CoffeeS3 slows the pace down for the moment. You will be pleasantly surprised at how easy it is to add this feature. The design stage placed emphasis on making things simple. In contrast, SAPI 4 required 200 lines to make a so-called simple "hello world" speak. SAPI 5 requires as few as two. This reduction in code was possible because SAPI consolidates the overhead: it marshals the required elements for you, so your programs have less to access directly. SAPI also uses intelligent defaults whenever possible, so many elements can simply be left at their existing settings. Speech properties exposes most of these elements, such as voice and speaking rate. As a result, the simplest case takes very little code. Naturally, you may override any of these assumptions or defaults. However, for CoffeeS3, you will start with simple tasks: initialization, implementation, and speaking text. Don't worry; additional features will be addressed in the next few chapters.


The initialization routine is almost anticlimactic. It is essentially two lines. This is very similar to setting up a speech recognition (SR) engine as you did in CoffeeS0: declare the interface and create the instance from the class ID.


To actually speak something takes one more line of code.


Any application may initialize TTS this way, and this is often the preferred method. However, CoffeeS3 takes a slightly different approach. Because SR and TTS are commonly used together, SAPI offers a shortcut for initialization: the recognition context can provide the voice for you. The following is in CoffeeS3's InitSAPI():


Although it takes the same number of lines of code, TTS is now available through the ISpRecoContext interface. Internally, this makes the same call to CoCreateInstance(CLSID_SpVoice). The difference is that this method automatically provides the ability to interrupt TTS whenever you start speaking again. This is appropriately known as "barge in." Without this capability, the TTS voice would continue to speak in the background even while you were talking, which might cause audio feedback. Of course, you can still write your own TTS interrupt routine, but the barge-in service is provided as a convenience.

Defaults are usually found in Speech properties. That is, when SAPI is properly installed, Speech properties holds a default for every parameter, and SAPI uses those defaults unless told otherwise. They include the voice, the speaking rate, and the language used. This is how TTS can get away with using only two lines of code. However, the defaults may be unavailable or invalid, so the application must always check the return value of each call.

That's about it for TTS. You have seen how to initialize a voice and how to speak something. The rest of the code puts these pieces to use. For example, CoffeeS3 talks on five occasions, and each one needs a call to ISpVoice::Speak.