SALT Programmer's Reference
This content is no longer actively maintained. It is provided as is, for anyone who may still be using these technologies, with no warranties or claims of accuracy with regard to the most recent product version or service release.
SALT voice response application Web pages can be created using two different approaches. The first approach uses Web Forms server controls, also commonly called Web server controls. The second approach uses textual programming, much like conventional HTML, with Speech Application Language Tags (SALT), an extension of HTML that introduces several new high-level tags.
Web server controls are self-contained elements of a Web page. A Web server control is an element, such as a button or text box, that encapsulates features such as properties, events, and methods. Changing a characteristic of an element no longer requires writing the HTML directly; instead, the developer edits the corresponding property of the element. Web server controls are similar to controls in Microsoft Visual Basic and Microsoft Visual Studio 2005. For example, to change the name of a button, developers need only open the item's property box and type the new name. Web server controls, however, are available only through a Microsoft ASP.NET page. The trade-off for this simplicity is greater involvement by the server, because code is generated each time the page is rendered. The advantage is that the server's work is transparent to the user.
ASP.NET Speech Controls are a special form of Web server controls. Speech Controls add speech capability to existing Web server controls. The basic behavior of the Web server control is unchanged, but new properties that enable speech are added. Using Web server controls and Speech Controls offers several advantages. First, the application developer can use graphical interface tools such as the Visual Studio 2005 integrated development environment (IDE). Pages can be designed graphically, with the associated HTML generated by Visual Studio 2005. Second, and more importantly, the resulting page is an ASP.NET page, so the full power of the ASP.NET server is available to it. This means being able to use a standard control that has been speech-enabled. It also means any browser can be used to access the page: the ASP.NET server generates the correct HTML for the specific browser accessing the page. Users and customers do not need to worry about whether they have the correct browser version, and developers need to design only a single page rather than different pages for different browser capabilities.
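As a rough sketch of what this looks like in practice (the control and attribute names below approximate the SASDK's QA Speech Control and are not taken from this document), a speech-enabled ASP.NET page pairs an ordinary server control with a speech control:

  <%-- Sketch only: control and attribute names approximate the SASDK and may differ. --%>
  <asp:TextBox id="txtDestCity" runat="server" />

  <speech:QA id="qaDestCity" runat="server">
    <Prompt InlinePrompt="Where would you like to fly to?" />
    <Reco>
      <Grammars>
        <Grammar Src="city.grxml" />
      </Grammars>
    </Reco>
  </speech:QA>

When the page is requested, the ASP.NET server renders markup such as this as plain HTML and SALT appropriate to the requesting browser.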
Speech Application Language Tags
The approach that Microsoft takes to speech-enable the Web is built around an emerging standard: Speech Application Language Tags (SALT). The SALT Forum has produced the SALT version 1.0 specification and contributed it to the standards body known as the World Wide Web Consortium (W3C). The speech markup used in Speech Server Developer Tools is the implementation by Microsoft of the SALT 1.0 specification. These tags extend HTML and XHTML with a small number of elements and objects that add speech recognition input, audio and text-to-speech playback, and dual tone multi-frequency (DTMF) input to a Web application.
Note
The SASDK does not implement all parts of the SALT 1.0 specification, and it implements some parts slightly differently. For example, the SALT 1.0 specification supports multimodal applications; however, because Speech Server 2004 supports only telephony applications, the SASDK does not implement multimodal support features.
Speech Server allows developers to build SALT voice response applications using ASP.NET server-side controls. Using these controls means that the developer does not need to know the details of SALT to build a simple SALT voice response application. However, most developers find it helpful to understand some background to SALT markup.
The following sections describe an architecture for implementing SALT applications, demonstrate how SALT voice response applications are built, and provide an overview of the tags.
SALT Architecture
There are four possible components in implementing a SALT voice response application:
- A Web server. The Web server generates Web pages containing HTML, SALT, and embedded script. The script controls the dialog flow for voice-only interactions. For example, the script defines the order for playing audio prompts to a caller, assuming there are several prompts on a page.
- A telephony server. The Telephony Application Services component of Speech Server connects to the telephone network. The server incorporates a voice-only SALT interpreter to interpret the HTML, SALT markup, and script, and this interpreter can run in a separate process or thread for each caller. The interpreter handles only a subset of HTML, because much of HTML concerns the graphical user interface and is not relevant to a voice-only application.
- A speech server. The Speech Engine Services component of Speech Server recognizes speech and plays audio prompts and responses back to the user.
- A telephony client device, such as the caller's telephone.
What Is SALT?
SALT is an extension to HTML that enables developers to add a spoken dialog interface to SALT voice response applications. Using SALT, voice response applications can be written for telephony clients.
SALT is a set of Extensible Markup Language (XML) elements that apply a speech interface to a document, such as an HTML page. Web application developers can use SALT effectively with HTML, XHTML, compact HTML (cHTML), Wireless Markup Language (WML), or pages derived from any other Standard Generalized Markup Language (SGML) dialect. SALT markup also provides DTMF support for telephony browsers running voice-only applications.
There are four main top-level elements of the Microsoft SALT markup.
Tag | Description
---|---
prompt | Configures the text-to-speech engine and plays speech output.
listen | Configures the speech recognizer, executes recognition, and handles recognition events.
dtmf | Configures and controls DTMF in telephony applications.
smex | Conducts general-purpose communications between Speech Server components.
In addition, there are several other elements that are child components of the four top-level elements. These components are the grammar element, the content element, the param element, the record element, and the value element.
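As a brief illustration (the input field and element IDs here are made up for the example), a grammar element tells a listen element what it can recognize, and a value element splices the contents of another page element into a prompt's spoken output:

  <input name="txtBoxDestCity" type="text" />

  <salt:listen id="recoDestCity">
    <salt:grammar src="city.grxml" />
  </salt:listen>

  <salt:prompt id="confirmDestCity">
    You said <salt:value targetelement="txtBoxDestCity" targetattribute="value" />.
  </salt:prompt>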
Why Use SALT?
Any Web developer wanting to speech-enable an application can use SALT. SALT markup is a great solution for adding speech because it can leverage the scripting and event model inherent in HTML to implement the interactive flow with the user. These are some of the benefits of using SALT markup:
- Reuse of application logic. Because the speech interface is a thin markup layer that supplies purely presentational logic, the code implementing the business logic of the application can be reused across different modalities and devices.
- Rapid development. Developers can use existing Web development tools for the development of SALT applications.
How to Use SALT
The following scenario outlines the use of SALT with a very simple code sample. For a more extensive description of the elements used in this example, see the reference documentation supplied with the SASDK.
In applications without a visual display, the application drives the interaction by prompting the user for required information. The HTML scripting and event model performs this function. Through scripting and the event model, application developers have full programmatic control of client-side (or server-side) code to manage prompt playing, grammar activation, and the processing of recognition results.
The RunAsk() function activates prompts and recognitions until the values of the input fields are filled. In the following example, the system needs two input values and asks for each value until it obtains both. The binding of the recognition results into the relevant input fields is accomplished programmatically by the script functions procOriginCity() and procDestCity(), which are triggered by the onreco events of the relevant listen elements. The following code is an example of how a simple system-initiative dialog (a sequence of specific questions or prompts) guides the user.
<html xmlns:salt="http://www.saltforum.org/2002/SALT">
<head>
</head>
<body onload="RunAsk()">

  <form id="travelForm">
    <input name="txtBoxOriginCity" type="text" />
    <input name="txtBoxDestCity" type="text" />
  </form>

  <salt:prompt id="askOriginCity"> Where from? </salt:prompt>
  <salt:prompt id="askDestCity"> Where to? </salt:prompt>

  <salt:listen id="recoOriginCity" onreco="procOriginCity()">
    <salt:grammar src="city.grxml" />
  </salt:listen>
  <salt:listen id="recoDestCity" onreco="procDestCity()">
    <salt:grammar src="city.grxml" />
  </salt:listen>

  <script language="JScript">
  <!--
    // Ask for each city in turn until both input fields are filled.
    function RunAsk() {
      if (travelForm.txtBoxOriginCity.value == "") {
        askOriginCity.Start();
        recoOriginCity.Start();
      } else if (travelForm.txtBoxDestCity.value == "") {
        askDestCity.Start();
        recoDestCity.Start();
      }
    }
    // Bind the recognized origin city, then continue the dialog.
    function procOriginCity() {
      travelForm.txtBoxOriginCity.value = recoOriginCity.text;
      RunAsk();
    }
    // Bind the recognized destination city; both values are now filled.
    function procDestCity() {
      travelForm.txtBoxDestCity.value = recoDestCity.text;
      travelForm.submit();
    }
  -->
  </script>
</body>
</html>
Other event handlers are available in the listen and prompt elements to manage false recognitions, user silences, and other situations requiring some form of recovery. For telephony dialogs, there is also a messaging interface for managing telephony call control.
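For instance, a listen element can declare handlers for caller silence and failed recognition alongside onreco (onsilence and onnoreco are SALT 1.0 event names; the handler body below is illustrative):

  <salt:listen id="recoDestCity" onreco="procDestCity()"
      onsilence="retryAsk()" onnoreco="retryAsk()">
    <salt:grammar src="city.grxml" />
  </salt:listen>

  <script language="JScript">
  <!--
    // Replay the question when the caller is silent or is not understood.
    function retryAsk() {
      askDestCity.Start();
      recoDestCity.Start();
    }
  -->
  </script>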