Talk to Your Data
Dino Esposito
August 1999
Summary: Provides an introduction to Microsoft English Query (MSEQ) for SQL Server 2000. (16 printed pages).
Contents
Introduction
Beyond SQL
Introducing Microsoft English Query for SQL Server 2000
The Features of MSEQ
The MSEQ's Constituent Elements
Creating an MSEQ Domain
Putting MSEQ to Work
Answering a Precise Question
Working with OLAP
Developing EQ Applications
Summary
Introduction
Unless you've spent the best years of your life formulating SQL queries on such things as how to ask for a cup of coffee, chances are that trying to analyze a sophisticated SQL string leaves you a bit baffled. On the other hand, SQL was not intended to be the language for casual conversation. SQL was designed to make the conversation between a program and a database management system (DBMS) easier and more effective. Unfortunately, as developers, we are just the ones deputed to govern such a dialog. In other words, we're required to understand and be affluent in SQL. In this article, I'll provide you with a preview of Microsoft English Query (MSEQ) for SQL Server 2000, a product that sits on top of some SQL-based OLE DB providers (such as Microsoft® SQL Server™, Microsoft Access, and Oracle) and lets you model queries using plain old English instead of complex SQL commands. MSEQ integrates well with SQL Server (both 6.5 and 7.0) and Microsoft Visual InterDev® (VID).
Beyond SQL
The advent of the Internet significantly increased competition among companies making data access strategies and online analytical processing (OLAP) capabilities two central assets in most of the corporate information systems. Consequently, data warehousing (DW) and decision support systems (DSS) are fundamental tools for any executive manager. But how can you interrogate a huge amount of data in the smartest, most efficient and productive way?
The first step is data aggregation, which leads us into the land of OLAP and data warehousing. Basically, data aggregation means that you extract and duplicate a significant subset of data from the information system. Then you organize it to accommodate the queries that your management people may need to issue. In other words, you create a made-to-measure query environment to provide, in a reasonably short time, the results of sophisticated queries that span multiple tables, even computers and platforms.
Techniques and strategies used to create such aggregations of data (also known as cubes) are not trivial, and have to do with the design of data warehouses. As you can imagine, this is a topic far beyond the scope of this article. The important thing to bear in mind is that today you may have two types of data: ordinary databases held by a relational or non-relational DBMS, and OLAP multidimensional data. You use SQL to access the former and MDX (multidimensional expressions) to manipulate the latter. (For more on MDX, be sure to read Carl Nolan's article, "Introduction to Multidimensional Expressions (MDX)." In both cases, however, there are two highly technical languages with which you must become familiar. Not always a picnic!
Among the clients of a data warehouse or, in more general terms, of an information system, are those who don't necessarily have a strong knowledge of SQL or MDX. Moreover, when a human being considers a problem, their vision of it may only be distantly related to the physical organization of the data involved. If you need to know how many red cars have been sold in Manhattan during the first quarter, you have a precise idea of what information you want. However, even for an expert programmer this can be difficult, and undoubtedly time-consuming, to formulate in terms of an SQL or MDX query. This is just the problem that Microsoft English Query addresses.
Introducing Microsoft English Query for SQL Server 2000
MSEQ is a product that sits in between your applications and your data access modules. It takes commands prepared using natural language (English only, at the moment) and translates them either to SQL or MDX. Figure 1 illustrates the architecture of a typical MSEQ-based application.
Figure 1. A Windows application uses MSEQ to translate English sentences to SQL/MDX before issuing commands to the "physical" SQL or OLAP data providers.
As you can see, MSEQ works like a service and takes care of returning SQL/MDX strings in front of English strings. For example, the MSEQ engine accepts a string such as:
List all customers
and returns a SQL string such as:
select Customers.CustName as "CustName",
Customers.CustCity as "CustCity",
Customers.CustState as "CustState"
from Customers
If you sense an aura of mystique, let me say there's nothing magical in software. The English-to-SQL conversion is successful only once you have defined a logical model of the tables and linked them to the verbs and the nouns allowed in the sentences.
MSEQ is a clever tool, but it is still a software product. You are responsible for making it really smart!
The Features of MSEQ
MSEQ is not a completely new tool. It originally shipped as an add-on for SQL Server 6.5 about eighteen months ago. It's been included with SQL Server 7.0, too. However, the version I'm referring to here is the first beta of Microsoft English Query for SQL Server 2000.
(For more information and late-breaking news about English Query, check out https://www.microsoft.com/sql/productinfo/eqmain.htm.)
MSEQ for SQL Server 2000 comes with a great number of significant improvements that indicate how the product is going to reach its maturity and find a precise role in the Microsoft BackOffice® scenario. For one thing, it works great with SQL Server but you can use it also with Access and Oracle databases, or with any other OLE DB data source that supports SQL. You can't use it with an OLE DB source that accesses data using a custom query language.
MSEQ also works well under Microsoft Windows® 98. This article, in fact, was written using a computer running Windows 98 with the beta of MSEQ installed, together with the desktop edition of SQL Server 7.0.
MSEQ for SQL Server 2000 integrates well with Visual InterDev, as Figure 2 illustrates.
Figure 2. Brand-new English Query projects show themselves from the VID Project Wizard dialog box.
The VID environment is very helpful when it comes to modeling the EQ entities and the relationships among them. Such a model is the key element in enabling the English-to-SQL translation. Within VID, you can take advantage of advanced UI features such as the drag-and-drop operation, context menus, and ToolTips to carry out your tasks quickly and, above all, effectively. Basically, an MSEQ project is a collection of data the MSEQ run time utilizes to attempt the translation. Working on a project means defining the logical model that best fits the questions to answer. Building the project implies a thorough syntax and semantic check of your work. In the VID environment this doesn't appear to be different from building any other Web-based project. In addition, a standard MSEQ test application can further blur the distinction between an MSEQ project and the others. In a nutshell, you can fully leverage any knowledge of the VID features you may have.
The features of the new MSEQ can be summarized in three points. I've already discussed the integration with Visual InterDev. The remaining two features are the capability of using the drag-and-drop operation and wizards for graphical authoring and the OLAP support.
To appreciate the power of graphical authoring, though, you need the fundamentals on the constituent elements of an MSEQ application. In other words, I'm going to provide you with a step-by-step description of the process that renders an existing database in terms of EQ's entities and relationships. Such a model allows you to issue queries using natural, plain old English instead of SQL or MDX.
The MSEQ's Constituent Elements
The MSEQ run time has a built-in knowledge of the structure of both the English and the SQL/MDX language. MSEQ knows how to translate from English to SQL/MDX, but before you provide your own model of data it doesn't know what to translate. The engine uses the model of data you furnish to recognize SQL/MDX patterns in the English sentences it receives. For example, from the sentence:
List all Customers
MSEQ extracts the following: The user wants a list of records; the user doesn't want any filter to be applied; and the user wants information about customers.
The engine knows how to list records but doesn't have enough information to define filters (which is not required in our example). More importantly, it doesn't know anything about the Customer entity. It goes almost without saying that the Customer entity here plays a central role. It must bring information about the database to access and the fields to extract. Roughly said, the verb List (a sort of keyword for MSEQ) suggests a SELECT statement is needed. But how simple such a statement will be, and which fields or joins it will involve, is determined by the specific information the Customer entity has associated. As developers of an MSEQ application, this is just the information we're required to enter in an MSEQ project.
It's a delicate job, and don't underestimate its impact. On the other hand, once you've finished, even the less computer-savvy manager will be able to issue queries by using the keyboard or even the voice. Training the MSEQ engine on the structure of your information system is not much different from training a newly hired programmer, but is ultimately far less expensive! Keep in mind that you have to teach MSEQ how to connect English words (verbs, nouns, adjectives, prepositions, and so on) with database objects (tables, fields, and joins). To accomplish this, you define a number of entities and relationships that together form the domain of your MSEQ application.
Creating an MSEQ Domain
An MSEQ domain is a data structure including information about the layout of the database and the semantic objects (entities and relationships) that you've defined. Within a domain, you can save global settings for your application as well as add new entries to the English Query dictionary. Let's concentrate on entities and relationships.
An entity is a name that renders a real-world object that exists in your database. If your data store includes information about employees and orders, both are good candidates to become entities. An entity has a name, sometimes one or more synonyms, and a link with a table or one or more database fields. For example, the date of an order is an entity, which is reasonably linked to a table's field. On the other hand, it's likely that employees will be an entity that refers to an entire table.
A relationship describes how and why entities are connected to each other in that domain. For example, if you have customers and orders, a reasonable relationship is "customers enter orders" or, more specifically, "customers enter orders on a date." A relationship is given by a sentence that can assume various forms, as shown in Figure 3.
Figure 3. Various phrasings that can define a relationship. The screen shot comes from the MSEQ editor integrated into VID.
In most cases, a relationship is defined on the top of a database join that connects two or more tables using a common field. This is not at all surprising when you realize that establishing relationships and connections between different tables of data is highly valuable in an enterprise scenario.
So, what exactly is an MSEQ domain? It's a collection of related information that comes together to create an environment enabling your users to issue data access commands using natural language. Because the capabilities of MSEQ rely on the quality (and quantity) of the model you built, you can look at an MSEQ domain as a sort of small and simple data aggregation that can answer a number of questions formulated in plain English.
When you attempt to design a data warehouse, you have a certain amount of data to privilege, make accessible more readily, or place in relationship with other data despite their physical organization. In a certain sense, you are exposing a made-to-measure sub-model of data thought to satisfy certain queries. MSEQ does the same thing. You first input the English questions you want the MSEQ engine to answer. From this list of questions you begin identifying the various significant entities and all the important relationships among them. In doing so, the MSEQ Project wizards (one for SQL data and one for OLAP data—see Figure 2) help you by automatically extracting a number of basic entities and relationships from the tables and the fields of the database of choice.
Putting MSEQ to Work
Now let's see how to build an MSEQ domain. The example I provide is simple, but comprehensive enough. It takes an Access database, a question to answer, and goes on to add as many entities and relationships as needed to satisfy the question. I've chosen an Access database mainly to demonstrate that you can employ MSEQ also on small database applications. The documentation that comes with the MSEQ for SQL Server 2000 beta provides a tutorial on how to accomplish this same task using an SQL Server table. (The only significant difference is in the parameters you have to specify to connect to the database!)
The example available through the MSDN Web site is all that's needed to form an MSEQ project. You can open the .eqp file (the project) or you can try to re-create it from scratch by following these steps:
After running Visual InterDev and opening a new project, choose the SQL Project Wizard option and then pick the desired OLE DB provider. Note that although you're presented the full list of OLE DB providers, in practice you can choose only those that support SQL as the query language.
Choose the Jet 4.0 provider and the Access database to work with. In the example, I'm using a database with two tables: Invoices and Customers. The data they represent are the collection of the invoices sent to various clients.
In the Invoices table the customer is identified through a CustomerID field that points to the Customers table. The wizard presents the list of all the tables found: select everything.
**Note **If you're using an Access database at this point you'll probably be warned that no foreign keys are specified. There's time to remedy this later.
Figure 4 illustrates the next step. The wizard is ready to add a standard set of entities and relationships that descend directly from the database structure.
Figure 4. Creation of standard entities and relationships. Clear the check box next to the ones you don't want generated.
Leave things as they are and click OK. At this point, the project is opened within VID, waiting for you to enter more information.
The tables have been imported with a leading dot in the name (for example, ".customers"). For simplicity, rename both the tables and the corresponding entities by right-clicking the respective node in the Semantics and the Tables tabs (see Figure 5).
Figure 5. The MSEQ Project window
If you received a warning about the lack of foreign keys, click the Tables tab and specify the keys for the various tables. You can do this by expanding the Fields node, right-clicking the desired field, and setting it as the key.
At this point (see Figure 5), you have two main entities rendering the elements of the two tables and a number of relationships basically stating that each entity has a certain attribute. Each attribute is a field of the database. Other (minor) entities can be seen by expanding the root nodes. For example, if you double-click Invoices_have_currencies, you'll see that it has "Invoices have currencies" as an associated trait phrasing. It tells the MSEQ engine that there's a relationship between the invoice and the currency entity.
Now try to build the project and enter the List Customers command. You'll get an error message reminding you that the MSEQ hasn't been given any information about customers. That's true! We need to specify what a Customer is in terms of database fields and tables. To remedy, double-click the Customer entity. In the Semantic Properties box make sure there's a Customer entry. (It won't if you renamed entities to cut off the leading dot.) Here you can also add synonyms for Customer. For example, add Client (keeping in mind that names are case sensitive). At this point, the MSEQ engine knows how to handle the list of customers. It translates to:
SELECT xxx FROM Customers
Which fields are selected for the query? By default, the only field selected is the one defined as being the name of the Customers. There's a relationship that states just this. Figure 6 shows that you can change these settings and add as many fields as desired.
Figure 6. Defining the properties of the Customer entity
Notice that you can have entities that map to a table (as in this case), to one or more fields, or to nothing at all.
As in Figure 6, the SQL command for "List Customer" becomes:
select Customers.CustName as "CustName", Customers.CustCity as "CustCity", Customers.CustState as "CustState" from Customers
Also notice that you can now use interchangeably the words Client and Customer in your English queries.
Answering a Precise Question
Using MSEQ means you have to provide a consistent answer to a number of questions. For example, the invoices in the test database can have a different currency. Such a value is reported in a field called Currency. Suppose that one day you want to know which clients pay you in U.K. pounds. The question, therefore, is: "Which clients pay with a given currency?"
This is a specific relationship that you have to add to the domain. Such a relationship will return a list of customers (the three fields just shown) after a join with the invoice database. This means that we have to define the join ourselves in the Tables tab (see Figure 5). In the Invoices table add a new join, putting into relation the CustomerID fields of both tables.
To answer the preceding question, we need to specify an articulated relationship that basically stems out from customer and currency entities and the following two facts:
- Invoices have currency.
- Customers pay invoices with currency.
We already have a standard relationship for the former, now we have to arrange one for the latter. For one thing, why must the relationship be "Customers pay invoices with currency?" Because it is the faithful transposition of the question we need to answer! If you want to be able to formulate the same question in another form, or use other terms, just change the structure of the relationship! Remember that what MSEQ can understand depends strictly on the relationships you define.
To add a new relationship you can right-click the Relationship node or you can leverage the graphical authoring facilities of MSEQ for SQL Server 2000. Drag the Customer entity to the right pane of the window in Figure 5 and then drag the Invoice entity onto it. A window will appear with these two entities selected. Add a third entity, Currency, and then click to add phrasings. From the subsequent menu, choose Verb Phrasing, because you want to add a sentence with a subject, a verb, and more. As shown in Figure 7, complete the sentence by properly selecting the various entities and adding a prepositional phrase.
Figure 7. Defining a new relationship stating that "customers pay invoices with currencies." The graph appears in the Project window.
After saving this new relationship you are finally free to type in the long-awaited query. Figure 8 shows the results.
Figure 8. Listing the clients who pay in U.K. pounds
The first edit box shows how the original sentence is rephrased by MSEQ. The second shows how the answer will be structured, and finally you can see the SQL code to execute against the database connection. Below, find the returned recordset.
Notice that you can make the structure of the recognized queries even more flexible by adding synonyms for both entities and prepositions. For example, at this stage of the project a query string like:
List Clients that pay in UKP
wouldn't be recognized because we just set the with preposition to stay between the verb pay and the currencies. To also support in as a preposition, just enter in the dialog box shown in Figure 7 the string with,in instead of with. Any preposition you want to support (for example, through) can be added after a comma.
Working with OLAP
Perhaps the most powerful and coolest feature of MSEQ for SQL Server 2000 is the built-in support for OLAP data. In addition to SQL, you can also issue English queries against OLAP cubes of data. This feature relies on the services of the SQL Server 7.0 OLAP Services and requires a connected Microsoft Windows NT® Server machine to act as the OLAP server.
OLAP Services include a component to construct multidimensional cubes of data by extracting and summarizing data from corporate data warehouses. A second component lets client applications access such cubes using the MDX language, explicitly defined for querying and manipulating multidimensional cubes.
From the EQ's point of view, OLAP data is just data to be modeled into a domain in terms of entities and relationships. Such a model will then be queried using plain English sentences that end up producing MDX commands instead of SQL.
Of course, OLAP archives are different from, say, an SQL Server table. They introduce concepts such as dimensions, levels, and measures that need an adequate representation in the EQ authoring tools. To work with OLAP data, just choose the OLAP Project Wizard and specify the OLAP server, the database, and the cube.
Figure 9. Connecting to an OLAP database
In addition, enable an OLAP connection in the project's properties dialog box. This will allow the EQ editor to take into account OLAP objects. Let's see how to proceed.
The FoodMart database is the one that ships with OLAP Services, so it's easy for you to reproduce the steps I'm going to illustrate. Select the Sales cube and have the MSEQ editor read the OLAP data schema and submit a number of default entities and relationships—so far, neither more nor less than we saw for SQL data.
Suppose you want to know which customers bought beer. The easiest way of getting this information is by asking the question "List Customers that bought beer." Because there's no "Customer buy product" relationship among those automatically loaded upon project startup, we have to add it. Drop Customer and Product_SubCategory onto the canvas and define a verb phrasing. A more careful examination of the database structure tells us that to find out about beer we need to consider the product subcategory rather than the product. The query now is correctly interpreted and produces the expected results, as shown in Figure 10.
Figure 10. English Queries on an OLAP database
Notice that the language generated is not SQL but MDX—the language that OLAP Services utilizes to query and manipulate multidimensional data. To actually get a recordset you can process it through the ADOMD object model, an ADO-like framework for OLAP clients. (See "Manipulate and Query OLAP Data Using ADOMD and Multidimensional Expressions" for an introductory article.)
To make things a bit more complicated, suppose we want to know about possible relationships between the consumption of beer and the marital status of people. Which customers drink beer and are married? That's the new question to answer. If you try to use the word "married" in any of your queries you get only errors. The status of married means that a customer has a marital status (a defined entity) set to M, M and S being the one-byte strings stored in the database to denote single and married. A query string of "List customers with a marital status of M" works just fine, but perhaps it is not much better than raw SQL! Basically, we need to create a special dictionary entry that MSEQ can detect and silently express in terms of marital status. Right-click the Dictionary Entries node and define a synonym that causes MSEQ to read "married" and translate to "with marital status of M." Given this, what Figure 11 shows is perfectly natural.
Figure 11. Combined use of entities, relationships, and dictionary entries in OLAP
Developing EQ Applications
An MSEQ project comprises a couple of files. The .eqp is the project file to open within Visual InterDev. The .eqd (English Query Domain) file is the compiled MSEQ application. It includes all the information to reach the database and to define the domain. You have to use such a file to embed English Query capabilities into, say, an Active Server Pages (ASP) or a Microsoft Visual Basic®-based application.
There are two steps to follow in order to add English Query to any Microsoft Win32®-based application. First, define the domain of your MSEQ application. This means you have to produce an .eqd file by creating a VID project and compiling it. An .eqd file contains the data model and everything that allows the engine to recognize and process your queries. The semantic is expressed in an XML language called Semantic Modeling Format (SMF). Such XML code can be saved (and analyzed) as a separate file by clicking the Export SMF Files… menu item under the VID's Project menu. Despite the default .smf extension, it's a regular XML file that you can display via Internet Explorer 5.
Once you have an .eqd file (build the project to get it), you can work with it through the MSEQ object model. I'm not examining it very carefully, leaving this for a future centerfold, but I'll show you how to get an SQL string from an English sentence. Code Sample A shows a VBScript demo that pops up the SQL string associated with a text.
To get in touch with MSEQ you need to create a Session object and initialize a domain within it. This is done with the following lines:
Set eqSession = CreateObject("Mseq.Session")
eqSession.InitDomain(strFile)
Notice that the strFile argument must be the name of an .eqd file. To submit an English text you use the ParseRequest method that returns a Response object.
Set eqResponse = eqSession.ParseRequest(strQuery)
The Response object can be of four types:
- An SQL command to execute
- An answer that EQ knows how to get without using SQL statements
- An error
- A request for further information
The specific type is contained in the Type property: values from 0 to 3, respectively. In the case of executable commands, you are returned a collection of SQL commands to scan this way:
for each cmd in eqResponse.Commands
strText = strText & cmd.SQL & vbCrLf
next
The SQL property returns the SQL text, while Restatement is a property that points to the rephrased question.
The MSEQ object model is built into the mseqole.dll file that you can, for instance, easily reference from your Visual Basic-based application. Such a file has no dependencies into other dynamic-link libraries (DLLs), so this DLL and your .eqd domain file is what you have to install in addition to your own application.
Summary
MSEQ is a very promising tool that could really help you build more user-friendly applications both for the Web and the Windows desktop. It exposes an object model, integrates with development tools, it's easy to embed in custom applications and, more importantly, is a powerful tool that shields your users (including other programmers) from intensive SQL and MDX.