Microsoft Jet Database Engine Programmer's Guide - Chapter 4

(blank title page)

Queries

Microsoft Jet database engine has sophisticated query and optimization capabilities that are unmatched by other desktop database engines in its class. These features include updatable views, heterogeneous joins, and the ability to work seamlessly with a wide variety of industry-standard database formats.

The Microsoft Jet query engine is designed to accept user requests for information or action in the form of Structured Query Language (SQL) statements. Microsoft Jet parses, analyzes, and optimizes these queries, and either returns the resulting information in the form of a Recordset object or performs the requested action.

Although Microsoft Jet borrows many query techniques from client/server relational database management systems (DBMSs) such as Microsoft SQL Server, it remains a file-server database. All queries are processed on individual workstations running copies of a host application, such as Microsoft Access, or a custom application created by using a tool, such as Microsoft Visual Basic. Microsoft Jet doesn't act as a true database server, such as SQL Server, that process data requests independently of the application requesting data. However, Microsoft Jet can send queries to SQL Server or other ODBC database servers for processing.

To understand how Microsoft Jet parses, optimizes, and processes queries, it's important to understand the distinction between a client/server DBMS and a file-server system.

Understanding how the Microsoft Jet query engine works will give you a head start in designing your application to take advantage of the unique strengths and features of Microsoft Jet.

Using the Code Examples in This Chapter

You can use the code examples in this chapter to help you understand the concepts discussed, or you can modify them and use them in your own applications.

The code examples are located in the JetBook\Samples subfolder on the companion CD-ROM. The code examples for Microsoft Access 97 are in JetSamples.mdb, and the corresponding code examples for Microsoft Visual Basic version 5.0 and other applications that support Visual Basic for Applications (VBA) are referenced in JetSamples.vbp. Both JetSamples.mdb and JetSamples.vbp use tables and queries in NorthwindTables.mdb, also located in the JetBook\Samples subfolder.

To use the code examples, copy the sample files to your hard disk. Be sure to copy NorthwindTables.mdb as well so that you can use the code examples to work with data.

This chapter includes SQL statements that you can use in Visual Basic or run from SQL view of the Query window in Microsoft Access. To make it easier to use the SQL statements in this chapter in code, all SQL statements are saved as string constants in the modules for Chapter 4. In addition, each SQL statement is saved as a query in the NorthwindTables database.

See Also For more information about copying and using the code examples from the companion CD-ROM, see "Using the Companion CD-ROM" in the Preface.

Note The examples in this chapter show strings within an SQL statement delimited by single quotation marks: 'string'. You can use single quotation marks to delimit a string in Visual Basic or in SQL view of the Query window in Microsoft Access. However, if the string itself contains an apostrophe, an error occurs.

Pairs of double quotation marks (""string"") are optimal delimiters when you're including an SQL statement in Visual Basic code. You can also use Chr(34) to return a string containing a pair of double quotation marks; for example, the expression Chr(34) & Chr(34) & "string" & Chr(34) & Chr(34) concatenates to ""string"".

If you're running an SQL statement from SQL view in Microsoft Access, you may want to use a single set of double quotation marks to delimit a string: "string". If you use a pair of double quotation marks, Microsoft Access returns an error.

Getting Answers to Your Questions

Microsoft Jet uses a dialect of SQL. For more information about the Microsoft Jet implementation of SQL, see Appendix B, "SQL Reference." You form your requests for information by writing SQL queries or by building the query interactively through a tool such as the query design grid in Microsoft Access.

It's not the intention of this chapter to provide a tutorial on SQL. Instead, the focus is on the unique features of the Microsoft Jet implementation of SQL and how Microsoft Jet turns SQL statements into the answers you want.

Communicating with MS Jet

Your SQL queries can be saved as permanent QueryDef objects in the database or created, analyzed, and executed on an ad-hoc basis.

The saved QueryDef object is a particularly important object in a Microsoft Jet database. It's a convenient way to store and reuse commonly asked questions. After you analyze your business problem and create a query that answers that question, you can save the "question" as a permanent query. From then on, you can ask the question again by rerunning the query.

Each time you prepare a request for information, either by sending an SQL statement as an argument to the OpenRecordset method of a Database object or by saving a QueryDef object in your database, Microsoft Jet runs through a complex series of analysis and optimization steps. When you create a QueryDef object, Microsoft Jet performs a parsing phase (reading and interpreting your SQL statements) and an optimization phase (turning that SQL statement into a plan for the most efficient way to retrieve your answer). When you create a permanent QueryDef object, these steps are performed once. You can then execute the saved QueryDef object either to retrieve the answer you need, or to perform the action you requested. For a full discussion of how Microsoft Jet interprets and executes your queries, see "Query Optimization" later in this chapter.

See Also For a full discussion of how QueryDef objects are created and stored in a Microsoft Jet database file, see Chapter 3, "Data Definition and Integrity."

The QueryDef object is also important because it can be used, for the most part, as if it were a table in your database. You can prepare a complex, multiple-table join that brings together data from a variety of sources and then summarizes and analyzes that data. But because the query is saved as a permanent QueryDef object, that complexity can be hidden from your application. You only have to know the name of the QueryDef object in order to use it. Depending on the type of query created, the query can be updatable: Any changes made to the Recordset object built from an updatable QueryDef object are automatically reflected in the underlying tables.

The examples in this chapter present only the text of the SQL statements that illustrate various Microsoft Jet query features under discussion. To execute these queries, you must save them as permanent QueryDef objects, or execute them in VBA code.

See Also For information about how to create Recordset objects based on these SQL statements, see Chapter 5, "Working with Records and Fields."

Selecting Fields from a Single Table

As with other SQL dialects, Microsoft Jet SQL uses the SELECT statement to retrieve one or more fields from a given table. You may be familiar with the SQL statements generated by the Microsoft Access query design grid. Following is a simple example that selects all fields from the Customers table:

SELECT Customers.* FROM Customers;

Here is another example generated by Microsoft Access, which selects particular fields rather than all fields:

SELECT  Products.ProductID,  Products.ProductName,  Products.UnitsInStock,  Products.UnitPrice*Products.UnitsInStock AS InventoryValue FROM Products;

Note that when Microsoft Access displays this query it automatically qualifies each field reference with its base-table name. Microsoft Jet accepts a wide variety of table and field identifiers, including names with spaces and other punctuation. If an identifier contains spaces or any nonstandard characters, it must be enclosed in square brackets.

Microsoft Jet is flexible in its interpretation of SQL text strings. The previous query would work just as well if it were rewritten in a style slightly more familiar to experienced SQL users:

SELECT  P.ProductID,  P.ProductName,  P.UnitsInStock,  P.UnitPrice*P.UnitsInStock AS InventoryValue FROM Products P;

This example uses the alias P to refer to the Products table and qualifies references to fields with that alias name. Although Microsoft Jet allows the syntax used above, queries built with the Microsoft Access query design grid add the AS reserved word in front of the alias name:

SELECT  P.ProductID,  P.UnitsInStock FROM Products AS P;

Restricting, Ordering, and Summarizing

In most cases you want to do more than just list all items in a single table, or even all items in selected fields in that table. You want to narrow your selection to a manageable subset, and you want to see the result in a certain sequence. It's also helpful to have the query engine summarize and analyze your data.

The WHERE Clause

You use the WHERE clause to restrict the records returned by your query to those matching your criteria specified by the WHERE clause. The WHERE clause evaluates each record in the set of input records and decides whether the expression contained in the WHERE clause evaluates to True. If it does, the record is selected. If not, the record is omitted from the query's result set.

While it's beyond the scope of this chapter to give a complete tutorial on SQL, the following representative examples point out some of the unique features of the Microsoft Jet version of SQL.

See Also For more information about SQL syntax, see Appendix B, "SQL Reference."

Selecting Matching Values
SELECT Products.*  FROM Products WHERE Products.CategoryID = 2;

For each record in the Products table where the CategoryID field is equal to the literal value 2, the record is selected.

Selecting by Using Partial String Matching
SELECT Products.*  FROM Products WHERE Products.ProductName Like 'CH*';

Microsoft Jet uses partial match (or "wildcard") characters with the Like operator that are different from those used in most SQL dialects. The asterisk (*) character matches zero or more characters and is equivalent to the percent (%) character in ANSI SQL. The other Microsoft Jet partial match characters are the question mark (?), which matches any character in a single field, and the number sign (#), which matches any digit in a single field.

Some SQL dialects require you to enclose text literals within single quotation marks. However, Microsoft Jet accepts literals enclosed in either single or double quotation marks.

Selecting Boolean Values
SELECT Products.*  FROM Products WHERE Products.Discontinued = True;

Microsoft Jet stores Boolean values as either – 1 or 0. The constant value True is equal to – 1; False is equal to 0. You can substitute Yes for True and No for False within the text of the SQL statement. Note, however, that if you refer to Boolean recordset values within your VBA code, you must use the values – 1 for True and 0 for False, because Yes and No aren't recognized there.

Selecting by Using Date Literals and the Between...And Operator
SELECT Orders.*  FROM Orders WHERE Orders.OrderDate Between #3/1/96# And #6/30/96#;

The convention Microsoft Jet uses to search for DATETIME values is to enclose literal search values in number signs (#). The literal either can include the date only, as in the following example, or it can be fully qualified with the date and time:

SELECT Orders.*  FROM Orders  WHERE Orders.OrderDate > #5/24/96#;

Note that this date literal must always be expressed in MM/DD/YY order. To avoid the ambiguity of the meaning of stored queries, Microsoft Jet doesn't follow the international date format settings specified in the user's Control Panel.

Selecting by Using the IN Clause and a Value List
SELECT Orders.*  FROM Orders WHERE Orders.ShipCity IN ('London','Madrid','Rome');

The previous code selects a single value in the underlying table from a list of literal values. The IN clause can be combined with NOT:

...WHERE Orders.ShipVia NOT IN (1,3);
Combining More Than One Expression
SELECT Orders.*  FROM Orders WHERE Orders.OrderDate > #3/1/96# AND Orders.ShipVia = 2;

A WHERE clause can consist of more than one Sub procedure that combines various AND and OR clauses. The truth of the entire WHERE clause is evaluated by using standard rules of Boolean logic. You can use parentheses with the Sub procedure to ensure the order of evaluation. Note, however, that the absence or presence of parentheses doesn't affect the order in which joins are performed; parentheses affect only the order in which Boolean expressions are evaluated.

Selecting Unique Values
SELECT DISTINCT Employees.Title FROM Employees;

Although not related to the WHERE clause, you can use the DISTINCT predicate of the SELECT clause to limit output records to specific unique combinations of output fields. In the previous example, one occurrence of each unique Title field value used in the Employees table is retrieved. If the output contains more than one field, only records unique across all selected fields are output, as in the following example:

SELECT DISTINCT  Employees.Title,  Employees.TitleOfCourtesy FROM Employees;
Selecting Unique Records

The DISTINCTROW predicate omits data based on entire duplicate records, not just duplicate fields. For example, you could create a query that joins the Customers and Orders tables on the CustomerID field. The Customers table contains no duplicate CustomerID fields, but the Orders table does because each customer can have many orders. The following SQL statement shows how you can use DISTINCTROW to produce a list of companies that have at least one order but without any details about those orders:

SELECT DISTINCTROW CompanyName FROM Customers INNER JOIN Orders ON Customers.CustomerID = Orders.CustomerID ORDER BY CompanyName;

If you omit DISTINCTROW, this query produces multiple rows for each company that has more than one order. DISTINCTROW has an effect only when you select fields from some, but not all, of the tables used in the query.

Note DISTINCTROW is ignored if your query includes only one table, or if you output fields from all tables.

Cc966377.accicon(en-us,TechNet.10).gif

Microsoft Access Users The DISTINCTROW predicate is the SQL equivalent of setting the Microsoft Access UniqueRecords property in a query to True. In Microsoft Access 97, the default setting for the UniqueRecords property in a query is False, which eliminates DISTINCTROW from the query's SQL statement; in prior versions of Microsoft Access, the default setting for the UniqueRecords property is True, which adds DISTINCTROW to the query's SQL statement. Three types of queries are affected by this change: UPDATE queries, DELETE queries, and SELECT queries.

  • UPDATE queries If you run an UPDATE query that you imported from a Microsoft Access 2.0 or 95 database, or if you run an existing UPDATE query in a database that you converted to Microsoft Access 97 from an earlier version, you may receive the following error message: "Operation must use an updatable query."

    To resolve this error message, set the query's UniqueRecords property to False.

  • DELETE queries When a DELETE query contains more than one table (for example, a query that deletes duplicate records from one of the tables), the UniqueRecords property must be set to True for all versions of Microsoft Access. However, because the default value for the UniqueRecords property is False in Microsoft Access 97, you must set the value of this property manually when you create a new DELETE query in Microsoft Access 97.

  • SELECT queries Some SELECT queries that are updatable in Microsoft Access 2.0 or 95 may display the following message in the status bar when you try to update them in Microsoft Access 97: "This recordset is not updatable."

    To resolve this error, set the query's UniqueRecords property to False.

Selecting by Using Nested Subqueries
SELECT Orders.*  FROM Orders WHERE Orders.CustomerID IN  (SELECT CustomerID FROM Customers WHERE City = 'London');

This example matches each CustomerID value in the Orders table against a list that's created by selecting the CustomerID values of all customers whose city is London. Although this is a standard SQL query, and legal in Microsoft Jet SQL, this particular search may be executed more efficiently by directly joining the Orders table to the Customers table.

Selecting by Using Correlated Subqueries
SELECT  T1.LastName,  T1.FirstName,  T1.Title,  T1.Salary FROM Employees AS T1 WHERE T1.Salary >= (SELECT Avg(Salary)  FROM Employees  WHERE Employees.Title = T1.Title) ORDER BY T1.Title;

A correlated subquery is evaluated once for each record processed by the main SELECT query. In this example, the Salary field of each input record in the Employees table (here given the alias name T1) is compared to the results of a sub-SELECT query. The sub-SELECT query uses the value in the main Title field as a selection criterion. Because the Title field in the main Employees table can change for each record processed, the sub-SELECT query must be re-executed for each record of the main table that's processed. For example, if there are three employees with the title of Sales Representative, and the average of their three salaries is equal to $100,000 dollars, the main query will return only employees whose salary is greater than or equal to $100,000.

In addition to the features shown in the previous examples, Microsoft Jet supports a variety of other predicates for subqueries, including EXISTS, ANY, and ALL.

Ordering the Result Set

Microsoft Jet SQL uses the ORDER BY clause to present the selected records in the order you designate:

SELECT Products.*  FROM Products  ORDER BY UnitPrice DESC, ProductName;

In this example, records are retrieved in descending order by unit price. All records that match a given price are put in ascending order based on the values in the ProductName field. Although it's legal to use the ASC reserved word with an ORDER BY clause, it's not required. If ASC is omitted, the default ascending order is used.

If an ORDER BY clause is used in combination with a WHERE clause, it follows the WHERE clause:

SELECT Orders.*  FROM Orders  WHERE Orders.OrderDate >= #4/1/96#  ORDER BY Orders.EmployeeID;

The ORDER BY clause can contain an expression as well as a field in the underlying table, as in this example:

SELECT  Orders.OrderID,  Orders.RequiredDate,  IIf(ShipCountry = 'USA',RequiredDate+5,RequiredDate+10)  AS PastDueDate FROM Orders ORDER BY  IIf(ShipCountry = 'USA',RequiredDate+5,RequiredDate+10);

It's also possible to specify the sort fields by their field positions rather than by their names, as in the following example:

SELECT  Employees.EmployeeID,  Employees.LastName,  Employees.FirstName FROM Employees ORDER BY 3, 1;

This SELECT statement orders the query output by the third and first fields. In this case, the sort fields are the FirstName and EmployeeID fields. This feature is also useful with UNION queries, which are discussed later in this chapter.

Summing and Analyzing

Microsoft Jet supports the standard SQL aggregate functions Count, Sum, Avg, Max, and Min to perform table-level or group-level totaling:

SELECT  Count(Products.UnitPrice) AS CountOfUnitPrice,  Sum(Products.UnitPrice) AS SumOfUnitPrice,  Avg(Products.UnitPrice) AS AvgOfUnitPrice,  Min(Products.UnitPrice) AS MinOfUnitPrice,  Max(Products.UnitPrice) AS MaxOfUnitPrice FROM Products;

The previous query retrieves a single record that collects statistics on all products in the table. It can be combined with a WHERE clause:

SELECT  Count(Products.UnitPrice) AS CountOfUnitPrice,  Avg(Products.UnitPrice) AS AvgOfUnitPrice FROM Products WHERE Products.SupplierID = 6;

If no records match the selection criteria, the output is still produced. The value of any Count aggregate fields is zero; the value of the other aggregate types is Null.

You use the SQL function Count(fieldname) to tally occurrences of non-Null values in a specified field. A special form of Count is Count(*), which counts the number of records. As discussed in the section "Common Pitfalls" later in this chapter, Count(*) should be used whenever possible, because this expression can sometimes be executed more quickly by using the Rushmore query-optimization technology.

See Also For information about how to optimize queries used with Open Database Connectivity (ODBC) data sources, see Chapter 8, "Accessing External Data," and Chapter 9, "Developing Client/Server Applications."

The argument to these aggregate functions can be an expression as well as a field name, as in the following example:

SELECT Max(UnitPrice*Quantity) AS MaxExtPrice FROM [Order Details];

In addition to these standard aggregate functions, Microsoft Jet supports the SQL aggregate functions First and Last, which retrieve the first and last values of the set of records, and the StDev, StDevP, Var, and VarP statistical and financial functions. Note that because the StDev, StDevP, Var, and VarP functions are Microsoft Jet-specific, they aren't likely to be supported on any other client/server DBMS, and so will have to be evaluated locally.

Aggregate Values and GROUP BY

Aggregate values can be calculated at levels other than for the entire table. The SQL GROUP BY clause is used to create one output record per each group of unique values, and optionally, to produce summary values for selected fields. GROUP BY used alone creates an output record that specifies the unique values:

SELECT  Products.CategoryID,  Products.SupplierID FROM Products GROUP BY  Products.CategoryID,  Products.SupplierID;

The previous query lists the unique combinations of CategoryID and SupplierID, and produces the same results as:

SELECT DISTINCT  Products.CategoryID,  Products.SupplierID FROM Products;

As in other SQL dialects, any output fields in a GROUP BY query must be either part of an aggregate function or be in the GROUP BY clause.

When an aggregate function is combined with a GROUP BY clause, one record of output is produced for each unique value in the GROUP BY clause. The aggregate totals are accumulated and output for each record, and then reset for the next group, as in this example, which finds the oldest birth date for each group of employees that have the same title:

SELECT  Employees.Title,  Min(Employees.BirthDate) AS MinOfBirthDate FROM Employees GROUP BY Employees.Title;

You can also sort the output of the GROUP BY query by adding an ORDER BY clause, as in this example, which finds the most recently hired employee in each country/region:

SELECT  Employees.Country,  Max(Employees.HireDate) AS MaxOfHireDate FROM Employees GROUP BY Employees.Country ORDER BY Max(Employees.HireDate) DESC;
Applying Selection After Grouping with HAVING

You use the HAVING clause to apply selection criteria after data aggregation has been performed. Records are selected and summarized, and then only those meeting the conditions specified in the HAVING clause are displayed:

SELECT  Products.CategoryID,  Sum(Products.UnitsInStock) AS SumOfUnitsInStock FROM Products GROUP BY Products.CategoryID HAVING Sum(Products.UnitsInStock) > 400;

The HAVING clause determines which groups created by the GROUP BY clause are returned in the query output. Note that although the HAVING clause and the WHERE clause are similar, they aren't interchangeable. HAVING is used only in conjunction with the GROUP BY clause.

Joining Tables

The power of a relational database system is evident when you combine results from one table with those from another. Microsoft Jet is flexible in the type and format of the SQL statements used to create a relational join between tables. In addition to joins in which the linking criteria is specified in the WHERE clause, Microsoft Jet SQL can use the JOIN clause to specify inner joins, left and right outer joins, and self-joins, as well as non-equi-joins, in which the linking criterion isn't a match in values between two tables, but a generic conditional expression that evaluates to True. Microsoft Jet also supports the UNION statement, which concatenates the results of two or more SELECT statements.

Joins Specified in the WHERE clause

With Microsoft Jet, you can create relational joins in SQL by specifying the linking condition in the WHERE clause:

SELECT  Products.ProductID,  Products.ProductName,  Categories.CategoryID,  Categories.CategoryName FROM Categories, Products WHERE Products.CategoryID = Categories.CategoryID;

This query uses two fields each from the Products and Categories tables and selects records from each table where the CategoryID field in Products matches the CategoryID field in Categories.

The INNER JOIN

The SQL query produced by the query design grid in Microsoft Access uses the INNER JOIN clause to join two tables. Here is the same query as produced by Microsoft Access:

SELECT  Products.ProductID,  Products.ProductName,  Categories.CategoryID,  Categories.CategoryName FROM Categories  INNER JOIN Products  ON Categories.CategoryID = Products.CategoryID;

The INNER JOIN names the linking criterion used to find matches between the two tables. This is the preferred format for specifying joins with Microsoft Jet, for two reasons. First, this format makes the join criteria explicit, rather than inferring it from the condition in the WHERE clause. Second, Microsoft Jet requires the use of this format in order for the results of this join to be updatable.

See Also For a full discussion of query updatability, see Chapter 5, "Working with Records and Fields."

Many people also prefer this format because it's self-documenting, unlike the implicit join specified in the WHERE clause. The WHERE clause is then reserved for selection criteria, rather than doing dual-duty as a join specifier:

SELECT  Products.ProductID,  Products.ProductName,  Categories.CategoryID,  Categories.CategoryName FROM Categories  INNER JOIN Products  ON Categories.CategoryID = Products.CategoryID WHERE Products.CategoryID IN (1,3);

A join can specify more than one linking condition. The NorthwindTables sample database contains the saved query PreferredSuppliers, which includes all suppliers that provide a product that has sold more than $10,000 worth of inventory. The following query is based on the PreferredSuppliers query and the Products table:

SELECT  PreferredSuppliers.ProductID,  PreferredSuppliers.SupplierID,  Products.CategoryID,  Products.ProductName FROM PreferredSuppliers  INNER JOIN Products  ON (PreferredSuppliers.SupplierID = Products.SupplierID)  AND (PreferredSuppliers.ProductID = Products.ProductID);

Complex join conditions can be specified. The following example joins six tables:

SELECT  Orders.OrderID,  Employees.EmployeeID,  Customers.CompanyName,  [Order Details].ProductID,  Products.ProductName,  Categories.CategoryName FROM  Categories INNER JOIN  (Products INNER JOIN  (Employees INNER JOIN  (Customers INNER JOIN  (Orders INNER JOIN [Order Details]  ON Orders.OrderID = [Order Details].OrderID)  ON Customers.CustomerID = Orders.CustomerID)  ON Employees.EmployeeID = Orders.EmployeeID)  ON Products.ProductID = [Order Details].ProductID)  ON Categories.CategoryID = Products.CategoryID;
The Left and Right Outer Joins

The INNER JOIN clause retrieves records only when there is a match in both tables. In some cases, you may want to retrieve the values from one table even when there are no matching values in the other joined table. For example, you want to create a join from the Orders table to the Employees table, but you don't want to exclude orders from the result set simply because there is no corresponding record in the Employees table:

SELECT  Orders.*,  Employees.LastName FROM Employees  RIGHT OUTER JOIN Orders  ON Employees.EmployeeID = Orders.EmployeeID;

When you specify a left or right outer join, Microsoft Jet retrieves all records from one table even though there are no matches in the other table. In this example, all records from the Orders table are retrieved even when the value of the EmployeeID field is unknown.

Note that the word "OUTER" is optional in Microsoft Jet SQL syntax, because a left or right join implies an outer join.

Whether the join is considered a left or a right join is determined from the order of the tables listed in the join clause. There is no effect on efficiency, or in the actual records returned. The previous query can be rewritten as follows, and the same records would be returned:

SELECT  Employees.LastName,  Orders.* FROM Orders  LEFT JOIN Employees  ON Orders.EmployeeID = Employees.EmployeeID;

Microsoft Jet treats a right join as if it were a left join with the tables reversed.

Outer Joins and Null Values

Microsoft Jet returns the special value Null in all fields for the records in the "outer" table in which no match is found. You can take advantage of this feature by testing for the presence of Null values in the outer table. For example, the NorthwindTables database contains a query named SuppliersInUSA, which returns records for those suppliers located in the United States. Suppose you want to find all the products in the Products table whose suppliers aren't in the result set for the SuppliersInUSA query. One way is to create a nested sub-SELECT query:

SELECT Products.*  FROM Products WHERE Products.SupplierID NOT IN (SELECT SupplierID FROM SuppliersInUSA;);

This query creates a list of products from suppliers who aren't located in the United States.

Another way to achieve the same result makes use of the fact that nonmatching records are Null. The following example creates a left join between Products and SuppliersInUSA, and then tests for a Null value in the SupplierID field in SuppliersInUSA. SupplierID is the primary key of the Suppliers table and can't normally be Null, either in the Suppliers table or in the SuppliersInUSA query. Therefore, the presence of a Null value in the result set of the left join implies that there is no join between Products and SuppliersInUSA, which is the condition that should be detected:

SELECT Products.* FROM Products LEFT JOIN SuppliersInUSA ON Products.SupplierID = SuppliersInUSA.SupplierID WHERE SuppliersInUSA.SupplierID Is Null;
Self-Joins

It's possible to create a join between a table and a second instance of the same table. A common example is an Employees table in which the ReportsTo field contains EmployeeID values found in the Employees table itself.

The following example joins the Employees table to itself. The ReportsTo field in the first instance of the Employees table is linked to the EmployeeID field in the second instance. The second instance of the Employees table is given the alias Supervisors:

SELECT  Employees.EmployeeID,  Employees.LastName,  Supervisors.FirstName & ' ' & Supervisors.LastName AS SupervisorName FROM Employees  INNER JOIN Employees AS Supervisors  ON Employees.ReportsTo = Supervisors.EmployeeID;

In this example, a calculated field called SupervisorName is created. Note the use of the & concatenation operator, which is used to combine the output of the FirstName and LastName fields in the Supervisors table.

Because an inner join was specified, if the value in the ReportsTo field of any given record isn't also found in the Employees table, the record isn't selected. If you want to select the record even when the self-join doesn't find a match, you can specify a left join, as shown in the earlier examples in this chapter.

The following revised query specifies a left join. The calculated SupervisorName field has been revised to anticipate that certain records may have a Null value in the ReportsTo field (perhaps indicating that the employee is a supervisor, and doesn't report to another employee in the Employees table). The calculated field returns the specified value "No Supervisor" for those records in which the ReportsTo field is Null:

SELECT  Employees.EmployeeID,  Employees.LastName,  IIf(IsNull(Employees.ReportsTo), 'No Supervisor', Supervisors.FirstName & ' ' & Supervisors.LastName)  AS SupervisorName FROM Employees  LEFT JOIN Employees AS Supervisors  ON Employees.ReportsTo = Supervisors.EmployeeID;

Note This query uses the IIf (Immediate If) function in the calculated expression, which creates the field SupervisorName. This is one of many built-in functions understood by Microsoft Jet. Other functions you can use include string-manipulation functions and date functions. While these functions can be extremely useful, they should be used with care. Because they aren't available on all back-end database systems, they must be evaluated locally by the Jet database engine running on the user's computer. If calculated fields using these functions are part of the selection criteria, a table scan (a serial read) of all records in the result set must be performed so that the condition can be evaluated locally. Rather than adding a calculated field directly to the query itself, consider creating calculated fields in the form or report to display the query results.

Joining Tables and Queries

In addition to joins based on permanent tables, Microsoft Jet supports joins between tables and saved queries, or even between two queries. Queries can be based on other queries, which can further be based on other queries, and so on. Microsoft Jet automatically resolves all references to queries and tables.

Microsoft Jet QueryDef objects can be treated, for the most part, as if they were tables in the database. A QueryDef object performs the function of a view in SQL. QueryDef and TableDef objects occupy the same "name space" within a Microsoft Jet database. This means that you can't have a TableDef object and a QueryDef object with the same name, for example. Generally, wherever you can use a base table, you can use a query based on that table instead.

Suppose you want to find the percentage of your total sales of certain products that were generated by particular customers. One way to do this is to create the four queries shown in this section. The first query totals orders by customer and product:

SELECT  Orders.CustomerID,  [Order Details].ProductID,  Sum((UnitPrice*Quantity) -(Discount*(UnitPrice*Quantity))) AS ExtPrice  FROM Orders  INNER JOIN [Order Details]  ON Orders.OrderID = [Order Details].OrderID  GROUP BY Orders.CustomerID, [Order Details].ProductID;

Save this query as a permanent QueryDef object in the database with the name SelectCustomersOrdersTotals. You can now treat this query as if it were a table. The sample output it produces may be:

Customer

Product

ExtPrice

ALFKI

Aniseed Syrup

$ 60.00

ALFKI

Grandma's Boysenberry Spread

$399.95

ALFKI

Rössle Sauerkraut

$774.95

ALFKI

Chartreuse verte

$377.75

ALFKI

Spegesild

$ 23.75

Now you have to find the order totals for each ProductID value. Create a permanent QueryDef object called SelectOrdersTotals with the following SQL statement:

SELECT  [Order Details].ProductID,  Sum((UnitPrice*Quantity) -Discount) AS ExtPrice  FROM Orders  INNER JOIN [Order Details]  ON Orders.OrderID = [Order Details].OrderID  GROUP BY [Order Details].ProductID;

Part of the output produced by this query may be:

Product

ExtPrice

Chai

$14,274.65

Chang

$18,554.70

Aniseed Syrup

$ 3,079.80

Chef Anton's Cajun Seasoning

$ 9,423.30

Chef Anton's Gumbo Mix

$ 5,800.40

Now you can create a third QueryDef object called SelectPercentOfTotal, which joins SelectOrdersTotals with SelectCustomersOrdersTotals:

SELECT  SelectCustomersOrdersTotals.CustomerID,  SelectCustomersOrdersTotals.ProductID,  SelectCustomersOrdersTotals.ExtPrice AS CustTotal,  SelectOrdersTotals.ExtPrice AS ProdTotal,  Format(CustTotal/ProdTotal, '#.###') AS PercentOfTot  FROM SelectOrdersTotals  INNER JOIN SelectCustomersOrdersTotals  ON SelectOrdersTotals.ProductID = SelectCustomersOrdersTotals.ProductID;

Part of the output from this query may be:

Customer

Product

CustTotal

ProdTotal

PercentOfTot

Alfreds Futterkiste

Aniseed Syrup

$ 60.00

$ 3,079.80

.019

Alfreds Futterkiste

Grandma's Boysenberry Spread

$399.95

$ 7,344.63

.054

Alfreds Futterkiste

Rössle Sauerkraut

$774.95

$26,864.35

.029

Alfreds Futterkiste

Chartreuse verte

$377.75

$13,148.80

.029

Alfreds Futterkiste

Spegesild

$ 23.75

$ 6,142.28

.004

Finally, you can join this third QueryDef object back to the Customers and Products tables to pick up the CompanyName and ProductName fields:

SELECT  SelectPercentOfTotal.*,  Products.ProductName,  Customers.CompanyName  FROM (SelectPercentOfTotal  INNER JOIN Customers  ON SelectPercentOfTotal.CustomerID = Customers.CustomerID)  INNER JOIN Products  ON SelectPercentOfTotal.ProductID = Products.ProductID;
Non-Equi-Joins

So far, our join conditions have been based on matching values in selected fields in two or more tables. However, with Microsoft Jet you can also create join conditions based on tests other than equality. For example, you can create joins based on a range of acceptable values.

Suppose you have a table called PerformanceGrade that contains the following values. (This table exists in the NorthwindTables sample database.)

PerformanceGradeKey

LowRange

HighRange

0

$ 0.00

$ 100,000.00

1

$100,000.01

$ 200,000.00

2

$200,000.01

$ 300,000.00

3

$300,000.01

$ 400,000.00

4

$400,000.01

$9,999,999.99

You want to assign a PerformanceGradeKey value to each employee's sales. Employees with sales between 0 and $100,000 receive a 0, those with sales between $100,000.01 and $200,000 receive a 1, and so on.

The first step is to create a QueryDef object that sums up the sales by employee. Call this QueryDef object SelectEmployeeSales:

SELECT  Orders.EmployeeID,  Sum((UnitPrice*Quantity) -Discount) AS ExtPrice FROM Orders  INNER JOIN [Order Details]  ON Orders.OrderID = [Order Details].OrderID GROUP BY Orders.EmployeeID;

Sample output from this query may be:

Employee

ExtPrice

Davolio, Nancy

$202,126.72

Fuller, Andrew

$177,738.71

Leverling, Janet

$213,035.35

Peacock, Margaret

$250,161.70

Buchanan, Steven

$ 75,559.95

Suyama, Michael

$ 78,188.95

King, Robert

$141,283.04

Callahan, Laura

$133,286.43

Dodsworth, Anne

$ 82,956.70

Now this saved QueryDef object has to be joined to the PerformanceGrade table. The grade assigned is determined by the high and low ranges, and the actual sales of each employee. Here is the SQL statement that produces the desired result:

SELECT  SelectEmployeeSales.EmployeeID,  SelectEmployeeSales.ExtPrice,  PerformanceGrade.PerformanceGradeKey FROM SelectEmployeeSales, PerformanceGrade WHERE (SelectEmployeeSales.ExtPrice Between PerformanceGrade.LowRange And PerformanceGrade.HighRange) ORDER BY PerformanceGrade.PerformanceGradeKey;

As you can see from the sample output shown, Microsoft Jet is able to link values from the SelectEmployeeSales query to the PerformanceGrade table even though neither table has common fields or shares common values. The join is based entirely on the condition specified in the WHERE clause.

Employee

ExtPrice

PerformanceGradeKey

     

Dodsworth, Anne

$ 82,956.70

0

Suyama, Michael

$ 78,188.95

0

Buchanan, Steven

$ 75,559.95

0

Callahan, Laura

$133,286.43

1

King, Robert

$141,283.04

1

Fuller, Andrew

$177,738.71

1

Peacock, Margaret

$250,161.70

2

Leverling, Janet

$213,035.35

2

Davolio, Nancy

$202,126.72

2

Be aware that Microsoft Jet can't perform its normal optimization techniques on non-equi-join queries.

UNION Queries

You often have to treat the results of one or more queries, or the contents of one or more tables, as if they were in the same table. You may want to create a list of names and addresses that combines entries from both the Customers and the Suppliers tables. Microsoft Jet supports the UNION statement, which combines the results of two or more SELECT queries.

For example, the combined Customer and Supplier list can be created with the following query:

SELECT  Customers.CompanyName,  Customers.Address,  Customers.City,  Customers.Region,  Customers.PostalCode FROM Customers UNION SELECT  Suppliers.CompanyName,  Suppliers.Address,  Suppliers.City,  Suppliers.Region,  Suppliers.PostalCode FROM Suppliers;

The individual SELECT queries are combined by the UNION statement. Each SELECT statement can be as complex as necessary, and can include multiple-table joins. Each SELECT statement can also use GROUP BY to sum values. To specify the order in which the results of all the individual SELECT statements are returned, you can add an optional ORDER BY clause after the last SELECT statement.

The number of fields in each SELECT statement must match, but the names of the fields can differ. The data types of the matching fields don't have to match exactly. For example, an integer field can be united with a floating-point field.

When the output of the entire UNION query is referred to, the field names for the first SELECT statement are used.

Usually, Microsoft Jet hides records that contain entirely duplicated data in a UNION query. If you want to include duplicate records in the output, use the UNION ALL clause instead of UNION.

If you're joining two tables or queries that have the same structure, you can use a special form of the UNION statement with which you can specify the tables you want to combine, rather than listing the fields you want to select. For example, instead of listing the fields as in the earlier example that combined values from the Customers and Suppliers tables, you could create one query called SelectCustomerNameAndAddress

SELECT  Customers.CompanyName,  Customers.Address,  Customers.City,  Customers.Region,  Customers.PostalCode FROM Customers;

and another called SelectSupplierNameAndAddress:

SELECT  Suppliers.CompanyName,  Suppliers.Address,  Suppliers.City,  Suppliers.Region,  Suppliers.PostalCode FROM Suppliers;

The following UNION query combines the two queries above without explicitly specifying the names of the fields:

TABLE SelectSupplierNameAndAddress UNION TABLE SelectCustomerNameAndAddress;

You can't explicitly include Memo or Long Binary (OLE Object) fields in the output SELECT statement of a UNION query. For example, the following query results in an error:

SELECT Table1.Memo1 FROM Table1 UNION  SELECT Table2.Memo2 FROM Table2;

If it's sufficient for your purposes to include only the first 255 characters of the Memo field in the output, one alternative is to select an expression based on the Memo field:

SELECT Left(Table1.Memo1,255) AS FirstPartOfMemo FROM Table1 UNION SELECT Left(Table2.Memo2,255) AS FirstPartOfMemo FROM Table2;

Note The results of a UNION query are never updatable.

Other Query Features

Microsoft Jet includes several other unique query features that can solve common problems. With crosstab queries, you can perform spreadsheet-like cross-tabulations, and by using the TOP N or TOP N PERCENT predicates, you can filter large amounts of data based on the desired top number or percentage of values. You can also supply variable parameter input at run time by using parameter queries.

Crosstab Queries

A crosstab query summarizes tabular data and places it into a columnar format. In a crosstab query, matching values can be counted, summed, averaged, compared to other values, and so on.

Suppose you want to know the average salary paid for various employee classifications in different countries. One way to do this is to create a standard SELECT query that uses the GROUP BY clause for the Country and Title fields, and averages the values in the Salary field:

SELECT  Employees.Country,  Employees.Title,  Avg(Employees.Salary) AS AvgOfSalary FROM Employees GROUP BY Employees.Country, Employees.Title;

This query results in the following table.

Country

Title

AvgOfSalary

UK

Sales Manager

$59,600.00

UK

Sales Representative

$47,500.00

USA

Inside Sales Coordinator

$39,000.00

USA

Sales Representative

$48,960.00

USA

Vice President, Sales

$85,000.00

It's difficult, however, to make certain kinds of comparisons. The Country field information is repeated for each row, and in order to compare the average salary for various titles across countries, you must match numbers that are physically distant from one another on the page. You may easily overlook the fact that no sales manager is listed for the United States.

A crosstab query solves this problem by organizing the data so that the summarized information is placed in contiguous columns across the page. The following crosstab query produces the same information as the first query, but in a much more readable format:

TRANSFORM Avg(Employees.Salary) AS AvgOfSalary SELECT Employees.Title FROM Employees GROUP BY Employees.Title PIVOT Employees.Country;

The contents of the Country field become column headings. The Title field (specified in the SELECT clause) becomes the row heading and the values are the average of the Salary field.

Title

UK

USA

Inside Sales Coordinator

 

$39,000.00

Sales Manager

$59,600.00

 

Sales Representative

$47,500.00

$48,960.00

Vice President, Sales

 

$85,000.00

To support crosstab operations, Microsoft Jet includes these two extensions to the standard SQL language:

  • The TRANSFORM statement. Specifies that the query is a crosstab query. The first statement following the TRANSFORM statement must be an aggregate function, such as Avg, Sum, Min, Max, or Count, which summarizes the selected data.

  • The PIVOT clause. Specifies the field or expression used to create the column headings in the query's result set.

By default, Microsoft Jet presents the summarized columns in alphabetical order from left to right. Missing values are returned as Null. In the previous example, because no sales managers exist in the United States, there is no salary to average. The value in the USA column for Sales Manager is Null.

Column headings are based on the actual values contained in the field specified by the PIVOT clause. If values are missing, no column is created. For example, if your company has offices in Germany, but no employees were selected whose country/region is Germany, no column would be created for Germany.

You can force Microsoft Jet to create columns in an order other than alphabetical and force the creation of columns even if there is no data for that column value by using the optional IN clause with the PIVOT clause. The previous crosstab query can be revised as follows:

TRANSFORM Avg(Employees.Salary) AS AvgOfSalary  SELECT Employees.Title  FROM Employees  GROUP BY Employees.Title  PIVOT Employees.Country IN ('UK','Germany','France','USA');

This query results in the following table.

Title

UK

Germany

France

USA

Inside Sales Coordinator

     

$39,000.00

Sales Manager

$59,600.00

     

Sales Representative

$47,500.00

   

$48,960.00

Vice President, Sales

     

$85,000.00

The target of the SELECT statement used to create the column headings can be an expression as well as an actual field name. A common use of this feature is to create groupings that don't exist in the data. For example, you can group all orders into month groupings in the following manner:

TRANSFORM Count(*)  SELECT Orders.EmployeeID FROM Orders GROUP BY Orders.EmployeeID PIVOT Format(OrderDate,'mmm');

The following table illustrates a portion of the columns and rows returned by this query.

Employee

Apr

Aug

Dec

Feb

Davolio, Nancy

9

8

17

12

Fuller, Andrew

19

4

12

5

Leverling, Janet

13

6

13

14

The column headings in this query are created with the Format function, which turns the value of the OrderDate field into a three-letter month abbreviation. All orders within a given month (even if they were placed in different years) are summarized together.

Because the IN clause wasn't used with the PIVOT clause, the months are displayed alphabetically. A more complete query, which combines a WHERE clause to limit the selection to orders within a given year and displays the columns in calendar order, is as follows:

TRANSFORM Count(*)  SELECT Orders.EmployeeID  FROM Orders  WHERE Orders.OrderDate Between #1/1/96# And #12/31/96#  GROUP BY Orders.EmployeeID  PIVOT Format(OrderDate,'mmm') IN ('Jan','Feb','Mar','Apr', 'May','Jun','Jul','Aug','Sep','Oct','Nov','Dec');

The following table shows a portion of the result from this query.

Employee

Jan

Feb

Mar

Apr

Davolio, Nancy

1

2

5

3

Fuller, Andrew

3

2

3

3

Leverling, Janet

9

7

5

3

TOP N and TOP N PERCENT Predicates

Although you can use the WHERE and HAVING clauses to filter the selection of records, sometimes this isn't sufficient. For example, you may want to select all records where the state is CA, but only see the orders for the top 10 customers. Microsoft Jet provides TOP N and TOP N PERCENT predicates to limit the presentation of records after they're selected.

TOP N Predicate

You can use the TOP N predicate to specify that your query return only a specific number of records to your program:

SELECT TOP 5  Employees.EmployeeID,  Employees.Salary FROM Employees ORDER BY Employees.Salary;

The TOP criteria is applied after the records are selected. TOP doesn't necessarily imply "higher" or "more." Think of it as the "first n records" of the result set. For example, the previous query actually produces a list of employees with the lowest salary, not the highest, because the records are sorted in ascending order by salary. If you want to find the employees with the highest salary, sort the records in descending order based on salary:

SELECT TOP 5  Employees.EmployeeID,  Employees.Salary FROM Employees ORDER BY Employees.Salary DESC;

If more than one record ties for the nth value, all of the tied values are displayed. If the fifth, sixth, and seventh highest salaries are the same, the previous query retrieves seven records, not five.

Note that the selection of records is based on the entire output set, even if the query uses the GROUP BY clause to group like values. It's not possible, for example, to retrieve the "top 5 records within each group," or the "top 10 salesmen within each region."

TOP N PERCENT Predicate

TOP N PERCENT works much the same as the TOP N predicate, except the number of records returned isn't fixed. A selected percentage of the records are retrieved:

SELECT TOP 10 PERCENT  Products.UnitsInStock,  Products.ProductID,  Products.ProductName FROM Products ORDER BY Products.UnitsInStock DESC, Products.ProductName DESC;

This query retrieves the 10 percent of products that have the highest value for the UnitsInStock field.

Note If you're working with a back-end server such as SQL Server, be cautious about using the TOP N PERCENT predicate. Because most servers don't natively support this predicate, the output of this query must be evaluated on the local machine. It's possible that a large number of records will need to be returned from the server in order for Microsoft Jet to determine which records to select, which could cause performance to decrease.

Parameter Queries

With Microsoft Jet, you can specify one or more query parameters. The parameters are used to accept input values at run time from the user. This user-supplied data is then merged with a QueryDef object. Parameters are commonly used to supply selection criteria, but they can also be used to specify terms in an expression.

You declare parameters with the PARAMETERS declaration, which specifies the names and data types of the parameters. The following example takes an EmployeeID value as a parameter, which causes the query to return values for only the selected employee, and another parameter whose value will be multiplied by the employee's current salary to calculate a proposed new salary. The first parameter, EmployeeIDSelect, is used as part of the selection criteria; the second parameter, SalaryIncreaseFactor, is used as part of an expression:

PARAMETERS EmployeeIDSelect Long, SalaryIncreaseFactor Single; SELECT  Employees.LastName,  Employees.FirstName,  Employees.Salary,  CCur(Employees.Salary*SalaryIncreaseFactor) AS NewSalary FROM Employees WHERE Employees.EmployeeID = EmployeeIDSelect;

How does your program supply the parameter values? If you're working with Microsoft Access, you may be familiar with the way it prompts the user to supply missing parameter values.

This behavior isn't built into Microsoft Jet, however. It's a function performed by Microsoft Access, and then only when the parameter query is executed through the user interface. Microsoft Access prompts the user for parameter values for forms, reports, and datasheets based on parameter queries.

If you want to supply parameter values to an existing QueryDef object that's being executed programmatically, you have to do so explicitly, as in the following example. Assume that you have saved the parameter query shown previously as a permanent QueryDef object. In this example, strDbPath is the path to the NorthwindTables database, strQueryName is the name of the query under which you have saved the preceding SQL statement, lngEmpID is the Long value you supply for the EmployeeIDSelect parameter, and sngIncFact is the Single value you supply for the SalaryIncreaseFactor parameter.

For example, to increase the salary for the first employee in the Employees table by 10 percent, you would supply a value of 1 for EmployeeIDSelect and a value of 1.1 for SalaryIncreaseFactor:

Dim dbs As Database Dim qdf As QueryDef Dim rst As Recordset Dim strMsg As String  Set dbs = OpenDatabase(strDbPath) ' Return reference to parameter QueryDef object. Set qdf = dbs.QueryDefs(strQueryName)  ' Supply values for parameters. qdf.Parameters("EmployeeIDSelect") = lngEmpID qdf.Parameters("SalaryIncreaseFactor") = sngIncFact  ' Open recordset. Set rst = qdf.OpenRecordset() With rst If .RecordCount <> 0 Then MsgBox "The proposed new salary is: $" & !NewSalary ' Check whether recordset is updatable. If .Updatable Then ' Prompt to update salary. strMsg = "Would you like to update the salary for " _ & !FirstName & " " & !LastName & " from $" _ & !Salary & " to $" & !NewSalary & "?" ' Update salary field. If MsgBox(strMsg, vbYesNo) = vbYes Then .Edit !Salary = !NewSalary .Update End If End If End If End With

This method assumes that you know the names of the parameters at the time your code is written. One way to avoid the need to explicitly specify parameters is to iterate through all the parameters in the Parameters collection of the QueryDef object and prompt the user for values, thus emulating the behavior inherent in Microsoft Access. In the following example, strDbPath is the path to the database, and strQueryName is the name of the query:

Dim dbs As Database Dim qdf As QueryDef Dim rst As Recordset Dim prm As Parameter  Set dbs = OpenDatabase(strDbPath) Set qdf = dbs.QueryDefs(strQueryName)  For Each prm In qdf.Parameters prm.Value = InputBox("Enter parameter value: " & prm.Name) Next prm Set rst = qdf.OpenRecordset()

Although Microsoft Jet includes the PARAMETERS declaration in order for you to name and type your QueryDef object parameters explicitly, it actually treats any unrecognized term as a parameter. If you make a spelling error when specifying a field name, for example, Microsoft Jet treats the misspelled field name as a parameter.

It's a good idea to get into the habit of specifying the names and data types of your parameters rather than relying on the implicit parameter behavior of Microsoft Jet.

You can also use the Microsoft Jet PARAMETERS declaration to create a temporary QueryDef object that prompts the user to supply parameter values when your code executes. For more information and an example, see "Temporary QueryDef Objects" in Chapter 3, "Data Definition and Integrity."

Important You should not use a parameter query to search for Null values. A parameter of Null will never find a Null record because no two Null values match. If you want to find records that contain fields with Null values, you can use a non-parameter SELECT query. For example, the following SQL statement returns a list of all customers that don't have a fax number:

SELECT * FROM Customers WHERE Fax = Null;

See Also For a discussion of the various database object collections, including QueryDefs and Parameters, see Chapter 3, "Data Definition and Integrity."

Updating Your Data with Queries

You can use the dynamic selection and sorting capabilities of the query engine for more than static searches; after a result set is retrieved, the data can also be updated through a program such as Microsoft Access, or through a custom Visual Basic program.

See Also Chapter 5, "Working with Records and Fields," discusses the various types of Recordset objects that can be built from Microsoft Jet queries and how those Recordset objects can be manipulated through Visual Basic.

Single-Table Query Updatability

A query based on a single table can select some or all records, and some or all fields, from that table. In the simplest case, the query returns all records and all fields. For example:

SELECT *  FROM Customers;

Every record retrieved, and every field in every record, is accessible and subject to change (except calculated fields based on expressions). Every record can be deleted and new records can be inserted.

If you were to use Microsoft Access to open a datasheet based on this query, you would be able to freely browse forward and backward, making any changes you wanted (subject to security restrictions, validation rules, and referential integrity constraints). Any changes made to the Recordset object created by this QueryDef object would automatically be reflected in the underlying table.

A query can also select and sort records from the original table and remain updatable:

SELECT * FROM Customers WHERE Customers.ContactTitle = 'Owner' ORDER BY Customers.CompanyName;

An updatable single-table query such as this one can be useful in a data-entry situation in which you would like to present the records to the user in a certain order, or hide certain records based on the selection criteria.

One thing to be aware of, however, is that Microsoft Jet doesn't prevent the user from adding a new record through a QueryDef object that wouldn't have met the original selection criteria. For example, the previous query selects only records in which the value of the ContactTitle field is "Owner." The user can add a new record and specify a value other than "Owner" for the ContactTitle field, in which case the user would be adding a record that the query, if subsequently rerun, would no longer select. It's up to your application to enforce insertion restrictions such as this.

Single-Table Query Updatability Restrictions

Instead of selecting all fields with the asterisk (*) character, a query can select specific fields. For example:

SELECT  Customers.ContactName,  Customers.ContactTitle,  Customers.Address,  Customers.City,  Customers.Region,  Customers.PostalCode FROM Customers;

This technique can be useful for hiding certain fields from users, while still allowing them access to the information they need. The fields made available through a query such as this one are fully updatable. It may, however, be impossible to add a new record through this query, either because the fields not included in the output are specified as "required" at the table level, or because the primary key or foreign keys can't be created with default values.

Another restriction that applies to non-native tables is the requirement of a unique index. For example, Microsoft Jet requires that a primary key be defined for Paradox tables and that ODBC tables have a unique index. Queries based on tables without a unique index aren't updatable. The exceptions to this are Btrieve and xBase data sources, which don't require a unique index. For SQL Server, a table may be updatable if it has a timestamp field, even if it doesn't have a unique index.

Multiple-Table Query Updatability

In addition to allowing updates to single-table queries, Microsoft Jet supports updatable multiple-table joins. This feature is useful because it enables you to combine data from the main table with lookup information from other tables (a many-to-one join), or to join a master table with a related detail table (a one-to-many join) and still have an updatable result set.

Many-to-One Joins

The following example joins the Products table (the focus of the query) with the Suppliers table, which provides lookup information, including the supplier's company name and city:

SELECT  Products.*,  Suppliers.CompanyName,  Suppliers.City FROM Suppliers  INNER JOIN Products  ON Suppliers.SupplierID = Products.SupplierID;

The user can change data in any field from the Products table, including the SupplierID field, which links the Products table to the Suppliers table. If you change the value of the linking field through a datasheet or a form in a Microsoft Access application, you will also see that the corresponding lookup information from the "one" table is automatically retrieved and redisplayed. This technique is known as row fix-up or AutoLookup.

Although the focus of this many-to-one query is the Products table, it's also possible (though perhaps not desirable) to change the values in fields retrieved from the lookup table, such as the CompanyName or City field in the Suppliers table. The user may attempt to alter the CompanyName value for one record of the recordset created from this query, under the impression that the change will affect only the current record. However, because the value is actually stored in the Suppliers lookup table, the value is changed for every record in the recordset.

One-to-Many Joins

There is no logical difference between a "many-to-one" join, as described previously, and a "one-to-many" join, except from the point of view of the user. A one-to-many join is sometimes referred to as a master/detail relationship. A single record in the "one" or "master" table is related to one or more records in the "many" or "detail" table. Updatable multiple-table joins are especially useful with these types of one-to-many relationships.

The following query joins the Orders table to the Order Details table in a classic one-to-many relationship:

SELECT  Orders.*,  [Order Details].* FROM Orders  INNER JOIN [Order Details]  ON Orders.OrderID = [Order Details].OrderID;

Fields in the Orders table focus on the order itself: the customer who placed the order, the employee who took the order, the date the order was taken, and so on. Fields derived from the Order Details table specify the actual items that were ordered: the product's ID and pricing details. Just as with the many-to-one example shown previously, changes to fields from the "one" table on any given record are automatically made for all other records based on the same value in the "one" table. For example, if the user changes the CustomerID field, which is drawn from the "master" Orders table, for any given record, the change is automatically reflected in all other records for this same order.

Updatable multiple-table joins aren't limited to a single-level hierarchy. For example, the following query links from the Employees table to the Orders table to the Order Details table:

SELECT  Employees.EmployeeID,  Employees.LastName,  Orders.OrderID,  Orders.OrderDate,  [Order Details].ProductID FROM (Employees  INNER JOIN Orders  ON Employees.EmployeeID = Orders.EmployeeID)  INNER JOIN [Order Details]  ON Orders.OrderID = [Order Details].OrderID;

Fields from all three of these joined tables can be updated in the resulting recordset.

Inserting Records into a Multiple-Table Query

When you're inserting a record into a recordset based on a multiple-table join, records can be added to one, several, or all of the tables included in the join. Records from the "many" side of the join can be appended to the "many" table as needed. Records from the "one" side of the join can be added as well, as long as they don't violate any referential integrity constraints.

The following query joins the Products ("many") table to the Categories ("one") table:

SELECT  Categories.CategoryID,  Categories.CategoryName,  Products.ProductID,  Products.ProductName,  Products.SupplierID,  Products.CategoryID FROM Categories  INNER JOIN Products  ON Categories.CategoryID = Products.CategoryID;

If, while adding a new record to the Products table, the user specifies a CategoryID value that's already present in the Categories table, a new record is added only to the Products table. Row fix-up occurs; values from the corresponding record in the Categories table are retrieved.

If the primary key from the Categories table is included in the join, as in the previous example, it's possible to add a new CategoryID value to the recordset and have that new value automatically added to the Categories table.

In general, values for all required fields, and for the field or fields making up the primary key, must be supplied when you're inserting a new record into a recordset based on a query. The exception to this rule is for AutoNumber fields (also known as Counter fields). You must not supply an explicit value for AutoNumber fields. The value for AutoNumber fields is automatically assigned by Microsoft Jet.

Updatable One-to-Many Outer Joins

As discussed earlier in this chapter, an outer join selects all records from one table in a multiple-table join, while only selecting records with matching values from another table. For records in which there is no match in one table, artificial Null field values are supplied. Microsoft Jet allows you to "fill in the blanks" in these artificial Null records.

Consider the following outer join and a portion of its results:

SELECT  Customers.CompanyName, Customers.CustomerID, Orders.CustomerID, Orders.OrderID FROM Customers  LEFT JOIN Orders ON Customers.CustomerID = Orders.CustomerID;

Customers.CompanyName

Customers.CustomerID

Orders.CustomerID

Orders.OrderID

Orders.OrderDate

Paris spécialités

PARIS

     

Centro comercial Moctezuma

CENTC

     

FISSA Fabrica Inter. Salchichas S.A.

FISSA

     

Wartian Herkku

WARTH

Wartian Herkku

10270

01-Sep-94

Split Rail Beer & Ale

SPLIR

Split Rail Beer & Ale

10271

01-Sep-94

Rattlesnake Canyon Grocery

RATTC

Rattlesnake Canyon Grocery

10272

02-Sep-94

In this example, the Customers table is outer joined with the Orders table to show all customers regardless of whether they have placed an order. The first three customers listed have not placed orders. The "fill in the blank with key propagation" rule states that the user can add an order for this customer by filling in the Orders.OrderDate field (and any other order information except the customer's ID). The value in Customers.CustomerID is automatically propagated into Orders.CustomerID. In this update scenario, the child key (Orders.CustomerID) is read-only. The act of "filling in the blank" associates the new record in the Orders table with the customer.

Updatability Restrictions for Multiple-Table Queries

To be fully updatable, a query must meet several requirements:

  • You must specify an explicit inner or outer join between tables. Joins created implicitly in the WHERE clause of the SELECT statement aren't updatable. For example, the following join isn't updatable:

    SELECT  Products.ProductID,  Products.ProductName,  Categories.CategoryID,  Categories.CategoryName FROM Categories, Products WHERE Products.CategoryID = Categories.CategoryID;
  • Summary (GROUP BY), UNION, DISTINCT, and crosstab queries are never updatable. Queries joined to one or more summary queries aren't updatable, even if you don't attempt to modify fields from an otherwise updatable table. However, a query may be updatable if it refers to a summary query in a sub-SELECT statement, as in the following example:

    SELECT Orders.* FROM Orders WHERE Orders.Freight > (SELECT Avg(Orders.Freight) AS AvgOfFreight FROM Orders;);

    In this case, fields from the Orders table are updatable.

  • To be able to insert new records into a table in any query, all primary key fields must be present.

  • While you're updating a single record in a query, changes to certain fields may render certain other fields nonupdatable until the edit to the record is either saved or canceled. As soon as the user edits data on the "one" side of a query, the join key on the "many" side can no longer be modified. Usually, the "many" side's join key is updatable. However, because data on the "one" side was modified first, this field is temporarily rendered unmodifiable because row fix-up would discard changes to the "one" side's data. As soon as the change to the "one" side of the query is saved or canceled, the "many" side's join key becomes updatable again.

  • A change to a multiple-table query must not create orphaned records. You can change the join key in the "many" table to a value already present in the "one" table, but you can't specify a nonexistent value, except in the case of outer joins.

Data Manipulation Language SQL Statements

So far, only selecting, sorting, and summarizing data using various forms of the SQL SELECT clause have been covered. Microsoft Jet also uses SQL to modify a database. You can insert new records, delete existing records, and update values in existing records using syntax that's similar to that used with the SELECT statement.

Data-manipulation queries can either be created as permanent QueryDef objects in the database, or executed directly in Visual Basic code by using the Execute method of a database object variable.

The single-table and multiple-table query updatability rules for result sets created from SQL SELECT statements that are discussed in the previous section also apply to updates done with the data manipulation language (DML) statements discussed in this section.

The UPDATE Query

You can use the UPDATE query to specify a set of records from one or more tables and assign new values to one or more fields in that result set. The new value can be either a literal value assigned to each record to be updated, or an expression that uses existing values as part of the new value.

The following example updates the Salary field for all sales managers:

UPDATE Employees  SET Employees.Salary = 50000 WHERE Employees.Title = 'Sales Manager';

Every employee record that meets the selection criterion is assigned the same value of 50,000.

The next example applies a formula that increases the employees' salaries by 10 percent for all employees who aren't sales managers:

UPDATE Employees  SET Employees.Salary = Employees.Salary*1.1 WHERE Employees.Title <> 'Sales Manager';

The Salary field value for each record in the result set produced by applying the WHERE clause is multiplied by the constant value 1.1 and then stored in the same Salary field.

Multiple-Table UPDATE Queries

In addition to simple single-table updates, with Microsoft Jet you can create an UPDATE query that's the result of a multiple-table join. The following example joins the Orders and Order Details tables. It uses a selection criterion from the Orders table, but updates values in the Order Details table:

UPDATE Orders  INNER JOIN [Order Details]  ON Orders.OrderID = [Order Details].OrderID  SET [Order Details].Discount = 0.25 WHERE Orders.CustomerID = 'FRANS';

In this case, the fields in the table on the "many" side of a one-to-many relationship are being updated. Fields on the "one" side of the join can also be updated, with certain limitations:

UPDATE Orders  INNER JOIN [Order Details]  ON Orders.OrderID = [Order Details].OrderID  SET [Order Details].Discount = 0.25,  Orders.OrderDate = #6/16/95# WHERE Orders.CustomerID = 'FRANS';

You can't create a multiple-table UPDATE query if one of the tables in the join is a summary query — a query that uses the GROUP BY clause, for example.

See Also For more information about UPDATE queries, see the "Selecting Unique Records" section earlier in this chapter.

The DELETE Query

The DELETE query uses the same join and selection syntax as the SELECT and UPDATE queries, but instead of retrieving records or updating the retrieved records, it deletes the selected records from the table or tables in the query. The simplest form of DELETE query deletes, from a single table, records that match a selection criterion. For example:

DELETE * FROM Orders WHERE Orders.OrderDate < #10/1/94#;
Multiple-Table DELETE Queries

A DELETE query can also be based on a multiple-table join. For example, the following sample query joins the Categories table to the Products table, and deletes records in the Products table that match the selection criterion specified for the Categories table:

DELETE Products.* FROM Categories  INNER JOIN Products  ON Categories.CategoryID = Products.CategoryID WHERE Categories.CategoryName = 'Beverages';

See Also For more information about DELETE queries, see the "Selecting Unique Records" section earlier in this chapter.

The Append Query (INSERT INTO)

You can append new records to an existing table by using an INSERT INTO query. In the simplest case, literal values are assigned to a single record that's inserted into the table:

INSERT INTO PerformanceGrade (PerformanceGradeKey, LowRange, HighRange) VALUES (5, 500000.01, 600000);
INSERT INTO…VALUES

The INSERT INTO query lists the table to which the record will be added, the fields that will be populated with new values, and a corresponding VALUES clause with the literal values that will be assigned to those fields. The number and data types of the fields in the field list and the value list must match.

It's not necessary to specify every field in the table. Any fields not specified are filled with the value specified by the DefaultValue property for that particular field. However, all required fields must be assigned, including those fields participating in indexes or acting as part of the primary key.

However, an INSERT INTO query can't violate any referential integrity constraints or ValidationRule property clauses established at either the field level or the table level. If the Required property is set to True for a particular field, for example, and you don't assign a value for this field, a trappable run-time error is generated when you attempt to run the INSERT INTO query.

Similarly, if a referential integrity constraint has been established that requires a matching value in a foreign key field and that value doesn't exist in the related table, the INSERT INTO query will fail.

INSERT INTO…SELECT

An alternative form of this statement (and one likely to be created if you generate this query in Microsoft Access) is as follows:

INSERT INTO PerformanceGrade (PerformanceGradeKey, LowRange, HighRange) SELECT 5 AS Expr1, 500000.01 AS Expr2, 600000 AS Expr3;

Rather than using the VALUES clause to assign field values, this form of the INSERT INTO statement takes its values from another SELECT statement. In this particular case, because the inner SELECT statement doesn't contain a FROM clause, a single record is inserted into the new table, using the calculated fields Expr1, Expr2, and Expr3 as the source items.

When the source SELECT statement does have a FROM clause, all records retrieved by the SELECT statement are inserted into the first table:

INSERT INTO Employees  (EmployeeID,  LastName,  FirstName,  Title,  TitleOfCourtesy,  BirthDate,  HireDate,  Address,  City,  Region,  PostalCode,  Country,  HomePhone,  Extension,  Photo,  Notes,  ReportsTo) SELECT  TemporaryEmployees.EmployeeID,  TemporaryEmployees.LastName,  TemporaryEmployees.FirstName,  TemporaryEmployees.Title,  TemporaryEmployees.TitleOfCourtesy,  TemporaryEmployees.BirthDate,  TemporaryEmployees.HireDate,  TemporaryEmployees.Address,  TemporaryEmployees.City,  TemporaryEmployees.Region,  TemporaryEmployees.PostalCode,  TemporaryEmployees.Country,  TemporaryEmployees.HomePhone,  TemporaryEmployees.Extension,  TemporaryEmployees.Photo,  TemporaryEmployees.Notes,  TemporaryEmployees.ReportsTo FROM TemporaryEmployees WHERE TemporaryEmployees.Title = 'New Hire';

Instead of itemizing the fields in the source and target tables, you can use the shorthand form of this statement:

INSERT INTO Employees SELECT TemporaryEmployees.* FROM TemporaryEmployees WHERE TemporaryEmployees.Title = 'New Hire';

This format requires that the field names in the source and the target tables match.

The Make-Table Query (SELECT…INTO)

With the SELECT…INTO statement, you can simultaneously select fields from one or more tables and create a new table with the result of that selection. The following table selects records from the Products table, in which the Discontinued field is set to True, builds a new table called DiscontinuedProducts, and inserts the selected records into the new table:

SELECT Products.*  INTO DiscontinuedProducts FROM Products WHERE Products.Discontinued = True;

If the table specified with the INTO clause already exists, a trappable run-time error occurs.

Only the names and the data types of the original fields are created in the new table. None of the values for DefaultValue, ValidationRule, Caption properties, and so on, are transferred. The new table is built with no indexes and no referential integrity constraints.

An alternative to this approach is to create the empty table, either through Data Access Objects (DAO) methods or by creating a SELECT…INTO query that returns no records. Then create any indexes you want using SQL ALTER TABLE statements or by using the DAO methods discussed in Chapter 3, "Data Definition and Integrity." Finally, populate the new table with an append query by using INSERT INTO…SELECT rather than SELECT…INTO. Note that deferring creation of the indexes until after the table is populated speeds up the insert operation.

Other Query Types

Microsoft Jet supports two other query types. Data definition language (DDL) queries alter the structure of a Microsoft Jet database, and pass-through queries communicate with back-end ODBC database management systems.

DDL Queries

You use DDL queries to modify the structure of a Microsoft Jet database. They can be used to build and delete tables, to add and delete fields and indexes from those tables, and to create referential integrity relationships between two tables.

The following is an example of a DDL QueryDef object that creates a new table with three fields and builds a primary key index on the Auto_ID field:

CREATE TABLE AUTOS (Auto_ID TEXT (10) CONSTRAINT PrimaryKey PRIMARY KEY, Make TEXT(20), Model TEXT(20));

See Also For specific types of and uses for Microsoft Jet DDL queries, see Chapter 3, "Data Definition and Integrity."

Pass-Through Queries

Microsoft Jet is designed to work transparently, for the most part, with a wide variety of data sources, whether they're native Microsoft Jet databases, installable ISAM data sources such as Microsoft FoxPro or Paradox, or ODBC data sources such as SQL Server. Microsoft Jet supports the concept of linked tables — tables that physically reside in an external database, but are treated as though they're local. When you design a SELECT or UPDATE query, you generally don't have to worry about whether the tables in the query are native Microsoft Jet tables or are in another data format.

Many data sources have no native SQL interface. Microsoft Jet translates your SQL statement into direct data-manipulation functions that are specific to that data source. Other data sources, such as SQL Server, support the use of SQL by definition, although the dialect of SQL can differ radically from one database management system to another. Even when you create an SQL statement and save it in a QueryDef object, Microsoft Jet transforms the "native" Microsoft Jet SQL statement into a format that you may barely recognize.

Occasionally, you may want to get Microsoft Jet out of the way and directly communicate with your back-end SQL database system. You can use a pass-through query to send any SQL statement to your server. Microsoft Jet does no translation or error checking of any kind: It simply passes the text of the query on to the back-end server for processing.

To create a pass-through query, you create a QueryDef object and set its Connect property to a connection string that contains information about an ODBC data source.

Your SQL statement must conform to the rules of the server you're using. For example, while Microsoft Jet uses the asterisk (*) character with the Like operator, SQL Server uses the ANSI-standard percent (%) character:

SELECT *  FROM Products  WHERE ProductName Like 'M%'

This query acts like a normal SELECT query. It returns records and can be used to create a Recordset object, but it isn't updatable.

If you save this pass-through query as a permanent QueryDef object, it can even be used to join to other tables or queries. For example, if you save this pass-through query with the name SQLPassThrough, it can be used to build another query, as follows:

SELECT  SQLPassThrough.ProductID,  SQLPassThrough.ProductName,  SQLPassThrough.SupplierID,  SQLPassThrough.CategoryID,  [Order Details].OrderID FROM SQLPassThrough, [Order Details];

In this example, the pass-through query SQLPassThrough is executed on the server, which retrieves all records from the Products table whose ProductName field begins with an "M." The result of the SQLPassThrough query is joined with the Order Details table, and the final result is created transparently.

Anything that the back-end server can interpret can be successfully used in a pass-through query. This includes DDL statements that create and delete tables, or DML DELETE and INSERT INTO statements that don't return records but instead act directly on the back-end server database.

The following code creates a QueryDef object for a pass-through query and sets the QueryDef object's Connect property. In this example, strDbPath is the path to the database, strQueryName is the name for the new query, strSQL is the SQL statement that defines the QueryDef object, and strConnect is the connection string information used to connect to the ODBC data source:

Dim dbs As Database, qdf As QueryDef  Set dbs = OpenDatabase(strDbPath) ' Create new QueryDef object. Set qdf = dbs.CreateQueryDef(strQueryName, strSQL) ' Setting Connect property indicates that query ' is a pass-through query. qdf.Connect = strConnect

An SQL pass-through query may return records to the client computer, or it may perform an operation on the server and not return any records to the client. For example, a SELECT query returns records, while an action query performs an operation on the server and does not return any records. The ReturnsRecords property indicates whether the query is expected to return records. By default Microsoft Jet assumes that a pass-through query will return records, and the ReturnsRecords property is set to True. If a pass-through query will not return records but simply performs an operation on the server, you must set the ReturnsRecords property to False. If you try to run an action query whose ReturnsRecords property is set to True, an error occurs.

The contents of a pass-through query don't have to be an SQL statement. You can execute stored procedures. If a stored procedure returns records, the pass-through query that executes it can be used to create a Recordset object. For example, if this query were saved as a pass-through QueryDef object called SQLPassThroughStoredProc, it could be used in another query:

SELECT SQLPassThroughStoredProc.* FROM SQLPassThroughStoredProc WHERE SQLPassThroughStoredProc.Status = 'sleeping';

You can supply parameters to a stored procedure by concatenating the value of the parameter with the call to the stored procedure. The following example creates a simple SQL Server stored procedure with an input parameter and a return value. It then runs the procedure and retrieves the return value. In this example, strSQL is the remote query definition, cnn is a variable that represents an ODBCDirect Connection object, qdf is the QueryDef object, and var is a variant that contains the value of the parameter for the QueryDef object:

' Create stored procedure on the server. strSQL = "CREATE PROCEDURE UpdateEmps (@invar int) AS RETURN @invar;" cnn.Execute strSQL  ' Create QueryDef object to run stored procedure. Set qdf = cnn.CreateQueryDef("qry", "{ ? = call UpdateEmps(?) }")  ' Handle parameters. qdf.Parameters(0).Direction = dbParamReturnValue qdf.Parameters(1) = 10 qdf.Execute  ' Get return value. var = qdf.Parameters(0).Value

See Also For more information about ODBC, see Chapter 12, "ODBC Desktop Database Drivers."

MS Jet Query Engine Overview

The Microsoft Jet database engine contains a sophisticated query processor that's responsible for query execution. There is, however, much more to the query engine than just a query processor. The query engine also contains a sophisticated query optimizer that takes any given query as its input and determines the fastest join strategy for executing that query.

A query may be internally optimized to yield a more efficient join strategy, but this is all done without intervention from the user. The goal of the query engine is to take any query, no matter how poorly structured, and execute that query in the most efficient manner.

Query Engine Components

The query engine has eight major components:

  • Parser/Binder The parser/binder parses the SQL statement that defines a query and binds the names referenced in the query to fields in the underlying tables. The parser/binder first checks the query for proper syntax. If there are any syntactical problems with the query, an error is returned. Next, the parser/binder converts the SQL string into the internal query-object definition format in which queries are stored.

  • Folder The folder moves expressions "up" the query tree. The top of the query tree, or root, is the final result set. By moving expressions to the top of the tree, the expressions don't have to be computed against records that may be discarded.

  • Splitter The splitter splits conglomerate QueryDef objects into their discrete components. The conglomerate query objects are then removed from the tree. This operation is performed on the query tree starting from the bottom and moving in reverse order up to the root.

  • Flattener The flattener takes the output of the splitter and combines joins, ORDER BY clauses, and WHERE clauses as much as possible.

  • Optimizer The optimizer is one of the most complex components of the query engine. It uses statistics to determine the most efficient way to execute a query. The optimizer in the Jet database engine is a cost-based optimizer, which means the optimizer assigns a cost to each task and then chooses the least expensive list of tasks to perform that will generate the desired result set. The longer a task takes to perform, the more costly or expensive it is.

    The algorithms that the optimizer uses depend on the accuracy of the statistics provided by the underlying engine. For example, the statistics that some ODBC drivers return may not be accurate. This can cause the optimizer to choose a less-than-optimal execution plan. However, if the whole query is sent to the ODBC server for processing (the usual case), the optimizer's execution plan is irrelevant.

    In the native Jet database engine, statistics can become out of date over time. Statistics become out of date if transactions are performed and then rolled back, or if your machine is turned off before the database is closed. The problem in the latter situation occurs because the statistics are cached in memory, and shutting off the machine without closing the database doesn't allow the statistics to be written out to disk.

    To update the statistics in a Microsoft Jet database, you must compact the database. Compacting the database may also speed up queries because the process writes all the data in tables in contiguous pages, which makes scanning sequential pages much faster than when the database is fragmented.

  • Remote Post-Processor The remote post-processor determines how much of a query can be sent to an ODBC back-end for processing by the server. The goal is to send as much of the query as possible to the server for processing. The remote post-processor walks through the query tree from the bottom up, marking subtrees that can be sent to the server. As subtrees are found, they are marked as "remoteable." Finally, SQL strings are generated for the remote queries.

  • Post-Processor The post processor takes the query in its current compiled form and moves it to a new, cleaner, smaller data-execution structure.

  • Join Processor The join processor executes the compiled query. It uses the join strategy that was chosen by the optimizer to return the desired result set.

Join Strategies

Microsoft Jet can choose from five join strategies. Each of these join strategies serves a particular purpose and ranges in complexity from a "brute force" table scan to more complex execution plans. The five join strategies are nested iteration, index, lookup, merge, and index-merge.

Nested Iteration Join

The nested iteration join is a brute force method and is used only as a final attempt at performing a join. A nested iteration is performed only when the tables contain few records and probably no useful indexes.

Algorithm for Performing a Nested Iteration Join
  1. Pair each record in the outer table with a record in the inner table.

  2. Check each pair to make sure it meets the join restrictions.

    If a record matches the restriction, it's kept for the result set; otherwise, it's discarded.

Following is an example of a nested iteration join:

SELECT  Categories.CategoryName,  Products.ProductName,  Categories.Description,  Categories.Picture,  Products.ProductID,  Products.QuantityPerUnit,  Products.UnitPrice, Products.Discontinued FROM Categories  INNER JOIN Products  ON Categories.CategoryID = Products.CategoryID WHERE Products.Discontinued = False ORDER BY Categories.CategoryName, Products.ProductName;
Index Join

Microsoft Jet chooses an index join when the inner table has an index on the joined field. The index join tends to be used when the outer table is small or data doesn't have to be retrieved from the inner table. Index joins are often used for recordsets based on a table.

Algorithm for Performing an Index Join
  1. Retrieve a value from the outer table.

  2. Create a primary key for the inner table.

  3. Search on the inner table to find matches, based on the key created in step 2.

  4. Check the remaining restrictions for each match.

Following is an example of an index join:

SELECT  Products.ProductID,  Orders.OrderDate,  Sum(CLng([Order Details].UnitPrice*Quantity*(1 -Discount)*100))/100  AS ProductAmount FROM Orders  INNER JOIN (Products  INNER JOIN [Order Details]  ON Products.ProductID = [Order Details].ProductID)  ON Orders.OrderID = [Order Details].OrderID GROUP BY Products.ProductID, Orders.OrderDate;
Lookup Join

A lookup join is similar to an index join. The only difference is that the lookup join reduces the number of fields in the inner relationship and performs a sort as part of the index creation on the inner table before performing the join. A lookup join is often used for static queries and ordered dynaset-type Recordset objects.

Algorithm for Performing a Lookup Join
  1. Create a sorted and indexed temporary table, selecting the necessary fields from the inner table.

  2. Retrieve a value from the outer table.

  3. Create a primary key for the inner table.

  4. Search on the temporary table to find matches, based on the key created in step 3.

  5. Check the remaining restrictions for each match.

Merge Join

Merge joins tend to be performed when the input tables contain a large number of records. Merge joins also tend to be used in cases when ordering the output of a join can save resorting.

Index-Merge Join

An index-merge join can be used when each input is a table in native Microsoft Jet database format. Each input must have an index on its join field, and at least one of the indexes must not allow Null values if there is more than one join field.

Algorithm for Performing an Index-Merge Join
  1. In the current record, move to the next record where the foreign key is greater than or equal to the primary key.

  2. If equal, return the record.

  3. If greater, advance the outer input one record.

  4. Move backward the number of matches for the previous key on the inner input if the outer input reference isn't unique and matches the current inner input.

Following is an example of an index-merge join:

SELECT  [Order Details].OrderID,  Products.ProductName,  [Order Details].ProductID,  [Order Details].UnitPrice FROM Products  INNER JOIN [Order Details]  ON Products.ProductID = [Order Details].ProductID;

Query Optimization

This section discusses the techniques for maximizing database performance by using the Jet database engine. It covers the Microsoft Jet query engine and how it optimizes queries, how to take advantage of Rushmore query optimization, and how to avoid common pitfalls in query design that can lead to performance degradation.

See Also For information about optimizing access to ODBC data sources, see Chapter 8, "Accessing External Data," and Chapter 9, "Developing Client/Server Applications."

Because the purpose of most database applications is to retrieve the data stored in the database, QueryDef objects are usually used more than any other object in a database. Queries are useful in Microsoft Jet-based applications because of the powerful query functionality of Microsoft Jet, such as query updatability, hybrid queries (queries based on more than one data source), and stacked queries (queries that use other queries as input).

The MS Jet Query Engine Optimizer

As previously mentioned, the optimizer uses statistics on the tables in the query to determine which join strategy to use. This section looks more closely into the statistics that are used, as well as how the joins themselves affect join strategy selection.

Optimizer Statistics

The optimizer uses two sets of statistics when determining a join strategy on the base tables and on non-base-table inputs such as other QueryDef objects.

For base-table inputs, the optimizer looks at:

  • The number of records in the base table. This tells the optimizer how large the tables are, which affects which join strategy is used.

  • The number of data pages in the base table. The more data pages that need to be read from disk, the more costly the query is.

  • The location of the table. Is the table in a local ISAM format or is it located in an ODBC database? Each distinct ODBC server is given a number. This helps the optimizer and remote post-processor do their work.

    The indexes on the table. When looking at indexes, the optimizer is concerned with:

    • Selectivity. If a particular index returns only three matching values out of 100,000, it's considered highly selective. A unique index is the most highly selective index available, because every value is distinct. It's always best to use highly selective indexes.

    • Number of index pages. As with data pages, the more index pages that must be read from disk, the more costly the query is.

    • Whether Null values are allowed in the index. Null values in an index may rule out the use of an index-merge join.

    • Whether duplicates are allowed in the index. Duplicates in an index affect its selectivity.

For non-base-table inputs, the optimizer looks at:

  • Record counts. As with base tables, the number of records in the input query affects which join strategy is chosen.

  • Input costs. Because the input is another query, the cost of executing that query must be factored into the total cost.

    Note The record count and input costs are estimates.

  • Location of inputs. Each distinct ODBC server is given a number. This helps the remote post-processor do its work.

  • Record ordering. An ordered input may be a candidate for a lookup join.

Choosing Join Combinations

Using the statistics described, the optimizer chooses the best join strategy. Because Microsoft Jet uses a cost-based optimizer, each possible join combination is examined to determine which will yield the least costly query execution.

In choosing a join combination, the optimizer first selects a base-table access strategy. The optimizer then stores the estimated number of records returned and a cost — how expensive it is to read the table. The optimizer then generates all combinations of pairs of tables and determines the cost of each join strategy. Finally, the optimizer adds tables to the joins and continues to calculate statistics until the cheapest strategy for the entire query is found.

Choosing a Base-Table Access Strategy

Three methods are available for accessing a base table:

  • Table scan. This is generally the slowest method because every data page must be read. When a table scan is used, single-table restrictions are checked for each record in the table, but no base-table page is read from disk more than once.

  • Index range. When an index range is used, a table is opened with a particular index on one of the single-table restrictions (or selection criteria). Records in the index are then checked against the remaining restrictions. A base-table page can be read from disk more than once.

  • Rushmore restriction. A Rushmore restriction is used when there are restrictions on multiple-indexed fields. By using multiple indexes to resolve the query, the number of base-table pages that need to be read from disk is minimized. For more information, see the following section.

Rushmore Query Optimization

One of the most powerful ways to execute queries is to take advantage of Rushmore query optimization.

What Is Rushmore?

Simply put, Rushmore query optimization uses indexes efficiently to quickly find a set of records. Rushmore query optimization is used on queries that involve restrictions on multiple-indexed fields.

Operations That Can Use Rushmore
Index Intersection

Index Intersection involves scanning multiple indexes for records matching criteria such as:

field1 = 'expression' AND field2 = 'expression'

As described in the previous section, Rushmore uses the indexes on both fields to find matching records for each part of the criteria. The results from each of the indexes are intersected to find the records that match both criteria.

For example, suppose you have a table that contains demographic information about all the people in New York City. The table contains information regarding each person's age, height, weight, hair color, and eye color. Suppose you write the following query to find all the people with blonde hair and blue eyes:

SELECT *  FROM People WHERE hair_color = 'blonde' AND eye_color = 'blue'

Assuming indexes have been created on the hair_color and eye_color fields, this query would be solved using Rushmore Index Intersection. The index on hair_color is used to find all the blonde-haired people, and then the index on eye_color is used to find all the blue-eyed people. The results of each index search are then intersected to find all the people with both characteristics.

Index Union

Index Union involves scanning multiple indexes for records that match criteria such as:

field1 = 'expression' OR field2 = 'expression'

Again, Rushmore uses the indexes on both fields to find the matching records. The resulting records from each of the indexes are then combined to find the records that match either criterion.

For example, again suppose you have a table that contains demographic information about all the people in New York City. Suppose you write a query to find all the people with black hair or green eyes. The criteria would look like:

SELECT *  FROM People WHERE hair_color = 'black' OR eye_color = 'green'

Using Rushmore Index Union, the index on hair_color is used to find all the black-haired people, and the index on eye_color is used to find all the green-eyed people. The result of each index search is then combined to find all the people with either characteristic.

Common Pitfalls

Many common mistakes can cause unnecessary bottlenecks when you are running queries. Following is a list of these common mistakes and what to do to correct them.

  • Including expressions in query output. Placing expressions, such as an IIf on an output field of a query can cause optimization problems if the query is used as the input to another query. For example, consider the following two queries, where the first query is saved as Q1:

    SELECT IIf(ShipVia = 2, 'United Package', 'Other')  AS ShipperName  FROM Orders;  SELECT *  FROM Q1  WHERE ShipperName = 'United Package';

    Because Microsoft Jet can't optimize the IIf expression in the first query, it can't optimize the second query. Expressions can get buried so far down in a query tree that it's easy to forget about them. If expressions are necessary in the output, try to place the expression in a control on a form or report.

    The following query replaces the previous two queries and is the optimal way to write this query:

    SELECT *  FROM Orders  WHERE ShipVia = 2;
  • Using GROUP BY on too many fields. When you're creating a totals query, use the GROUP BY clause on as few fields as necessary to achieve the query's goal. The more fields in the GROUP BY clause, the longer the query takes to execute. Make sure you don't include extraneous fields.

  • Using GROUP BY in same query as a join. If possible, place a GROUP BY clause on a table and then join it to another table, rather than joining the two tables and using the GROUP BY clause in the same query as the join. For example, instead of this query

    SELECT  Orders.CustomerID,  Count(Orders.OrderID) AS CountOfOrderID FROM Customers  INNER JOIN Orders  ON Customers.CustomerID = Orders.CustomerID GROUP BY Orders.CustomerID;

    you may benefit by using two separate queries, as follows, where the first query is saved as SelectWithGroupByQ1:

    SELECT Customers.CustomerID  FROM Customers GROUP BY Customers.CustomerID;  SELECT  Orders.CustomerID,  Count(Orders.OrderID) AS CountOfOrderID FROM SelectWithGroupByQ1, INNER JOIN Orders ON Q1.CustomerID = Orders.CustomerID GROUP BY Orders.CustomerID;
  • Not including indexes on join fields. When you're joining tables, always try to index the fields on both sides of a join. This can speed up query execution by allowing more sophisticated join strategies such as index and index-merge joins. The use of indexes also provides better statistics. For more information, see the "Join Strategies" section earlier in this chapter.

  • Under-indexing fields. In appropriate circumstances, for example where the database is used solely for decision-support types of applications, you can place an index on all fields that have unique values and are used in a join or as a restriction. By using Rushmore query optimization, Microsoft Jet can take advantage of multiple indexes on a single table. This makes indexing many fields advantageous. However, keep in mind that adding indexes to fields can cause performance to suffer.

  • Using Count(fieldname) instead of Count(*). When you have to determine the number of records, you should use Count(*) rather than Count(fieldname) because there are Rushmore optimizations that allow Count(*) to be executed much more quickly than Count(fieldname).