Data Points

Column Expressions, DataRelations, and Computations

John Papa

Code download available at: Data Points 2007_01.exe(191 KB)

ADO.NET does more than simply allow you to retrieve and save data to and from a database. It provides myriad features that allow you to manipulate data both for business logic and for display purposes in real-world applications. As demonstrated by the examples in this column, ADO.NET is not just a tool for data access, it can is also be used for data manipulation.

In this month's column, I will concentrate on some of the questions I am asked most often regarding data manipulation with ADO.NET. I'll discuss how you can use expression-based columns and DataRelations to achieve certain goals. I'll also explain how some other features, such as the DataTable's Compute method and the SetOrdinal method, can be used to address common business needs.

Q I want to display order information individually and in rolled-up groups (by day, by month, and by year) using DataTables. I need to allow the order information to be edited, and the totals in the parent DataTables must reflect the changes immediately. I want to show each of the sets of data in separate grids on the form. After I load the data in a DataTable, how do I create the grouped information with the sum of the orders for each group using ADO.NET DataColumn expressions?

Q I want to display order information individually and in rolled-up groups (by day, by month, and by year) using DataTables. I need to allow the order information to be edited, and the totals in the parent DataTables must reflect the changes immediately. I want to show each of the sets of data in separate grids on the form. After I load the data in a DataTable, how do I create the grouped information with the sum of the orders for each group using ADO.NET DataColumn expressions?

A Calculated fields can be created easily by either using a calculated expression in a SQL statement or creating an expression-bound DataColumn. There are differences in how these two approaches operate and the features they expose.

A Calculated fields can be created easily by either using a calculated expression in a SQL statement or creating an expression-bound DataColumn. There are differences in how these two approaches operate and the features they expose.

For your scenario and requirements, calculated columns in a SQL statement is not the most efficient solution. A good solution is to add expression-based DataColumns to a series of DataTables.

Before diving too deep into this problem, let's take a look at the end result we're striving for (see Figure 1). This sample shows five grids on a form. The topmost grid contains the individual order line items. The price, quantity, and discount of each order line can be edited in this grid. When these values are edited, the extended price is calculated for that row and the values of the totals in the other four grids automatically recalculate and display the updated total to reflect the changes. The second grid contains a row for each order, including a calculated column that shows the sum of the extended price for that order. The third, fourth, and fifth grids show the sum of each order by day, month, and year, respectively.

Figure 1 Order Total by Day, Month, and Year

Figure 1** Order Total by Day, Month, and Year **(Click the image for a larger view)

Each grid is bound to a different DataTable. The first DataTable is loaded with the order lines from the database and has an expression-based column appended to it. After the data is loaded into the DataTable using the DataAdapter's Fill method, a column named ExtendedPrice of type decimal is added to the DataTable. This column is defined as an expression column that calculates its value based on the values of the UnitPrice, Quantity, and Discount for each respective DataRow:

ds.Tables["OrderLines"].Columns.Add("ExtendedPrice", 
    typeof(decimal), "(UnitPrice * Quantity) * (1 - Discount)");

The OrderLines DataTable contains the core data for this form. Once it is loaded and its expression column created, the first grouping DataTable can be filled and added to the DataSet (see Figure 2). The SQL statement grabs the OrderID and OrderDate for each order and then the DataTable is filled with the results. Before adding the OrderTotal column to this DataTable, a DataRelation must first be created so the expression in this Orders DataTable can access columns in the OrderLines DataTable.

Figure 2 An Expression Column that Sums

using(SqlCommand cmd = new SqlCommand("SELECT o.OrderID, o.OrderDate " + 
    "FROM Orders o WHERE YEAR(o.OrderDate) = @year " +
    "ORDER BY o.OrderDate DESC", cn))
{
    cmd.CommandType = CommandType.Text;
    cmd.Parameters.AddWithValue("@year", year);
    SqlDataAdapter adpt = new SqlDataAdapter(cmd);
    DataTable orders = new DataTable("Orders");
    adpt.Fill(orders);
}
ds.Tables.Add(orders);
ds.Relations.Add("Orders2OrderLines", ds.Tables["Orders"].Columns["OrderID"], ds.Tables["OrderLines"].Columns["OrderID"]);
orders.Columns.Add("OrderTotal", typeof(decimal), 
    "Sum(Child.ExtendedPrice)");

The DataRelation is set up between the OrderID columns of the two DataTables. Then the OrderTotal expression column can be created and added to the Orders DataTable:

Sum(Child.ExtendedPrice) 

Working first inside and then outside the parenthesis, the Child operator tells the OrderTotal DataColumn to navigate down to the child table (the OrderLines DataTable) and get the ExtendedPrice column. Again using the DataRelation, the Sum operator sums the values of the ExtendedPrice columns for the rows in the OrderLines DataTable whose OrderID values match the OrderID of the parent DataTable. The end result is that the total amount for each order is displayed in the OrderTotal column of the Orders DataTable.

Keep in mind that if there is more than one DataRelation, the syntax needs to be altered to indicate the specific DataRelation. The example from Figure 2 would be altered to look like this:

Sum(Child(Orders2OrderLines).ExtendedPrice) 

The same technique can be used to fill the OrdersByDay DataTable, which is loaded with a distinct list of the order dates from the database. Then a DataRelation is established between this DataTable and its OrderDate column and the OrderDate column of the Orders DataTable. This allows the OrdersByDay DataTable to roll up the order totals into an OrderTotalByDay column using expressions, as shown here:

ds.Relations.Add("OrdersByDay2Orders", ds.Tables["OrdersByDay"].Columns["OrderDate"],
 ds.Tables["Orders"].Columns["OrderDate"]);

dayOrders.Columns.Add("OrderTotalByDay", typeof(decimal), 
    "Sum(Child.OrderTotal)");

The final two grids are bound to DataTables that are loaded in the same fashion. This technique could be expanded to show other calculations. For example, you could easily show a grid that contains the order totals for a specific customer as long as the customer information was retrieved in the base table. Using a DataView, you could also filter the grid on an expression column.

Q I have a parent DataTable (Orders) and a child DataTable (Order Lines). The parent DataTable has an expression column for OrderTotal. I need to display some columns from the parent table in the child table for each row. How can I grab values from a parent

Q I have a parent DataTable (Orders) and a child DataTable (Order Lines). The parent DataTable has an expression column for OrderTotal. I need to display some columns from the parent table in the child table for each row. How can I grab values from a parent

A Expression-based columns can use the Child operator to access columns in a child DataTable. However, they can also use the Parent operator to access columns in a parent DataTable. For example, to access the value of a column in the parent table, you can modify the code from the previous example to add the following code snippet:

ds.Tables["OrderLines"].Columns.Add("OrderTotalForOrderLinesTable", 
    typeof(decimal), "Parent.OrderTotal");

A Expression-based columns can use the Child operator to access columns in a child DataTable. However, they can also use the Parent operator to access columns in a parent DataTable. For example, to access the value of a column in the parent table, you can modify the code from the previous example to add the following code snippet:

ds.Tables["OrderLines"].Columns.Add("OrderTotalForOrderLinesTable", 
    typeof(decimal), "Parent.OrderTotal");

This code adds an expression column to the child table, named OrderLines, that grabs the OrderTotal value from the Orders DataTable. This new column must be created after the child and parent DataTables have been created, the DataRelation between the two has been established, and the OrderTotal column has been created. This is a common technique to use when creating a DataView on a child table that will be bound to a grid.

When you're using the Parent operator with expression columns, you are duplicating the parent data to the child DataTable. Since the tables are synchronized, if someone changes the value in the parent DataTable, then the value in the child table will also display the change.

Q I have a DataGridView that is bound to a DataTable's DefaultView. The source of the DataTable is a stored procedure that retrieves data from my application's database. All of the fields are returned in a specific order, but I need to display these fields in a different order in my DataGridView. How can I easily reorder the columns in the DataGridView?

Q I have a DataGridView that is bound to a DataTable's DefaultView. The source of the DataTable is a stored procedure that retrieves data from my application's database. All of the fields are returned in a specific order, but I need to display these fields in a different order in my DataGridView. How can I easily reorder the columns in the DataGridView?

A Sometimes you will find it is necessary to change the order of columns in a DataTable when it isn't practical to change the sequence of the fields in the SQL statement. The DataColumn exposes a SetOrdinal method that allows you to alter the sequence of the columns in a DataTable.

A Sometimes you will find it is necessary to change the order of columns in a DataTable when it isn't practical to change the sequence of the fields in the SQL statement. The DataColumn exposes a SetOrdinal method that allows you to alter the sequence of the columns in a DataTable.

The ordinal positions are 0-based, so to move the first column to be the fourth displayed, you would use the following code:

myDataTable.Columns[0].SetOrdinal(3);

If you prefer, you can reference the column by its name rather than its ordinal position:

myDataTable.Columns["OrderID"].SetOrdinal(3);

The columns before the third ordinal position are all decremented one slot to make room for the change of positioning.

Q I have a screen that displays a list of customers. The user can select a specific customer from this list and all of the customer data is displayed. The user can then select another customer from the list and view that customer's information. I can either load all of the customer data into a DataTable once and thus only hit the database one time, or I can grab just the list of customers and their IDs and then get the customer information from the database once the user selects a customer. Which of these methods is better?

Q I have a screen that displays a list of customers. The user can select a specific customer from this list and all of the customer data is displayed. The user can then select another customer from the list and view that customer's information. I can either load all of the customer data into a DataTable once and thus only hit the database one time, or I can grab just the list of customers and their IDs and then get the customer information from the database once the user selects a customer. Which of these methods is better?

A This is a common question. Should you take one big hit or several little ones? If you grab a list of customers and all of their information from the database and store the results in memory in a DataSet, you are retrieving a lot of information that may never be used. However, you are potentially reducing trips to the database. For example, if there are 1,000 customers in the customer table and you grab 40 columns for each customer, 40,000 values are retrieved and stored in the DataTable. The list of customers, in comparison, only requires a display value (for example, CompanyName) and an ID (for example, CustomerID). The rest of the columns are only shown on screen if the user selects a specific customer from the customer list.

A This is a common question. Should you take one big hit or several little ones? If you grab a list of customers and all of their information from the database and store the results in memory in a DataSet, you are retrieving a lot of information that may never be used. However, you are potentially reducing trips to the database. For example, if there are 1,000 customers in the customer table and you grab 40 columns for each customer, 40,000 values are retrieved and stored in the DataTable. The list of customers, in comparison, only requires a display value (for example, CompanyName) and an ID (for example, CustomerID). The rest of the columns are only shown on screen if the user selects a specific customer from the customer list.

The other option you've described is to load only the list of customers with the CompanyName and CustomerID. Then when the user selects a customer, the application makes another call to the database to retrieve detailed information for the specific customer. This technique loads significantly less data up front, but hits the database more often.

Which solution is preferable depends on additional factors. For example, if your customer data is so small that it can be gathered very quickly with no performance hit under load tests, then gathering all data up front may be a good choice. However, if the data changes quite often, this solution has a major flaw in that the loaded data will become stale very quickly. Therefore, before choosing to load all of the data up front into memory, you should consider the time it takes to load the data, how much of a memory footprint this data will consume, how much data you are actually loading, and whether the data is likely to change in the database while it is loaded (causing your on screen data to become obsolete). I generally prefer to use this technique only for smaller sets of data that do not change behind the scenes.

The second option is to grab each customer's detailed information only when the user requests it. With this technique, the list will generally load faster since it's loading less data. It will also provide fresher data since the only stagnant data is the list of customers. The downside to this technique is that the application hits the database whenever the user selects a customer. Thus, the calls to get the customer details should be optimized since you are getting customer data using a key field (CustomerID). If these calls to retrieve the customer data are causing a noticeable delay, network bandwidth, the query itself, or issues with passing data through the architecture of your application could be the cause of the slowdown. If you decide on this technique and then experience a slowdown, you'll have to determine the origin of the slowdown and if it can be resolved.

In the end, there's no simple answer for these types of decisions. The answer is almost always that it depends on the situation. In general, though, I find that it is better to load the list with minimal data and go back for additional data if the user selects a specific entry. Remember, performance isn't the only issue here. You also have to consider whether the data is fresh or stale.

Q I need to present a DataGridView loaded with sales information bound to a DataTable. My application must allow the user to perform calculations on the sales data in the grid and to choose which rows and columns to include in the calculation. How can I allow calculations on the DataTable without hitting the database again?

Q I need to present a DataGridView loaded with sales information bound to a DataTable. My application must allow the user to perform calculations on the sales data in the grid and to choose which rows and columns to include in the calculation. How can I allow calculations on the DataTable without hitting the database again?

A In this case, the Compute method will do the trick nicely. While expression-based columns operate on a single row at a time, the DataTable's Compute method allows you to perform operations on a set of rows, given a filter and an expression. The filter is used to qualify which rows the Compute method should operate on while the expression indicates the aggregate function expression to perform. For example, you could add a column to the DataTable that calculates the average order total for customers in the USA. This is a great tool to perform calculations on a set of rows quickly without hitting the database again.

A In this case, the Compute method will do the trick nicely. While expression-based columns operate on a single row at a time, the DataTable's Compute method allows you to perform operations on a set of rows, given a filter and an expression. The filter is used to qualify which rows the Compute method should operate on while the expression indicates the aggregate function expression to perform. For example, you could add a column to the DataTable that calculates the average order total for customers in the USA. This is a great tool to perform calculations on a set of rows quickly without hitting the database again.

Figure 3 shows a sample form that displays a list of orders, along with each order's total. This sample contains all the basic elements that the expression and the filter require. On the left in the Filter groupbox, the user can select the field to filter, the operator to apply to the field, and the criteria for the filter. The list of fields is filled by binding to the list of DataColumns:

DataColumn[] colList = new DataColumn[orders.Columns.Count];
orders.Columns.CopyTo(colList, 0);
ddlFilterColumn.DataSource = colList;
ddlFilterColumn.DisplayMember = "ColumnName";

Figure 3 Calculating with the Compute Method

Figure 3** Calculating with the Compute Method **(Click the image for a larger view)

This code creates a DataColumn array, fills it with the columns from the DataTable, and binds it to the ComboBox. By binding the array of DataColumns to the ComboBox, this provides access to other properties of the selected DataColumn that can be used. For example, if the selected column is a string then the value must be wrapped within quotation marks. Since the DataColumn is bound to the ComboBox, the code can figure out whether it should wrap the value in quotes.

The Calculation groupbox contains both a list of aggregate functions and a list of the columns on which the selected aggregate function should operate. The list of columns is bound to the ddlCalculationColumn ComboBox in the same way the ddlFilterColumn ComboBox is bound. The list of aggregate functions is loaded with the seven functions available to the Compute method: Sum, Count, Avg, Min, Max, StDev, and Var.

When the button is clicked, the code shown in Figure 4 executes. It starts by determining the type of the selected column for the filter. If the column is a string or a date, then the criteria value is wrapped with quotation marks. Once the filter and the expression have been formed, they are passed to the Compute method and the result is sent to the textbox.

Figure 4 Performing the Calculation

if (!ValidateControls()) return;

Type colType = ((DataColumn) (ddlFilterColumn.SelectedItem)).DataType;
string filterCriteria = string.Empty;
if (ddlFilterOperator.Text == "Like")
{
    filterCriteria = string.Format("'{0}%'", txtFilterCriteria.Text);
}
else 
{
    filterCriteria = (colType.Name == "String" || 
                      colType.Name == "DateTime") ? 
        string.Format("'{0}'", txtFilterCriteria.Text) : 
                               txtFilterCriteria.Text;
}
string filter = string.Format("{0} {1} {2}", ddlFilterColumn.Text, 
    ddlFilterOperator.Text, filterCriteria);
string expression = string.Format("{0}({1})", ddlAggregate.Text, 
    ddlCalculationColumn.Text);

object value = this.orders.Compute(expression, filter);
txtResult.Text = value.ToString();

Send your questions and comments for John to  mmdata@microsoft.com.

John Papa is a senior .NET Consultant with ASPSOFT (aspsoft.com) and a baseball fanatic who spends most of his summer nights rooting for the Yankees with his family and his faithful dog, Kadi. John, a C# MVP, has authored several books on ADO, XML, and SQL Server, and can often be found speaking at industry conferences, such as VSLive, or blogging at codebetter.com/blogs/john.papa.