Data Points

DataRelations in ADO.NET

John Papa

Contents

Why Use DataRelations?
Join Queries vs. Separate Queries
Data Shaping
Defining DataRelations
Constraints and Cascading
Children and Parents
Conclusion

One of the biggest differences between traditional ADO and ADO.NET is that the rowsets stored within ADO.NET can be truly relational. For example, a DataSet can store one DataTable containing customers and another DataTable containing the customers' orders. These DataTable objects can then be related to one another within ADO.NET, thus recreating the relationship that exists within the relational database. In ADO.NET once you retrieve two rowsets of data (in other words, parents and children) and relate them to each other, you can then retrieve all children rows for a given parent, display any one DataTable in a grid at a time, or modify several tiers of DataTable objects, and send the changes to the database all in one batch update. DataRelation objects, which are integral to ADO.NET applications, enable these features to function.

In this column, I'll explore the pros and cons of representing data in a multitiered relational structure as opposed to a single rowset (from an INNER JOIN) through ADO.NET. I'll explain how DataRelation objects can enhance applications and discuss what decisions need to be made when implementing them. I'll wrap up by demonstrating techniques that create a series of DataTable objects related through DataRelation objects as well as ways to retrieve children and parent rowsets and to cascade updates throughout a DataSet.

Why Use DataRelations?

Before I jump into any code examples, it is important to understand what data relations offer and what considerations you need to make prior to deploying them. Some of the key factors in deciding whether to use data relations and how to use them involve the issue of data redundancy, how the application needs to display data, and how the application needs to manage data changes. I'll begin by addressing these issues and then demonstrate how to implement relations.

In the February 2002 Data Points column in MSDN® Magazine, I demonstrated how to expand an application so that you can save changes throughout multiple data hierarchies using ADO.NET DataRelation objects. The sample that I demonstrated in that column was a standard application that displayed and saved data through a DataGrid control on a WebForm. (The DataGrid control is ideal for exploiting related DataTable objects since it has built-in support for these relations.) In this column I'll focus more on the DataRelation objects and explain the pros and cons of using them versus conventional joined rowsets stemming from SQL statements.

For the examples in this column, I will refer to the Northwind database's customer-to-orders-to-order details hierarchical structure. When considering whether to use a single DataTable containing a joined rowset versus a related series of DataTable objects, there are a few key considerations to keep in mind. The two basic functions of managing data are displaying it and managing changes (both to the database and within the data structure itself).

If you simply want to display information and do not require any data modifications, then the choices are primarily dependent on whether you want to display the data in a single grid or a series of grids. If the data is intended only for display in a single grid, then the best fit might be a joined rowset stored in a DataTable. Just because ADO.NET can store multiple related DataTable objects doesn't mean that developers have to use them every time. Just like using a Phillips head screwdriver is not as effective as using a regular screwdriver in a single-slotted screw, a series of related DataTable objects may not be the most effective solution in a display-only scenario. If your intention is only to have the data displayed, and you would prefer to display the data in a series of grids rather than in a single grid, storing the data in related DataTable objects is more appropriate.

Display-only scenarios are common; however, so are applications that require both display and data modifications. For these types of applications it's crucial to have a balance between the appropriate tool for displaying the data and the ability to manage data modifications. Here is where the flexibility of ADO.NET really comes into play. By storing the customer, order, and order detail data in three separate DataTable objects all related to one another, you can easily display them in a series of grids and manage any data changes that are made to each of them. The DataRelation objects that would relate these rowsets can be filtered prior to sending the changes to the underlying database using the GetChanges method. This way, only the data that has changed is sent to the data source.

Join Queries vs. Separate Queries

Related rowsets using DataTable and DataRelation objects also help eliminate redundant data that is inherent to a single joined rowset. For example, joining customers, orders, and order details into a single rowset duplicates the customer-specific data (for example, customer contact name and company name) for every order and order detail row. For example, if a customer has 5 orders and each order had 5 order detail rows, the customer data would be repeated for all 25 rows. This is old hat to developers who use SQL; however, by using hierarchical data structures such as XML or the ADO.NET related DataTable objects, there are better ways to eliminate this redundant data.

The most common type of rowsets are derived from join queries. For example, to retrieve the customers and their orders, the standard SQL syntax returns a single rowset. This single rowset duplicates the customer data for each customer order, thus creating redundant data. Adding the order details to this join query creates even more redundant data (see Figure 1).

Figure 1 Joining Customers, Orders, and Order Details

SELECT    c.CustomerID,
    c.CompanyName,
    c.ContactName,
    o.OrderID,
    o.OrderDate,
    od.ProductID,
    p.ProductName,
    od.UnitPrice,
    od.Quantity
FROM    Customers c
    INNER JOIN Orders o ON c.CustomerID = o.CustomerID
    INNER JOIN [Order Details] od ON o.OrderID = od.OrderID
    INNER JOIN Products p ON od.ProductID = p.ProductID

Updating a single rowset that is derived from multiple tables is cumbersome if you want to update the parent data. For example, if you want to update the customer data in the customer-to-orders-to-order details rowset, because of the redundant data you'll have to requery the database to make the changes take effect throughout the rowset. Using DataRelation objects in ADO.NET, this rowset can be broken into its base rowsets and related to one another, making the resultant DataSet more representative of its actual structure. By breaking the query down into a relational structure of three DataTable objects related to one another, less redundant data exists. Updating the data becomes much simpler and displaying a customer and their orders is second nature.

Another advantage to using related DataTable objects is an increased flexibility in updating the database. When the data is separated into three DataTable objects of customers, orders, and order details and related to each other with DataRelation objects, each DataTable's data can be updated to the database using commands specific to the DataTable's data. For example, the DataTable that contains the customer rowset could have some data updates saved to the database using a SqlCommand object that is specific to the customer table in the Northwind database. The SqlCommand might reference a SQL statement or a stored procedure that is associated with a SqlDataAdapter object that updates the DataTable's changes to the data source. Later in this column, I'll demonstrate how to make modifications to a hierarchical structure using these objects and using sample code from the SQL Server™ 2000 Northwind database.

Data Shaping

While XML programmers have been enamored with this hierarchical data structure for years, ADO.NET has made it quite simple to implement in Microsoft® .NET-based applications. In the search to implement hierarchical data structures in everyday applications, previous versions of ADO had a technique known as data shaping. While data shaping does represent hierarchical structures, it also has some drawbacks significant enough that it hasn't caught on in the mainstream. Also, in order to use data shaping you had to learn the SHAPE syntax.

While data shaping requires a special data provider and syntax, hierarchical data structures in ADO.NET can be created using the standard SQL provider and SQL syntax that we all know and love. Data shaping and other XML features of traditional ADO were being added left and right as XML gained in popularity and use. However, ADO itself was not built on XML, so it had a fundamental disadvantage that had to be resolved in ADO.NET. While the XML features of traditional ADO, such as data shaping and saving to XML, are useful, XML features in ADO.NET are far more comprehensive. I have not heard a single person say they were going to miss the data shaping syntax.

Defining DataRelations

How do you define a hierarchical rowset using a DataRelation? Let's take a look. A DataSet containing customers, their orders, and their order details in three separate DataTable objects can be related to one another using DataRelation objects. Using a DataSet containing DataTable objects linked through DataRelation objects, you can cascade updates, navigate parent records, join data from different data sources, or even aggregate data without doing it in the database. The first step is to create a DataSet that stores the three separate rowsets that were derived from three separate queries (see Figure 2).

Figure 2 Creating a DataSet

//— Create the connection
string sCn = "Data Source=(local);Initial Catalog=northwind;
User ID=sa;Password=yourpassword";
SqlConnection oCn = new SqlConnection (sCn);
DataSet oDs = new DataSet();
//— Fill the customer DataTable
string sSqlCustomer = "SELECT CustomerID, CompanyName, ContactName 
   FROM Customers";
SqlDataAdapter oDaCustomer = new SqlDataAdapter(sSqlCustomer, oCn);
oDaCustomer.Fill(oDs, "Customer");

//— Fill the order DataTable
string sSqlOrder = "SELECT CustomerID, OrderID, OrderDate FROM Orders";
SqlDataAdapter oDaOrder = new SqlDataAdapter(sSqlOrder, oCn);
oDaOrder.Fill(oDs, "Order");
//— Fill the order detail DataTable
string sSqlOrderDetail = "SELECT od.OrderID, od.ProductID, 
                p.ProductName, " +
                " od.UnitPrice, od.Quantity FROM [Order Details] od " +
                " INNER JOIN Products p ON od.ProductID = p.ProductID";
SqlDataAdapter oDaOrderDetail = new SqlDataAdapter(sSqlOrderDetail, 
                                oCn);
oDaOrderDetail.Fill(oDs, "OrderDetail");

The code in Figure 2 creates a DataSet with a DataTable containing all customers, a second DataTable containing all orders, and a third DataTable containing all order details. At this point the code defines the DataSet to contain the three rowsets, but the relationships are not yet established. To relate the DataTable objects I'll use two DataRelation objects: one to relate the customers to their orders and a second one to relate the orders to their order details. Figure 3 shows how to create the DataRelation objects for this scenario, building on the code sample in Figure 2.

Figure 3 Creating DataRelation Objects

//— Create the DataRelation and
//— relate the customers to their orders
DataRelation oDr_Customer2Order = new DataRelation("Customer2Order",
            oDs.Tables["Customer"].Columns["CustomerID"],
            oDs.Tables["Order"].Columns["CustomerID"]);
oDs.Relations.Add(oDr_Customer2Order);

//— Create the DataRelation and
//— relate the orders to their order details
DataRelation oDr_Order2OrderDetail = new         
            DataRelation("Order2OrderDetail",
            oDs.Tables["Order"].Columns["OrderID"],
            oDs.Tables["OrderDetail"].Columns["OrderID"]);
oDs.Relations.Add(oDr_Order2OrderDetail);

By creating the DataRelation objects and then adding them to the DataSet's Relations collection, the three DataTable objects' rowsets are related to one another through the defined fields. Like most of the ADO.NET objects, the DataRelation object has several different constructors. I used the constructor that accepts the name of the relation, the parent table's column, and the child table's column. If there were multiple columns that define the relationship, I could have passed in an array of the parent table's columns and an array of the child table's columns. Another option is to use the same first three parameters that I used in Figure 3 and then pass in a fourth parameter to represent whether the constraints should be created automatically (pass in a Boolean value). But more on constraints in a moment.

Once the DataSet is filled with the three rowsets and the relations are established linking the DataTable objects, the DataSet could easily be displayed in a DataGrid on a Web Form by setting the DataSource property like this:

dataGrid1.DataSource = oDs;

The DataGrid is clever enough to figure out that there are multiple DataTable objects that need to be displayed and that it should allow the rowsets to be navigated in the order that's prescribed by the DataRelation objects.

Constraints and Cascading

Just like with databases, foreign key constraints in DataSet objects help enforce data integrity. When I created the Customer2Order DataRelation in Figure 3, it automatically created a unique constraint on the parent table's key (customer.customerid) and a foreign key constraint on the child table (order.customerid). The foreign key constraint will automatically be in effect when the DataSet is rendered into a Web Form's DataGrid. So if you try to modify the child record's foreign key value to a value not in the parent DataTable, a trappable integrity violation will occur. To demonstrate this, you could disable the constraints so that they are not enforced by setting the DataSet's EnforceConstraints property to false. Other than for demonstration purposes, I don't recommend doing this as it will allow your users to violate the data integrity of your data and application.

Support for cascading updates and deletes through foreign keys is a new feature of SQL Server 2000. For example, if you turn on these features in SQL Server 2000 and you delete the parent row, the related child rows will also be deleted. ADO.NET has a similar feature that can be controlled through the DeleteRule and UpdateRule properties of the ForeignKeyConstraint object. By default, these rules are set to allow cascading modifications to the related data. So if you change the value of CustomerID in the customer DataTable, the value of CustomerID in the orders DataTable will be updated accordingly. However, these can be altered by setting the properties to one of the Rule enumerations such as Cascade (the default), None, SetDefault, or SetNull.

To create your own foreign key constraint you can define a ForeignKeyConstraint object. This gives you the flexibility of defining the columns and constraint rules before applying the constraint. The following code snippet demonstrates how to create a ForeignKeyConstraint and associate it with a DataRelation:

ForeignKeyConstraint oFKey;
oFKey = new ForeignKeyConstraint("CustomerForeignkey",
    oDs.Tables["Customer"].Columns["CustomerID"],
    oDs.Tables["Order"].Columns["CustomerID"]);
oFKey.DeleteRule = Rule.Cascade;
oFKey.UpdateRule = Rule.Cascade;
oDs.Tables["Customer"].Constraints.Add(oFKey);
oDs.EnforceConstraints = true;

Children and Parents

When you have a DataRelation it's important to know how to navigate up and down the parent and child rowsets. For example, if you wanted to loop through the orders for particular customer, you could retrieve those orders because of the DataRelation that you already established. Figure 4 demonstrates how to retrieve all child rows and then loop through them.

Figure 4 Looping Through Child Rows

//— Retrieve the child rows for the first customer
DataRow[] oRows = 
   oDs.Tables["Customer"].Rows[0].GetChildRows(oDr_Customer2Order);
string sMsg = "The orders for the first customer are: \n";
//— Loop through the child rows for the first customer
for (int i=0; i < oRows.Length; i++) 
{
//— Grab the values for each child row of the first customer
    DataRow oRow = oRows[i];
    sMsg += "\t" + oRow["CustomerID"].ToString() +
            " " + oRow["OrderID"].ToString() +
            " " + oRow["OrderDate"].ToString() + "\n";
}
//— Display the values of the child rows
MessageBox.Show(sMsg);

Figure 4 also shows how to retrieve all children of the first customer and loop through them. But what if you want to find the parent row or rows of a DataTable? If there could be more than one parent row, you can use the GetParentRows method. However, for my example there is a single parent row for each order, so I used the GetParentRow method. This code shows how to retrieve the values in the parent Customer DataTable from a reference to the first row in the Order DataTable:

DataRow oRow = 
   oDs.Tables["Order"].Rows[0].GetParentRow(oDr_Customer2Order);
MessageBox.Show(oRow["CompanyName"].ToString());

Conclusion

Another feature worth noting is how ADO.NET can render itself to XML so easily. ADO.NET uses XML as its foundation and, as such, you can render a DataSet to XML through ADO.NET. Using the WriteXml, WriteXmlSchema, GetXml, or GetXmlSchema methods of the DataSet you can retrieve the XML that represents the underlying structure for the DataSet. Issuing the GetXml method on the DataSet from the examples in this column would result in a string that contains its XML. The GetXmlSchema method retrieves the structure of the data (its schema) while the GetXml method retrieves the schema and the data:

MessageBox.Show(oDs.GetXml());

You can also create a DataSet from an XML string or file using the ReadXml and ReadXmlSchema methods. For example, you could save a DataSet including its structure and data to an XML file using the WriteXml method. Then later you could recreate the DataSet from the same XML file using the ReadXml method.

The key point to remember with relations in ADO.NET is that you should first determine if you need them at all. Relations are great for applications that require a hierarchical representation of the data. However, if your scenario includes displaying the data in a single grid format, a related series of rowsets may offer little advantage to a single joined rowset. In the next few installments of this column, I will expand further on DataRelation objects, how to perform batch updates, reviewing shortcuts available through the SqlCommandBuilder, concurrency and consistency considerations, as well as expression-based columns in both ADO.NET and SQL Server.

Send questions and comments for Johnny to  mmdata@microsoft.com.

John Papais the VP of Technology at MJM Investigations in Raleigh, NC. He has authored several books on ADO, XML, and SQL Server and can often be found speaking at industry conferences such as VSLive. You can reach him at mmdata@microsoft.com.