Migrating from ADO "Classic" to ADO.NET
Beta V Corporation
Microsoft SQL Server 2005 (code-named "Yukon")
Microsoft Visual Studio 2005 (code-named "Whidbey")
Summary: Bill Vaughn discusses the process of converting Visual Basic 6.0 ADO code into code that can be used in a .NET application to perform roughly the same operations. (11 printed pages)
Why Is This Trip Necessary?
How Did We Get Here?
Migrating ADO Classic Development Skills
What's New in ADO.NET 2.0
What Are the Alternatives?
Author's Note* About this time last year, Microsoft asked me to write an* article that would help COM-based ADO developers understand the issues and mechanics of getting their data access code ported to .NET. This year they want me to update that article with information about ADO 2.0. Since I'm currently working on a new edition of my Hitchhiker's Guide, I thought I could simply extract a chunk of the new chapter and provide it here.
Hitchhiker's Guide to Visual Studio and SQL Server 2005 (7th Edition) will be published Addison Wesley after Whidbey and Yukon have been released to manufacturing.
Thanks to Calvin, Hobbes and Bill Waterson, I've been fascinated with the term "transmogrify" for some time. This technology has (apparently) been around for some time and purportedly been used to describe the process of changing frogs into princes and puppets into people (or vice-versa). But here I'm going to use transmogrify to refer to the process of converting Visual Basic 6.0 ADO code into code that can be used in a .NET application to perform roughly the same operations. Transmogrification implies that the result of the conversion is barely recognizable from the original source—and in this case that's entirely appropriate assuming you morph your COM-based ADO code (from Visual Basic 6.0) all the way to ADO.NET. No, some folks don't think it's that big a change, but I do for the reasons I'll discuss here.
Why Is This Trip Necessary?
After having "repaired" my 3-speed bike on the back lawn one sunny afternoon, my dad suggested that I should not "fix" stuff that's not broken. I completely agree. It does not make sense to disturb functional code unless there is a compelling reason to do so and you have performed a rational cost-benefit analysis. By "rational" I mean an analysis that does not hinge on an emotional sales appeal (perhaps by your lead developer or your third cousin on your spouse's side) but based on a thorough analysis of the expenses you incur designing, developing, testing, deploying and supporting your applications in addition to the expense of retraining your developers, support team and customers. You'll also want to factor in the costs of any hardware upgrades needed to run the Framework and your new application. As you know converting code or even tweaking it can be expensive. Even if you're careful every line you change can have enormous (often unintended) consequences and side effects. That said, there are often enough good-sense reasons to convert existing code such as:
- Perhaps the existing code can't take advantage of improvements in hardware or the OS. For example, your code might be designed to work with (and possibly only with) Windows 9x and your customers have evolved to Windows XP.
- Perhaps the existing code is not as competitive as other companies have created faster, more reliable, more saleable replacements for their applications. We see this all the time as our customers strive to stay ahead in features, functionality and competitive price.
- Perhaps the new application would be able to scale better, support more users, is more secure or easier to deploy or simply able to implement features not available with the existing technology.
- Perhaps your customers complain about how your code seems to work for awhile and then fails—especially after someone installs some other piece of software.
- Perhaps (and possibly most importantly) you find that the new development platform can improve your ability to create new applications and support the code and the customers that use it with a smaller staff.
I'm not going to wade into the process of making the decision to march off from an existing code base toward the beckoning horizon of new technology. I'll leave that up to your IT staff. If you're reading this to get a handle on the mechanics of converting existing COM-based ADO code (what I call "ADO Classic" or ADOc) to run in a Visual Basic .NET application, read on.
How Did We Get Here?
Over the years, Microsoft has introduced a dynasty of data access interfaces. In the beginning these were designed to access specific versions of the Joint Engine Technology (JET) DBMS and SQL Server (via DB-Library). As the technology matured other "generic" or one-size-fits-all (OSFA) interfaces like ODBC and OLE DB were created to access virtually every data source on the planet. COM-based ADO (what I call ADOc) was created to permit easy access to OLE DB.
ADO "Classic" Evolves
Over time, ADOc evolved and eight versions later it is very widely used and integrated into COM-based Visual Basic. ADOc developers had also learned how to build applications that manage databases large and small and is used in client/server, middle-tier and ASP architectures all over the world. ADOc was also well accepted as it supports stored procedures (including full IO parameter management), automatic UPDATE query generation, asynchronous operations, client-side and server-side cursors, RO/FO data streams and more. COM-based ADO had its issues as well. Its dependence on the MDAC stack made it prone to DLL Hell issues—deployed applications sometimes failed when MDAC was upgraded.
ADO.NET Is Introduced
To solve the problem of having to replace components of working applications, Microsoft invented the .NET Framework. Along with this effort they also created an entirely new data access interface—ADO.NET. Early in the process of inventing ADO.NET, the Microsoft developers called their new data access interface "XDO". That's because this new data access paradigm was built to interface with XML—it could read XML files to populate its objects or extrude XML on demand to pass data and schema to another tier. The name made sense. However, Microsoft decided that developers would be confused and angered if Microsoft created yet another data access interface so they called it "ADO.NET". Sure, both ADOc and ADO.NET do the same thing at the higher levels. However, how they do it behind the scenes is very different (and in my opinion), much better (for the most part).
When ADO.NET first shipped it was missing many of the fundamental features supported by the now fully evolved ADOc. These include batch updates, asynchronous operations, server-side cursors, runtime action command generation and more. Some would argue that that ADO.NET was designed for use with ASP.NET—not necessarily for client/server applications but Microsoft believes that disconnected DataSets can help client/server applications be more efficient and break their dependency on hard-to-scale designs that depend on server-side state management. Because of this philosophy, ADO.NET does not include support for server-side cursors. The new approach was to focus on a "disconnected" client-side data store that could be easily persisted and serialized to XML on demand to pass to other layers. This also worked nicely for Microsoft's new XML Web Services Service Oriented Architecture (SOA) initiative. Note that the XML used to persist an ADOc Recordset is not compatible with the XML format expected by ADO.NET. (See Much ADO About Data: Doing the Impossible (Again).) Microsoft also felt that it was better to permit developers to build their own action command (UPDATE, INSERT, DELETE) SQL or let wizards build it instead of relying on the property-driven ADOc Update method code that often made mistakes when trying to figure out how to make changes to a Recordset. Yes, they also implemented the CommandBuilder to automatically construct action commands in simple scenarios, but as I describe in my article on this method, I don't think you'll want to take that route. (See Weaning Developers from the CommandBuilder.) Sure, there are ways to work around many of these issues, but this extra effort can further discourage folks from migrating or make the process that much harder.
Dealing with the Issues
On the other hand, Microsoft dealt directly with the OSFA issue by permitting developers to write to managed interfaces. That is, ADO.NET exposes a .NET Data Provider specifically for Microsoft SQL Server (SqlClient) and others including Oracle and DB2. This meant data provider engineers could leverage all the features of their DBMS and communicate to it at the lowest levels. For the first time since DBLibrary, ADO.NET was able to interact with SQL Server using Tabular Data Stream (TDS)—it's low level protocol. This meant the SqlClient provider could access any and all of the SQL Server features and nuances so applications can better leverage its power. Since the native interfaces were similar to the .NET OSFA interfaces implemented by the OleDb and Odbc .NET Data Providers, developers do not have problems transitioning code from one DBMS to another.
ADO.NET also exposed another feature of DBLibrary—direct access to the data stream returned by the low-level data interface. Implemented as a DataReader, this connected, forward-only one-row-at-a-time interface links your code to the inbound data returned from a query. All of the methods in ADO.NET use the DataReader either directly or behind the scenes to fetch resultsets. This means your applications can fetch data faster and have more control over the process.
Migrating ADO Classic Development Skills
For the most part, the data-access logic in most applications can be segmented into several parts:
- Getting Connected: This involves building a connection string, (in some cases) integrating user-provided credentials, and establishing a connection—either just-in-time or for life of the operation.
- Managing the query: This meant creating a query string and integrating user-provided parameters. This also encompasses managing transactions.
- Executing the query: This includes choosing an appropriate execution method that matched the resultsets returned. Sometimes this means using methods that accept a rowset, scalar or stream or simply execute an action command.
- Manage the resultset(s): These routines store and process rowsets, returned parameters or bind the results to controls. These routines might also manage "relationships" between the rowsets returned–as when working with hierarchical data. In some designs, these routines navigate through client-side or server-side cursors as well as sort, filter and locate rows.
- Manage changes: When data changes, these routines move data from static sources or bound controls into ADO data structures that manage updates, insertions and deletions. These routines also manage concurrency collisions, batch operations and transactions.
- Manage exceptions: All of these routines have exception handlers that deal with the issues that commonly occur as the connection is made, the query is prepared and executed and the data is posted to the server.
ADO developers will find that ADO.NET code can be segmented and coded in similar ways. However, there are differences in each part that can be a bit daunting at first, but taken individually can be handled fairly easily. No, the Visual Basic 6.0 conversion wizard won't convert ADOc code to ADO.NET—it's simply not possible give the differences in the approach taken by ADO.NET. For example, if you used the SHAPE syntax in ADOc to manage hierarchies, you'll find that the .NET Framework does not support SHAPE, but does support managing multiple, related rowsets by using the DataSet class which can contain several DataTable objects—each containing an independent rowset.
Each of the .NET Data Providers are responsible for implementing their own "flavor" of the ADO.NET classes. They all support a "core" set of operations using (roughly) the same properties and settings. Of course, each supports its own SQL variation and ConnectionString settings. For example the SqlClient provider (used for Microsoft SQL Server) implements a SqlConnection, SqlCommand, SqlDataAdapter and SqlDataReader while the OleDb provider supports the OleDbConnection, OleDbCommand etc. I refer to these objects by their function in my books—for example, the SqlCommand is called the "Command" class.
The ADO.NET Connection (SqlConnection) class is similar to the ADOc Connection object, but it accepts a different (albeit familiar) ConnectionString. It can't be opened asynchronously, but it can be managed automatically by the DataAdapter when you execute the Fill and Update methods. You can't execute SQL directly against the Connection object in ADO.NET—you'll need to build a Command to do so. You'll also want to morph your connection error handlers to expect the same type of issues that occurred with ADOc connection attempts. Note that ADOc uses roughly the same connection pooling mechanism as ADO.NET but with ADO.NET you can manage the pooling options and state far more easily.
The ADO.NET Command class is similar to the ADOc Command object but in this case you're required to build a Command object to execute SQL or run a stored procedure. The methods exposed on the ADO.NET Command object are really very different. Unlike ADOc, you have access to a new lower-level interface—the DataReader (SqlDataReader) which exposes a high-speed raw data stream to return data from the query. The Command class supports a Parameters collection that's similar to that used in ADOc. Be aware that SQL parameter marking is different in ADO.NET.
You'll also discover that the ADO.NET DataTable is roughly equivalent to the ADOc Recordset. While it's not managed as a "cursor", you'll find it more efficiently stores and manages the returned rowset. Navigating to a specific row is as easy as addressing an array. You can also manage several rowsets at once from a variety of data sources using the DataSet. This is a class used to manage one or many DataTable objects and the rowsets they contain. You can code relationships between these rowsets (even when they come from disparate sources) and easily navigate, filter and search based on parent-child relationships you define.
Data binding has also changed. Without going into the litany of issues with Visual Basic 6.0 data binding, suffice it to say that you can more easily bind to rowsets using either drag-and-drop generated code or by setting up your data binding linkages in your own code. ADO.NET 2.0 further refines the data binding paradigm to make this process even easier.
Addressing the ADOc Recordset is fairly expensive as all columns are returned as a Variant. Since ADO.NET can leverage strongly typed DataTables and DataSets, you'll find it far easier to address objects. This means data is stored and managed in its native type—an integer is stored as an integer, a string as a string. You might also notice that strongly typed operations are orders of magnitude faster. We also encourage you to use the "Option Strict On" and "Option Explicit On" options in the development environment. While this might mean your code has to explicitly coerce variables (converting the types in code), the result will be more stable code that won't fail (as often) when unexpected data arrives.
To make managing table updates easier, the ADO.NET DataAdapter is patterned after the Visual Basic 6.0 Data Object Wizard (DOW). This class permits you to define your own UPDATE, INSERT and DELETE SQL—be it an ad hoc SQL query or a stored procedure. This makes ADO.NET far lighter, but places the responsibility for these commands on your code—ADO.NET no longer attempts to generate these commands at runtime like ADOc. As we'll see, ADO.NET 2.0 reintroduces batch updates as well as asynchronous operations to improve performance.
As with ADOc, exception handling is a big part of your ADO.NET design. Fortunately, the .NET Framework supports Try/Catch exception management which is far more powerful and flexible than the traditional Visual Basic 6.0 "On Error" routines to which you might be accustomed. This approach permits you to filter those specific data-centric exceptions from those caused by other issues. This makes writing your exception handlers easier and makes your applications less prone to unexpected failures.
What's New in ADO.NET 2.0
The latest version of ADO.NET 2.0 fills some of the gaps left behind when you migrate from ADOc and implements several features we've never seen before. Many of these innovations permit you to build more secure, more robust (and faster) code—but especially Windows Forms (client/server) applications. These updates include:
- No MDAC Dependency. While earlier versions of ADO.NET often required an upgrade to the MDAC stack (which includes the DLLs required to run ADOc applications), ADO.NET 2.0 drops this dependency when using SqlClient. For the OleDb and Odbc .NET Data Providers, older versions of MDAC (2.6-2.9) can suffice. This means that installing a .NET application is less disruptive as it won't disturb the existing ADOc applications.
- Asynchronous operations. While ADO.NET 2.0 can't open a connection asynchronously like ADOc, it can execute DataReader commands asynchronously. This means you can write code that entertains the user with a progress bar or performs other work on your application thread while the query executes (and returns status events as it progresses).
- Multiple Active Result Sets (MARS). We've never been able to use a connection for more than one operation—not until ADO 2.0. MARS can support several operations on the same connection. This means you won't need to open as many connections when reading and updating rows at the same time. Of course, this assumes the .NET Data Provider and the target DBMS support this feature.
- Batch Processing. ADOc supported the ability to send several action commands to the server in a single round-trip. This feature has been reimplemented in ADO.NET 2.0. By batching Update method operations, your application can make changes far more quickly with fewer (expensive) round-trips to the server.
- BCP Support. When moving data from external or generated data sources, it's important to use the bulk copy utility (BCP) instead of conventional ADO.NET methods. ADO.NET 2.0 now includes a BCP class to permit direct access to this high-speed data upload functionality. This approach can mean bulk data transfers can be done in a fraction of the time required by conventional ADO methods.
- Expose a DataSet as DataReader. In order to make it easier to expose data from tier to tier you can now create a DataSet (or DataTable) and pass this data as a DataReader to another layer. You can also load a DataTable directly from a DataReader.
- DataSet Serialization. In earlier versions you could use Remoting to pass DataSets between tiers, but the data was passed as XML. In ADO.NET 2.0 you can serialize DataSets as binary. This means far (far) faster inter-tier performance but uses a proprietary Microsoft format to transport the data. Another option (SchemaSerializationMode=TypedDataSets) strips the schema from the data stream thus reducing the volume of data being moved while still enabling cross-platform scenarios.
- Yukon Support. For the first time, ADO.NET 2.0 supports native SQL Server 2000 data types (and perhaps) include support for CLR-based User-Defined types. These include varchar(max), nvarchar(max), varbinary(max), and the new XML types.
- New Common Base Class. To make it easier to write generic applications, ADO.NET 2.0 exposes a DB* Base class. For example, DbConnection, DbCommand and DbDataAdapter are exposed. These classes still require that provider-specific SQL syntax be used, but they can make it possible to write a single application that can access several different data sources.
- Server and Provider Enumeration. These new classes permit you to discover what .NET Data Providers are available and which servers and instances are visible from your application. These are very useful when writing tools to manage, start/stop/pause servers or simply determine server status. Note that SQL Server now exposes SMO in addition to SQL-DMO to manage SQL Server instances.
- New Counters. The SqlClient provider exposed a number of counters in earlier versions of .NET but they weren't particularly reliable. These new counters promise to be more accurate and include the ability to determine the state of the connection pooling mechanism.
- Connection Pool Management. While earlier versions permitted you to choose connection pool options, ADO .NET 2.0 permits you to clear the pool or all pools as well as other functions to manage pool behavior. The connection pooling mechanism is also improved to help detect a dead pool—one connected to an unavailable server.
ADO has been integrated into Visual Basic and Visual Studio for some time. However, the Visual Studio 2005 team has gone quite a bit farther in the level of integration this time. While you won't see wizards to create the familiar DataAdapter, you will see drag-and-drop techniques that generate a number of familiar and new strongly typed data structures. Remember, these tools are most interesting when you want the RAD interface to generate code for you—especially in simple cases. Microsoft says that they have done a lot of work to support N-tiered development as well. You can create applications that build DataSets in the same assembly or access datasets in referenced assemblies as well as your own objects in referenced assemblies. When your designs get more sophisticated, these same tools can expose middle-tier business objects that you can generate yourself. It's beyond the scope of this article to discuss this integration but I'll cover it in the book.
What Are the Alternatives?
As with any software solution there are a dozen alternative ways to build it—and someone always points out a dozen variations on each point after you're done. Sure, there are trade-offs for each approach. In this case we're going to focus on the data access interface issues and leave the remaining conversion tasks to others. Sure, your data access code might not be that easy to isolate. While most industry pundits and I have been talking about 3-tier (or "N"-tier) designs for a couple of decades now, not everyone has followed our advice. If you've created an application with an independent set of middle-tier objects that handle data, the task of converting these objects to support ADO.NET is fairly straightforward. However, if your code looks anything like the spaghetti code I've reviewed over the years, you might have a dickens of a time pulling the ADOc routines out of the fabric of your application without it falling apart—it will be like trying to pick off dog hair while wrestling with a Russian Wolf hound in early spring. Let's take a look at the alternative approaches—one of them is sure to make sense for you and your skills.
- Let the Visual Basic 6.0 Import Project wizard convert your project to an ADO.NET project. We're told that the new Visual Studio 2005 conversion wizard is smarter than ever, but it still won't convert ADOc code to ADO.NET. While this might take some time for larger, more complex projects, for simpler projects the conversion should be fairly smooth. After conversion, what you end up with is your ADOc code morphed and wrapped in a COM interop layer so it compiles in the new Visual Basic .NET project, can be deployed (with the registered version of MDAC) and run on the target system. This also means you'll need to watch out for the coding techniques that I discuss in the article I wrote last year at this time—there are some late-binding techniques that don't transmogrify very well.
- Another viable approach is to visit each ADOc code section and break it down logically. Consider if the task can be done with a disconnected DataSet or simply use a DataReader to return rows—it's similar to a default "firehose" Recordset. In this case conversion to ADO.NET is a clean, straightforward process. When you're dealing with server-side cursors (especially popular with client/server applications), you'll need to decide on a strategy that reengineers the code to use a disconnected DataSet or consider managing a home-brew server-side cursor class using ANSI CURSOR operators. We've seen that in many cases, the need for server-side cursors could be eliminated by a few design changes. However, this might not be possible so look before you leap.
- A third approach is especially useful when you've isolated your ADOc code in middle-tier components. In this case you can replace your ADOc code by "simply" creating ADO.NET versions of the components that return similar structures—assuming the component consumer does not return ADOc Recordsets to the other tier(s). If you're exposing a custom business object class, the tools within Visual Studio 2005 can make it far easier to transmogrify to an ADO.NET-based data access component.
Before you wade into a conversion process you need to consider the starting point and ask the question: "Can I get there from here to there fording any deep rivers?" Consider these factors that can affect the conversion process:
- What type of architecture are you migrating from? If you're moving from ASP programming, the ASP.NET approach is far more powerful and manages ADO objects very differently. You'll like the new approach though—it now handles bi-directional data binding that has not been supported up to this point.
- What's your skill level? What computer language is most comfortable for your? If you've done quite a bit of Visual Basic 6.0 coding and have worked with Visual Basic .NET for some time (as many have), you might find it easier to simply port your code "logically". That is, transmogrify the operations—not the code itself. You'll be able leverage most of your connection strings and SQL as long as you don't expect to use server-side cursors. If your skills are not that current, you might find it easier to simply try the Visual Studio 2005 migration wizard and work from there—that's because most of the simple scenarios are converted quite easily.
- How much does your code depend on server-side functionality like keyset or dynamic cursors or routines that manage server-side state like #temp tables? In this case your conversion process will be harder as even the newest ADO.NET 2.0 does not directly support this functionality. However, there is an alternative—use ANSI CURSOR commands.
- Are you migrating from ADO classic (ADOc versions 1.0 through 2.6) or from DAO, ODBC or some other proprietary data access interface? Migrating from a fairly recent version of ADO is not that hard as many of the concepts are similar—not the same, but similar. If you use the Visual Basic 6.0 migration wizard most of your ADOc code can be ported over without change. Yes, there are a few issues that I mention in the earlier article.
- However, moving from DAO, the ODBC API or a proprietary interface is another matter. The automated wizards might not be able to convert your code at all and might not even try. In this case, I recommend that you reevaluate your DBMS engine. If you're using DAO, then you're using JET and I usually recommend customers move from JET to SQL Express as it's more powerful, stable and scalable. Microsoft is in the process of replacing JET everywhere it can—as quickly as it can. I recommend the same to my customers and students.
- What bound controls are you using? While some simple controls like the TextBox, ListBox and ComboBox have fairly easy conversion paths, the DataGrid does not. Consider the binding mechanism in Visual Basic .NET is radically improved and radically different so you might find several disconnects when it comes to complex bound controls. Many properties are not the same in the .NET equivalent controls and you'll find several methods missing as well. Third party controls are usually offered in a .NET version—you'll want to investigate these before venturing too far. As far as custom controls, Visual Studio 2005 does a better job of transmogrifying these, but I would not raise your expectations here—I suspect there will be issues galore with this fairly recent enhancement to the conversion wizard.
- Are you ready to write your own action command SQL logic? Remember, ADO classic created this SQL for you at runtime based on your SELECT statement. In contrast, ADO.NET encourages you to code these action commands yourself. Yes, as I've mentioned before you can use the CommandBuilder, but you'll quickly outgrow its limitations. I think you'll find it's not nearly as flexible as the ADO classic Update method—especially if you consider how the Update Criteria property permitted you to alter the type of action commands created.
Yes, it's possible to transmogrify your ADO "classic" code to ADO.NET. Is it as hard to do as it seems? I doubt it. Once you understand ADO.NET is not ADOc but an entirely new (and better) data access interface, you'll be more comfortable with the conversion process. No, not all applications are destined or meant to be converted. As I said earlier, if it's not broken, don't fix it. I hope this helps. If you need more help here are a couple of books that can get you started. In any case, I hope this article helps get you started.
1. Transmogrify (tràns-mòg´re-fì´, trànz-) verb, transitive. To change into a different shape or form, especially one that is fantastic or bizarre.
(The American Heritage Dictionary of the English Language, Third Edition)