Data Access API of the Day Part III – Disconnected Programming in a Managed Environment

Welcome to Part III of Data Access API of the Day, a brief history of the evolution of Microsoft’s Data Access APIs.

In Part I we looked at ODBC as Microsoft’s C-based Relational API, and the DAO and RDO automation interfaces that made relational databases exposed through ODBC available to languages like VB. In Part II we looked at OLE DB as Microsoft’s first-class Data Access API for componentized data access within Microsoft’s Component Object Model (COM) environment.

With the introduction of a common, coherent, language-independent, managed framework for writing applications (.NET Framework), Microsoft (again) asked the question “How do we provide first-class support for data?”

In August of 1998 a group of architects from Microsoft met in Semiahmoo, WA, for a “Beachside Offsite for Lightning Technology” (“BOLT” – pretty clever, huh?). “Lightning” was the code name for what at the time was called “COM3” (until we realized that naming a directory “COM3” caused strange things to happen in DOS) and eventually became the .NET Framework. I attended the offsite as a representative for data access, with the hope of re-using much, if not all, of the existing ADO/OLE DB architecture, and spent the next year pushing us not to reinvent the wheel.

The truth was, though, that the world had changed since we did ADO and OLE DB. With the growing popularity of the Internet (thanks Al…), disconnected programming was becoming more important and prevalent in the industry, and XML was becoming an increasingly popular way to work with data. ADO.NET addressed these new challenges by making an explicit separation between connected data access (through provider-specific "Data Providers") and disconnected data access (through a common in-memory "DataSet") with an explicit, extensible "DataAdapter" mechanism for bridging the two.
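To make the connected/disconnected split concrete, here is a rough sketch of the pattern. It is not ADO.NET code: it is a Python analogue built on the standard `sqlite3` module, and the names `open_store`, `CustomerAdapter`, `fill`, `update`, and `demo`, along with the `customers` table, are hypothetical, chosen only to mirror the roles of the connection, the adapter, and the in-memory copy.

```python
import sqlite3


def open_store():
    """Connected side: a live connection to an in-memory database (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO customers (id, name) VALUES (?, ?)",
        [(1, "Ada"), (2, "Grace")],
    )
    conn.commit()
    return conn


class CustomerAdapter:
    """Hypothetical bridge between the store and a disconnected in-memory copy."""

    def __init__(self, conn):
        self.conn = conn

    def fill(self):
        # Copy the rows into plain dicts; the result holds no cursor or connection.
        cur = self.conn.execute("SELECT id, name FROM customers ORDER BY id")
        return [{"id": r[0], "name": r[1]} for r in cur.fetchall()]

    def update(self, rows):
        # Push the edited in-memory rows back to the store in one explicit step.
        self.conn.executemany(
            "UPDATE customers SET name = ? WHERE id = ?",
            [(row["name"], row["id"]) for row in rows],
        )
        self.conn.commit()


def demo():
    conn = open_store()
    adapter = CustomerAdapter(conn)

    rows = adapter.fill()               # connected: one round trip to the store
    rows[0]["name"] = "Ada Lovelace"    # disconnected: a purely local edit

    # The store is unchanged until the adapter is asked to write back.
    before = conn.execute("SELECT name FROM customers WHERE id = 1").fetchone()[0]
    adapter.update(rows)                # connected again: explicit write-back
    after = conn.execute("SELECT name FROM customers WHERE id = 1").fetchone()[0]

    conn.close()
    return before, after
```

The point of the sketch is that the caller always knows which operations touch the store (`fill` and `update`) and which are local, rather than having one object that might be either.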

The separation between connected and disconnected data access was in sharp contrast to ADO/OLE DB, which attempted to abstract away remote access and data source functionality differences. While ADO’s model of having a single Recordset object that could be either forward-only or scrollable, and could represent local or remote data, seemed like a nice simplification, in practice differences in functionality, latency, memory usage, and where errors could occur made it difficult to hide the details of such diverse implementations under a common façade. One of the lessons we learned from OLE DB, and DCOM in general, was that even if you made the interfaces look the same, such differences fundamentally affected the way applications behaved. To build reliable, responsive applications, remote access needed to be done asynchronously and in a non-blocking fashion, which meant knowing when an operation was (potentially) remote.

Although we attempted to preserve as much of the ADO programming paradigm as possible in ADO.NET (for example, the connection/command/result model for providers), the move to ADO.NET was dramatic for most programmers. The first response from beta testers was almost universally "What happened to my ADO?" And then, as they started to use the new ADO.NET, there was a gradual realization of the power that the explicit separation between connected and disconnected access provided, and with it a feeling of "I could never go back…"

Each of these evolutions was driven by a major platform shift: a client interface for talking to relational stores; scripting and RAD (rapid application development) scenarios; COM/DCOM; and a cohesive .NET Framework. In each case, we strove to provide the right API for the new platform, balancing concepts retained from previous APIs against lessons learned and changes in the industry, including federation/componentization of the store, XML, and disconnected programming paradigms.

So what are ADO.NET Entities and LINQ?

Next: Part IV – Programming to the Conceptual Model

Mike Pizzo
Architect, Data Programmability