Rowset Persistence in Microsoft Data Access Components

Author: Peter Tucker

Summary: The use of rowset persistence has generated many questions during the development of ADO and RDS version 2.0. This article provides answers to questions through a demonstration of persistence with ADO/RDS in Microsoft Visual Basic and Microsoft Visual C++.

Contents

  • Introduction

  • Actions That Cause Rowset Persistence

  • A History of Rowset Persistence

  • What Gets Persisted and Why

  • Hierarchies

  • Stream Format

  • Limitations and Caveats

Introduction

Rowset persistence is the ability to save a static image of a rowset's data, metadata describing the columns of the rowset, and the rowset's state in a predefined format so that it can later be loaded and used again. You can accomplish rowset persistence with the Microsoft® Data Access Components version 2.0, which includes OLE DB 2.0, Microsoft ActiveX® Data Objects (ADO) 2.0, and Remote Data Service 2.0. Rowset persistence is offered by an OLE DB service component (the Microsoft OLE DB Persistence Provider), which sits between the underlying OLE DB data provider and an OLE DB consumer written in either OLE DB or ADO and RDS.

An OLE DB data provider exposes data from a backend data store such as Microsoft SQL Server™. An OLE DB consumer contains the data returned by the data provider and lets you, as the developer, present it to the end user. OLE DB service components were formerly known as OLE DB service providers but were renamed with MDAC 2.0 to distinguish their functionality from that of an actual data provider. Rowset persistence, one of several such service components, adds functionality that would otherwise not be available from the data provider or to the data consumer.

The use of rowset persistence has generated many questions during the development of ADO and RDS version 2.0. This article is an attempt to answer those questions by demonstrating persistence with ADO/RDS in Microsoft Visual Basic® and Microsoft Visual C++®, as well as in an OLE DB consumer. It also provides some details that demonstrate when and how persistence is accomplished.

Actions That Cause Rowset Persistence

Three main actions cause a rowset to be persisted:

  • Three-tier interaction between an RDS object and a business object on a different machine.

  • Marshaling a rowset across process boundaries.

  • Calling the Save or Open methods on an ADO Recordset object.

Within the context of the following discussion, the terms OLE DB rowset and ADO Recordset object are used interchangeably. An ADO Recordset is actually a wrapper around an OLE DB rowset.

Three-Tier Interaction

When the RDS data control invokes the Refresh method or the SubmitChanges method, the rowset used by the data control is persisted and will invoke the Persistence provider automatically "behind the scenes." This applies only in the three-tier case—that is, when the Server property on the data control is not "", indicating a local server.

If you call Refresh, the request goes to the business object on a different machine to generate a rowset and send it back to the client. The business object persists the rowset into a stream, and the client reads from that stream and generates a disconnected rowset.

If you call SubmitChanges, the rowset inside the data control is persisted into a stream and read by the business object on a different machine to perform the update. In this case, by default, only the modified rows are persisted for efficiency. You also have the option of marshaling all recordset objects if the MarshalOptions property is set to adMarshalAll.

A stream can be passed from one machine to the other in two ways. The first is through HTTP (or HTTPS) and an RDS technology code-named SOAP. SOAP uses the HTTP protocol to pass the stream from the client, through wininet, to the HTTP server. On the HTTP server, SOAP is loaded as an ISAPI DLL and reads the stream. From the stream it retrieves the PROGID of the business object to create and the function to call. Then it reads the rowset from the stream and passes it as an argument to the business object.

The second way to get a stream from one machine to the other is through DCOM. DCOM uses the IMarshal interface from the rowset to save the rowset to a DCOM-created stream and then sends it to the server. On the server, DCOM retrieves the CLSID from the stream to determine what object to create, gets its IMarshal interface, and gives it the stream from which to load.

Marshal a Rowset Across Process Boundaries

This method is similar to the three-tier DCOM method. The main difference is that the stream is sent through COM rather than through DCOM. For example, this sort of marshaling occurs when code in a Microsoft Access database calls code in a different Access database library and passes an ADO Recordset. In this case, the Recordset is created in the Access database library. COM is used to pass that Recordset to the calling Access database, and COM will invoke the Persistence provider automatically "behind the scenes."

ADO Save and Open

In ADO 2.0, you can use the Save method on a Recordset object to save the recordset to a file on disk and then call Open to open that file into a recordset. With both methods, ADO will invoke the Persistence provider automatically "behind the scenes."

The following code sample demonstrates how to use ADO in Visual Basic to persist a recordset.

ADO Save and Open Methods in Visual Basic

Sub Example()
   Dim stFile As String
   Dim rs As New ADODB.Recordset

   'Open a recordset on authors
   stFile = "c:\rsets\myrset.adtg"
   rs.LockType = adLockBatchOptimistic
   rs.CursorLocation = adUseClient
   rs.Open "select * from authors", "DSN=pubs;"

   'Change the first person's first name to Joe
   rs("au_fname") = "Joe"
   rs.Update

   'Save the recordset
   If Dir(stFile) <> "" Then Kill stFile
   rs.Save stFile

   'Close the recordset
   Set rs = Nothing

   'Now, open the recordset
   rs.LockType = adLockBatchOptimistic
   rs.CursorLocation = adUseClient
   rs.Open stFile

   ' Reconnect the recordset to its original data source, 
   ' and save the changes
   rs.ActiveConnection = "DSN=pubs;"
   rs.UpdateBatch
End Sub

The preceding code opens a recordset and makes a change and then saves the recordset and closes it, saving the changes to the recordset. Later the recordset is re-opened and reconnected, and the changes that were stored are committed with the UpdateBatch method.

Unlike other service components that can be invoked directly from consumer applications written in OLE DB using C++, the Persistence provider cannot be invoked directly. The IService interface used by the provider is subject to change, and anyone writing an application to it directly will not be able to upgrade to future versions of OLE DB without breaking the application. The use of IService is not supported for MDAC 2.0, because it is a private interface.

However, you can invoke the Persistence provider with minimal overhead through ADO in a way that will ensure backward compatibility in the future. The next sample demonstrates this and can be applied to a consumer application written in C++ for an OLE DB developer. This example does not utilize the IService interface in the Persistence provider directly.

OLE DB using ADO to save and open a rowset in Microsoft Visual C++

HRESULT SaveMyRowset(IRowset *pRowset, BSTR bstrFileName)
{
   // Get the recordset interface
   IADORecordset * pRs = NULL;
   IADORecordsetConstruction * pADORsCt = NULL;
   HRESULT hr;

   // Get an ADO recordset and ask for the recordset construction 
   // interface
   if (FAILED (hr = ::CoCreateInstance(
                              CLSID_ADORecordset, 
                           NULL,
                           CLSCTX_INPROC,
                           IID_IADORecordsetConstruction, 
                           (LPVOID *) &pADORsCt)))
      return hr;

   // Give the rowset to ADO
   if (SUCCEEDED (hr= pADORsCt->put_Rowset(pRowset))
   {
      hr = pADORsCt->QueryInterface(IID_IADORecordset, (void **)&pRs);
      hr = pRs->Save(bstrFileName, adPersistADTG);
      }

   // Cleanup: (or use ATL CComPtr class)
   pRs->Release();
   pADORsCt->Release();

   return hr;
}

HRESULT LoadMyFile(BSTR bstrFileName, IRowset **ppRs)
{
   IADORecordsetConstruction * pRsCt = NULL;
   IADORecordset * pADORs = NULL;
   HRESULT hr;

   //Create the ADO Recordset
   if (SUCCEEDED (hr = ::CoCreateInstance(CLSID_ADORecordset, 
                     NULL,
                     CLSCTX_INPROC,
                     IID_IADORecordset,
                     (LPVOID *) &pADORs)))
   {
      VARIANT varEmpty;

      varEmpty.vt = VT_ERROR;
      varEmpty.scode = DISP_E_PARAMNOTFOUND;

      hr = pADORs->Open(bstrFilename, varEmpty, -1,-1,-1);
      hr = pADORs->QueryInterface(IID_IADORecordsetConstruction, 
                                 (void **)&pRsCt);
      hr = pRsCt->get_Rowset (ppRs);
   }
   pADORs->Release() ;
   pRsCt->Release() ;

   return hr;
}

To save a rowset, the function takes a rowset and hands it to an ADO Recordset. Then it uses the Save method on the recordset to save the rowset to a file. To load a rowset, the function creates a recordset and uses the Open method to open the file. Then it retrieves the rowset out of the recordset and returns it to the user.

The following code demonstrates how to use #import with ADO from Visual C++ to save and open ADO Recordset objects from a file.

ADO Save and Open methods in Microsoft Visual C++

#import <msado15.dll> rename_namespace("ADO20") rename("EOF", "EndOfFile")
...
#define CREATEiNSTANCE(sp,riid) \
      { HRESULT _hr =sp .CreateInstance( __uuidof( riid ) ); \
      if (FAILED(_hr)) _com_issue_error(_hr); }
...
void CRPDlg::OnBUTrecSetPersist() 
{
   _variant_t  vtEmpty (DISP_E_PARAMNOTFOUND, VT_ERROR);
   const char *FileName = "c:\\temp\\temptbl.dat";

   _ConnectionPtr conn;
   _RecordsetPtr  rst;
   
   List1.ResetContent();
   
   try{
      CREATEiNSTANCE(conn,Connection);
      CREATEiNSTANCE(rst,Recordset);
        
      conn->ConnectionString=   L"Provider=SQLOLEDB;Data Source=(local);"
                              L"User ID=MyUserName;Password=MyPassword;"
                              L"Initial Catalog=pubs";
      conn->Open("", "", "", -1);

      conn->Execute("CREATE TABLE testtable (dbkey INT PRIMARY KEY, 
      field1 CHAR(10))",&vtEmpty,adCmdText);
      conn->Execute("INSERT INTO testtable VALUES (1, 
      'string1')",&vtEmpty,adCmdText);

      rst->putref_ActiveConnection(conn);
      rst->CursorLocation = adUseClient;
      rst->Open("SELECT * FROM testtable",vtMissing, adOpenStatic, 
      adLockBatchOptimistic,-1);

      rst->Fields->Item[_variant_t(1L)]->Value = "NewValue";
      rst->Save(FileName, adPersistADTG);
      rst->putref_ActiveConnection(0);

      rst->Close();
      rst->Open(FileName,vtMissing , adOpenStatic, adLockBatchOptimistic, adCmdFile);

      rst->putref_ActiveConnection(conn);
      rst->UpdateBatch(adAffectAll);

      //Now delete the table from the database! 
      //We don't want to clutter things up...
      conn->Execute("DELETE FROM testtable",&vtEmpty,adCmdText);
      conn->Execute("DROP TABLE testtable",&vtEmpty,adCmdText);
      DeleteFile(FileName);

      rst->Close();
      conn->Close();
      ::MessageBox( NULL, "Success! Recordset Persistence Complete.", 
      "", MB_OK);
   }
   catch( _com_error &e){
      DeleteFile(FileName);
   }
}

A History of Rowset Persistence

Rowset persistence has been a part of the MDAC components since ADC 1.0 was first introduced. With ADC 1.0, it was strictly behind the scenes for both three-tier and COM functionality. In version 1.5, the functionality was enhanced to include saving some rowset properties. Persistence functionality in 1.5 remained behind the scenes. In 1.5, the functionality was built into the Client Cursor Engine DLL (MSADCE.DLL) and was hard-coded to use the internal properties of a rowset generated by the client cursor engine. With MDAC 2.0, the ability to explicitly invoke the Persistence provider via the ADO Open and Save methods was added.

In RDS 2.0, the rowset persistence code moved into a separate service component (Msdaprst.dll), which used OLE DB interfaces exclusively to get the information to store in a stream of data saved to and retrieved from a file. Because of this, persistence became available to a rowset generated by any data provider, not just rowsets generated by the Client Cursor Engine.

An alternative to the Persistence provider is to translate your recordset into a safearray for persistence (or marshaling). For small rowsets this technique is faster because you don't have to save metadata or other information needed to post changes in the rowset back to the underlying data store. This metadata is included and is always the same size regardless of the number of records in a rowset. For larger rowsets, the Persistence provider will offer increased performance over safearrays, as the rowset streaming is actually a bit faster. No compression algorithm is used during the persistence of the rowset safearrays.

The Persistence Provider's Rowset

Whether your consumer is OLE DB or ADO, the rowset generated by the Persistence provider is a simple, scrollable, non-updatable rowset. For OLE DB developers, there are limitations on the rowset returned. The rowset supports direct binding and ByRef binding, but not IStream bindings, nor does it support reference accessors. It supports the optional interfaces IColumnsRowset, IChapteredRowset, and IRowsetUpdate.

In the IRowsetUpdate implementation in ADO/RDS 2.0, only the GetRowStatus and GetOriginalData methods are implemented. Because the Persistence provider saves original as well as changed data, it needs a way to present that information when the stream is being loaded. All other methods on IRowsetUpdate return E_NOTIMPL because this implementation does not allow the data in the Persistence provider's rowset to be updated.

What Gets Persisted and Why

In general, the rule for what parts of the rowset are persisted is based on the question, "Can the rowset behave in a similar way without this information?" Or, to put it another way, "By not including this information, does the rowset lose too much functionality to be useful?"

This rule is especially relevant for properties and property values. For example, ADO/RDS persists the property values that define the rowset's updatability. It does not persist the properties that describe the rowset's bookmarks.

Also, metadata strings such as column names and base table names are stored in Unicode format, because OLE DB (like all of OLE) is Unicode. Translating Unicode to ANSI to persist in the stream, and retranslating the ANSI string back to Unicode, could cause an unacceptable loss of data for some characters that cannot be mapped.

Rowset Metadata

In rowset persistence, information about the rowset is first persisted, including column count, visible column count, and base table count.

Next the rowset's properties are stored. The GUID of the OLE DB property set is persisted once with a count of the properties stored for that set. Then each property ID, the length of the data, and the data itself are persisted. It is stored in this way so that if a future version of the Persistence provider saves a property whose property and even variant type are unknown by an older version, the older version can determine how much to read before moving to the next property. ADO/RDS ignores any property appearing in the stream that is unknown to the rowset reading the stream.

In RDS 1.5, the following properties were persisted:

  • DBPROP_IRowsetUpdate

  • DBPROP_IRowsetChange

  • DBPROP_ADC_BATCHSIZE

  • DBPROP_ADC_UPDATECRITERIA

  • DBPROP_ADC_UPDATEOPERATION

DBPROP_IRowsetUpdate and DBPROP_IRowsetChange are standard OLE DB properties. They define whether the rowset is updatable at all and whether the rowset that is built from the stream should support these interfaces. The others are properties specific to rowsets generated by the Client Cursor Engine. DBPROP_ADC_BATCHSIZE determines how many rows at a time the cursor engine should include in any query-based updates. DBPROP_ADC_UPDATECRITERIA and DBPROP_ADC_UPDATEOPERATION determine how the cursor engine should perform updates.

RDS 2.0 added the following properties:

  • DBPROP_COMMANDTIMEOUT

  • DBPROP_MAXROWS

  • DBPROP_ASYNCHFETCHSIZE

  • DBPROP_ASYNCHPREFETCHSIZE

  • DBPROP_ASYCHTHREADPRIORITY

DBPROP_COMMANDTIMEOUT and DBPROP_MAXROWS are standard OLE DB properties. The other three are properties specific to the Client Cursor Engine. They define the asynchronous behavior the reader should use to fetch the rows from the stream.

Finally, base table and column information is persisted. The base table information includes the key defined columns, which the Client Cursor Engine should use during updates. The column information can include quite a bit of information. At the least, it includes the base column name and the "friendly" name (that is, alias), the type, length, scale, precision, and OLE DB column flags. If available, it also includes the base table name, the base catalog name, the base schema name, and more. If you interrogate the stream generated, you might notice that the column name appears to be written twice. This is the column name and its base column name, which happen to be the same.

Row Data

After its metadata is persisted, the rowset's data is persisted. This data is stored as is to preserve the data integrity. A token that precedes each row defines what kind of row is persisted: original row, deleted row, changed row, or inserted row. A "presence map" is stored for each row with a bit for each nullable field in the row, which describes whether the data exists for that column or not. Fixed-length data is stored as is, and variable-length data is stored with a length prefixing it. There are four states the row can be in, and they are described in the following table.

State

Description

Unchanged

The data is stored as is.

Deleted

The original data is stored and a token follows it, marking the row as one that should be deleted.

Inserted

A token marks it as a new row, and the new data is stored as is.

Changed

The original data is stored, and the modified columns are stored. A presence map determines which columns are modified.

Hierarchies

Hierarchical rowsets are special rowsets and are handled slightly differently by the Persistence provider.

First, two new pieces of information are persisted with the column metadata: relational information and calculated information. This information is added for columns of type HCHAPTER only. This means that if a server running RDS 2.0 generates a stream for a hierarchical rowset and passes it to a client that is running RDS 1.5, some results are undefined. The client is not expecting this extra information on the column. Because RDS 1.5 did not support hierarchies, this is acceptable.

The metadata for each rowset is persisted in a top-level recursive fashion. Then the row data for each rowset is stored, also in a top-level recursive fashion. Data is retrieved from each rowset completely, without using the HCHAPTER value from the parent rows.

The alternative is to read a parent row and then get the data from the child rowset for that HCHAPTER value. Reading the data rowset by rowset is more efficient because data in the child rowset can be duplicated for many parent rows. Rows that exist only in the child rowset can be orphaned if read by using the HCHAPTER value. This is particularly a problem if the parent rowset was modified to orphan a child row and that child row was also modified. This change in the child row is not persisted, and data integrity can therefore be lost. The disadvantage of persisting a hierarchical rowset in a rowset-by-rowset fashion is that it is not possible to persist hierarchies that require the HCHAPTER when reading rows. One such rowset is a parameterized hierarchy.

Stream Format

In RDS 1.5 and 2.0, rowsets are persisted in a proprietary format called Advanced Data TableGram (ADTG). Its layout is described in the following table.

Section

Size (bytes)

Notes

Header

9

Identifies the stream as ADTG and sets global information.

HandlerOptions

27

Sets options for the stream reader.

ResultDescriptor

33

Provides information about the rowset, including column count and base table count.

ResultContext

Varies (134 in 2.0)

Identifies property settings on the rowset.

TableDescriptor

13 + len(TableName) + 2 * cKeyColumns

Contains information about each base table used by the rowset. Required for updating the rowset.

ColumnDescriptor

Varies

Contains information about each column in the rowset.

RowData

Varies

Provides actual row data.

Note

The layout of the ADTG stream format shows how much overhead is used in persisting the data of a rowset. It is not intended to help interrogate the stream.

Limitations and Caveats

  • Large data

    When the Persistence provider attempts to save large data ("large data" here is data that is 255 bytes and greater), it binds to the data by using OLE DB ByRef, rather than by using IStream to read the data piece by piece. Some data providers do not support this type of access. Therefore, persisting the rowset fails.

    The workaround is to load the rowset into the Client Cursor Engine and persist that rowset.

  • Hierarchies

    It is not possible to read the Persistence provider's hierarchical rowset directly through ADO. This is because the hierarchy is stored rowset by rowset, rather than row by row. Because the Persistence provider's rowset allows access to only a single row at a time, the child rows are never visible. Instead, the hierarchy should be read back into the provider that created the hierarchy; then it can present it to the consumer correctly.

  • Error Messages

    The Persistence provider does not have an associated resource DLL and therefore has no error messages.