Accessing Microsoft Office Data from .NET Applications
Christa Carpentiere
April 2004
Applies to:
Microsoft® Access
Microsoft® Excel
Microsoft® Office
Microsoft® Visual Basic® .NET
Microsoft® Visual Studio® .NET
Summary: See how to get Microsoft Access and Excel data out of your Office files and into your .NET applications, and how the OLE DB provider can be used to write generic code that makes working with .xls or .mdb files easier. (12 printed pages)
Contents
Introduction
Getting a Connection
Retrieving Data
Using Metadata for Generic Data Retrieval
Conclusion
Related Books
Introduction
Most of the necessary tasks for working with Microsoft Office data in Microsoft .NET applications are the same as using any other data—creating a connection, creating DataReaders or DataAdapters depending on what you need to do with the data, creating one or more DataSets to encapsulate DataTables of related data, etc. So, I'm not going to re-write the ADO.NET documentation here. Rather, the purpose of this article is to provide the missing or widely dispersed pieces of information that are required to write basic data retrieval code against Office data sources. It seems that the people who know the quirks of Jet are less familiar with ADO.NET, and most folks who are ADO.NET whizzes don't know a whole lot about the technologies that underpin Office programmability. So here's a quick once-over that hopefully will answer the common questions I've seen on both sides of this divide.
I'll start by running through the basics of connecting to and retrieving data from Microsoft Excel and Microsoft Access. This area is where I've seen the biggest issues—usually once people can get this far, the Office-specific issues are dealt with and the remaining issues are focused on using ADO.NET. Next I'll discuss how to use the GetOleDbSchemaTable method of your OleDbConnection object to retrieve schema information about your Access or Excel data at runtime. This will let you avoid hard-coding information about your data sources—a useful thing, wherever you get your data from.
Getting a Connection
To connect to your Excel spreadsheet or Access database, you'll need to create an OleDbConnection object, and then pass it a connection string with the specifics for your target data source. You will need to specify the Jet 4.0 OLE DB provider for the Provider property.
Connecting to Access
Let's take a look at a typical connection string for an Access database:
"Provider=Microsoft.Jet.OLEDB.4.0;Data Source= C:\Databases\mymusic.mdb"
This is very simple—it specifies only the Provider and Data Source properties. The Provider is the Jet 4.0 provider, as mentioned above, and the Data Source property contains the fully qualified path to your database.
Of course, you do have even your Access databases secured, right? So you'll also need to specify the path to your workgroup information file (system.mdw, by default) as well as a providing a user ID and password:
"Provider=Microsoft.Jet.OLEDB.4.0;Data " & _
"Source=C:\Databases\mymusic.mdb; " & _
"Jet OLEDB:System database=" & _
"C:\Databases\system.mdw; " & _
"User ID=UserX;Password=UserXPassword"
You'll note that specifying the .mdw file that governs security for your Access database involves using a Jet OLE DB-specific property. If you would like to have more control over the database's behavior, you can explore the other properties offered by the Jet OLE DB provider in Appendix B: Microsoft Jet 4.0 OLE DB Properties Reference to control such things as locking behavior and how to handle certain types of failures. Note that only properties that can be set in the connection string are accessible; there is no way to set the provider properties that require the connection to be open before they are specified.
You can use Microsoft® Visual Studio® .NET to get a template for a full bells-and-whistles Access connection string that includes all of the Jet OLE DB provider settings. Create a data connection to your Access database in the Server Explorer, and then create an OleDbConnection object using the Data section of the Toolbox. The ConnectionString property of the resulting object will include all of the Jet OLE DB provider properties by default.
Connecting to Excel
Now, let's see what a typical connection string for an Excel spreadsheet looks like:
"Provider=Microsoft.Jet.OLEDB.4.0;" & _
"Data Source=C:\Spreadsheets\calculations.xls;" & _
"Extended Properties=Excel 8.0"
You can see that, in the case of connecting to an Excel file, we must use the Extended Properties property in the connection string as well as the Provider and Data Source. For Excel 8.0 and later, use the "Excel 8.0" setting. If you would like more information on the other acceptable values for this property, refer to the "Extended Properties Property Settings" section of ADO Provider Properties and Settings.
"Aha," you say, "but what about security in Excel?" Well, there's not great news there, I'm afraid. You cannot open a connection to a password-protected spreadsheet unless you have already manually opened the spreadsheet in Excel (for more information, see XL2000: "Could Not Decrypt File" Error with Password Protected File. The described error is with the Excel ODBC provider, but the behavior is identical in the Jet 4.0 OLE DB provider. Your other option is to remove the password from the spreadsheet, and rely on some other security mechanism (like restricting permissions on the folder where the file resides) to control access.
Unfortunately, you also cannot use Visual Studio .NET to get a template for an Excel connection string. With a little fiddling around, you can create an Excel data connection, but you'll find that its properties are uneditable, and the ConnectionString property will be blank—a little quirk in the IDE (integrated development environment) there. For more information, see PRB: Cannot Configure Data Connection to Excel Files in Visual Studio .NET. So for Excel connections, you are pretty much on your own, but it's really just as easy to code one as to create one in the user interface (UI) in this circumstance.
Retrieving Data
Now that it's clearer how to get a connection to your Office data source, let's take a look at what it takes to retrieve data. I'm going to go ahead and use an OleDbConnection/OleDbCommand/OleDbDataReader data retrieval scenario for simplicity's sake. The same approach with just a bit of tweaking can be used to construct an OleDbDataAdapter and fill a DataSet instead. If you want to know more about ADO.NET generally, the Accessing Data with ADO.NET section of the .NET Framework Developer's Guide is there for your reading enjoyment.
Retrieving Access Data
The first important thing to remember when writing your Access data retrieval code is that the syntax in which your SQL must be specified has some idiosyncrasies. You can't create an Access query in the graphical UI, pop over into SQL View, copy the resulting SQL statement and paste it into your code. No, that would be too easy. The SQL code generated will usually have some, but not all, of the syntax that you need. Anyone who has had to write code in the Access development environment will know this, but for your average .NET client-application developer, this can be news. The big offenders are criteria expressions, which require that specific types of data in your WHERE clause be delimited in certain ways. Date and time values have to be delimited with number signs (#). Text values have to be delimited by single quotes ('). For instance:
SELECT City, Neighborhood, SalePrice, MonthsOnMarket
FROM RealEstate
WHERE ListingDate > #1/1/04#
For more on the specifics of these bugbears, see Date and Time Criteria Expressions.
The other thing to keep in mind would seem to be pretty obvious, but it seems to confound people anyway, so I'll mention it: make sure your Access tables don't use reserved words for column names. You can check out the reserved words at SQL Reserved Words. If you use any of these, I'd say rename your columns if you can. I know this is seldom possible if you are already using the database, so in a pinch you can make a query, and use it as a substitute for the table with the offending column. Just use AS to rename the column, as in:
SELECT Artists.ArtistName, Genres.Genre, Labels.Label,
Tracks.Public AS Track, Releases.ReleaseName
FROM (Labels INNER JOIN ((Artists...
With these items in mind, let's take a look at an example:
Imports System
Imports System.Data
Imports System.Data.OleDb
...
Public Function GetAccessData(ByVal UID As String, _
ByVal pwd As String, ByVal artist As String)
Dim conn As New OleDbConnection
Dim musicReader As OleDbDataReader
Dim cmd As New OleDbCommand
Dim connString As String
Dim i As Integer
Try
' Set the connection string.
connString = "Jet OLEDB:System database=" & _
"C:\Databases\system.mdw;" & _
"Data Source=C:\Databases\mymusic.mdb;" & _
"Provider=Microsoft.Jet.OLEDB.4.0;" & _
"User ID=" & UID & ";Password=" & pwd
' Open the connection.
conn.ConnectionString = connString
conn.Open()
'Set the command properties.
cmd.Connection = conn
cmd.CommandText = "SELECT * from music " & _
"WHERE ArtistName = '" & artist & "'"
' Get the OleDbDataReader
' and do some processing with it.
musicReader = _
cmd.ExecuteReader(CommandBehavior.CloseConnection)
Try
While (musicReader.Read)
'Process data.
End While
Finally
musicReader.Close()
End Try
Catch ex As Exception
'Error handling
End Try
End Function 'GetAccessData
Retrieving Excel Data
Excel shares the same reserved words prohibition with Access in regards to column names. Generally speaking, you are going to be happier if you keep SQL reserved words in mind, and avoid them when creating anything that may be used as a data source. It'll give you one less thing to worry about.
Excel also has its own quirks with syntax. The item that most affects your code is the syntax for referencing the set of data that you want to return.
Note For the most straightforward data retrieval from Excel, use spreadsheets that maintain a table-like format.
Your first option is to specify a worksheet and, optionally, a set of cells on that sheet. You need to make sure that the worksheet name is followed by a dollar sign and then, optionally, a cell set. The cell set is specified by the starting cell and terminating cell of the set, separated by a colon. This whole data identification string is then enclosed in brackets. A SELECT statement using this type of syntax might look like this:
SELECT SalesMonth, TotalSales, PercentageChange1Year
FROM [Sheet1$A1:E24]
Your other option is to create a named range in Excel that will serve as a table analog for you. To create a named range, see Create named cell references or ranges. You will use the range name just like a table name in the SELECT statement:
SELECT SalesMonth, TotalSales, PercentageChange1Year
FROM SalesHighlights
With these items in mind, let's take a look at an example:
Imports System
Imports System.Data
Imports System.Data.OleDb
...
Public Function GetExcelData()
Dim conn As New OleDbConnection
Dim salesReader As OleDbDataReader
Dim connString As String
Dim cmd As New OleDbCommand
Try
' Set the connection string.
connString = "Data Source=" & _
"C:\Spreadsheets\calculations.xls;" & _
"Provider=Microsoft.Jet.OLEDB.4.0;" & _
"Extended Properties=Excel 8.0;"
' Open the connection.
conn.ConnectionString = connString
conn.Open()
'Set the command properties.
cmd.Connection = conn
cmd.CommandText = "SELECT SalesMonth, " & _
"TotalSales, PercentageChange1Year, " & _
"VolumeDiscounts, Profit from [Sheet1$]"
' Get the OleDbDataReader
' and do some processing with it.
salesReader = _
cmd.ExecuteReader(CommandBehavior.CloseConnection)
Try
While (salesReader.Read)
'Process data.
End While
Finally
salesReader.Close()
End Try
Catch ex As Exception
'Error handling
End Try
End Function 'GetExcelData
Using Metadata for Generic Data Retrieval
Now that you can connect to your Excel or Access data source and retrieve data, let's take it a step farther. Let's look at what you need to do to retrieve metadata from one of these data sources, which you can then use to construct your data access code. This process can come in handy if you want to create a procedure offering some generic functionality that you don't want tied to a particular data source.
Using OleDbConnection.GetOleDbSchemaTable
To get the metadata you need, you'll need to use the OleDbConnection.GetOleDbSchemaTable method. The constructor for this method takes an OleDbSchemaGuid object representing an OLE DB schema rowset, and an array of Objects representing what are essentially selection criteria for the schema information to be returned.
Note For those of you who are not familiar with OLE DB schema rowsets, they are essentially standardized schemas for database constructs as defined by ANSI SQL-92. Each schema rowset has a set of columns (referred to as "restriction columns" in the .NET documentation) that provide the defining metadata for the specified construct. So if you request schema information for, say, columns, or collations, you'll know exactly what kind of data you can expect to get back. If you are itching to know more, visit Appendix B: Schema Rowsets.
The Object array is defined as an "array of restriction values" in the documentation. It is used to determine (that is, restrict) the dataset that is returned, kind of like a WHERE clause in SQL. For example, you connect to a workbook that has worksheets Alpha, Beta, and Pi. You want schema information to figure out what columns worksheet Beta contains. Your code will look something like this:
schemaTable = conn.GetOleDbSchemaTable(OleDbSchemaGuid.Columns, _
New Object() {Nothing, Nothing, _
"Beta", Nothing})
You use the OleDbSchemaGuid.Columns field to indicate that the COLUMNS schema should be used, since you want column information returned. The COLUMNS schema contains the TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME, and COLUMN_NAME restriction columns, and you need to provide an Object representing a restriction value for each of them in the array. By specifying "Beta" as the TABLE_NAME value, you restrict the column information returned to just those from that "table".
Now that you are familiar with our friend GetOleDbSchemaTable, let's take a look at how it can be put into practice. By using it to iterate through the tables and the columns in your data source, you can get all the information you need to retrieve data without having to be familiar with the schema beforehand. Let's take a look at an example with Excel:
Imports System
Imports System.Data
Imports System.Data.OleDb
...
Public Function GetExcelSchema(ByVal xlsPath As String) As DataSet
Dim schemaTable As New DataTable
Dim workAdapter As New OleDbDataAdapter
Dim workSet As New DataSet
Dim conn As New OleDbConnection
Dim i As Integer
Dim x As Integer
Dim charArray As Char() = {",", " "}
Dim charArray2 As Char() = {"$"}
Dim cmdString As String
Dim cmdString2 As String
Dim cmd As New OleDbCommand
Dim tableName As String
workSet.DataSetName = "excelData"
Try
' Set the connection string.
Dim connString As String = _
"Data Source=" & xlsPath & _
";Provider=Microsoft.Jet.OLEDB.4.0;" & _
"Extended Properties=Excel 8.0"
' Open the connection.
conn.ConnectionString = connString
conn.Open()
' Populate the DataTable with schema
' information on the data source tables.
schemaTable = _
conn.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, _
New Object() {Nothing, Nothing, Nothing, "TABLE"})
' Populate an array with the table names.
i = schemaTable.Rows.Count - 1
Dim tablesArray(i) As String
For i = 0 To schemaTable.Rows.Count - 1
tablesArray(i) = schemaTable.Rows(i).Item("Table_Name")
Next
' Clear the DataTable
schemaTable.Clear()
' Use the table names and the column schema
' information to construct SELECT statements
' and return data for each table in the data source.
For i = 0 To tablesArray.GetLength(0) - 1
' Populate the DataTable with schema
' information on the data source columns.
schemaTable = _
conn.GetOleDbSchemaTable(OleDbSchemaGuid.Columns, _
New Object() {Nothing, Nothing, _
tablesArray(i).ToString(), Nothing})
' Step through the column names and append
' them into a SELECT statement
cmdString = "SELECT "
For x = 0 To schemaTable.Rows.Count - 1
cmdString = cmdString & _
schemaTable.Rows(x).Item("Column_Name") & _
", "
Next
cmdString2 = cmdString.TrimEnd(charArray)
' Note that you don't need to append "$" to
' the table name - it is already included.
cmdString2 = cmdString2 & " FROM [" & _
tablesArray(i).ToString() & "]"
' Use the SELECT command and the
' OleDbDataAdapter to fill the DataSet.
cmd.CommandText = cmdString2
workAdapter.SelectCommand = cmd
workAdapter.SelectCommand.Connection = conn
tableName = _
tablesArray(i).ToString().TrimEnd(charArray2)
workAdapter.Fill(workSet, tableName)
schemaTable.Clear()
Next
Catch ex As Exception
'Error handling
Finally
conn.Close()
End Try
Return workSet
End Function 'GetExcelSchema
As you can see, the code involved is pretty straightforward. If you wanted to do the same thing with an Access database, the only real difference would be the connection string, and not needing to format the table name as a worksheet so it can be used in the SELECT statement.
By the way, you can use this approach just as well with a SQL Server™ database, or you can modify it to perform other operations based on the schema info you pull out of the data source. This can be nice if you are trying to do any kind of discovery or documentation.
Conclusion
Thanks for joining me on this brief tour of using Office data and ADO.NET together. It's not an unusual task, and I think to date it has been harder than it needed to be, just because some of the basic information on it isn't readily available. Whether you just want to pull in information from that departmental Access database, or aggregate the data from everyone's monthly expense spreadsheets, I hope I've given you a place to start.