How do I include empty Excel rows/columns in OLE DB query output?

Anders Sjöqvist 0 Reputation points
2023-05-19T17:56:35.8033333+00:00

I have a number of Excel files that I'm processing using PowerShell. The files follow a defined pattern but are user-generated.

To illustrate my issue, let's create a new Excel file and in cells B2:C3 we add self-references to each cell (i.e. B2, B3, C2, C3) as plain text. We then read the file using the following PowerShell code.

$filename = 'foo.xlsx'
$fullPath = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($filename)
$builder = New-Object -TypeName System.Data.OleDb.OleDbConnectionStringBuilder
$builder.Add('Provider', 'Microsoft.ACE.OLEDB.12.0')
$builder.Add('Data Source', $fullPath)
$builder.Add('Extended Properties', 'Excel 12.0 Xml;HDR=No;IMEX=1')
$conn = New-Object -TypeName System.Data.OleDb.OleDbConnection -ArgumentList $builder.ConnectionString
$conn.Open()
try {
    $command = New-Object -TypeName System.Data.OleDb.OleDbCommand -ArgumentList 'SELECT * FROM [Sheet1$]'
    $command.Connection = $conn
    $adapter = New-Object -TypeName System.Data.OleDb.OleDbDataAdapter -ArgumentList $command
    $sheet = New-Object -TypeName System.Data.DataTable
    [void]$adapter.Fill($sheet)
} finally {
    $conn.Close()
}
$sheet

This generates the following output, where both column A and row 1 appear to have been excluded as they were empty.

F1 F2
-- --
B2 C2
B3 C3

If we now modify cell A1, for example by adding a simple apostrophe (') to make it contain the empty string, the output of the script now changes to the following.

F1 F2 F3
-- -- --

   B2 C2
   B3 C3

This remains the case even if cell A1 is cleared. However, we can remove either column A or row 1 (not necessarily both) and once again get the trimmed output.

This is frustrating as I need to reliably read from exact cells. Is there a way to either avoid skipping empty rows and columns, or to find out how many of each have been skipped?

Excel
Excel
A family of Microsoft spreadsheet software with tools for analyzing, charting, and communicating data.
1,480 questions
PowerShell
PowerShell
A family of Microsoft task automation and configuration management frameworks consisting of a command-line shell and associated scripting language.
2,087 questions
{count} votes

1 answer

Sort by: Most helpful
  1. Limitless Technology 43,956 Reputation points
    2023-05-22T10:59:54.52+00:00

    Hi @Anders Sjöqvist,

    I'd be happy to help you out with your question. Sorry for the inconvenience caused.

    The behavior you're experiencing is due to the default settings of the OLE DB provider, which skips empty rows and columns during data retrieval. However, there is a way to include these empty rows and columns in your query output.

    To accomplish this, you need to modify the connection string's Extended Properties. Follow the steps below:

    1. Open your PowerShell script that performs the OLE DB query on the Excel file.
    2. Locate the section where the connection string is defined. You should see a line similar to the following:

    $builder.Add('Extended Properties', 'Excel 12.0 Xml;HDR=No;IMEX=1')

    1. Modify the connection string by changing IMEX=1 to IMEX=0. This change forces the provider to include empty cells.

    $builder.Add('Extended Properties', 'Excel 12.0 Xml;HDR=No;IMEX=0')

    1. Save the script with the modifications.

    After making these changes, re-run the PowerShell script. The query output will now include all rows and columns, even if they only contain empty cells. This ensures that you can reliably read from the exact cells you need.

    Here's the updated code with the modified connection string:

    $filename = 'foo.xlsx'
    $fullPath = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($filename)
    $builder = New-Object -TypeName System.Data.OleDb.OleDbConnectionStringBuilder
    $builder.Add('Provider', 'Microsoft.ACE.OLEDB.12.0')
    $builder.Add('Data Source', $fullPath)
    $builder.Add('Extended Properties', 'Excel 12.0 Xml;HDR=No;IMEX=0')  # Modified: IMEX=0
    $conn = New-Object -TypeName System.Data.OleDb.OleDbConnection -ArgumentList $builder.ConnectionString
    $conn.Open()
    try {
        $command = New-Object -TypeName System.Data.OleDb.OleDbCommand -ArgumentList 'SELECT * FROM [Sheet1$]'
        $command.Connection = $conn
        $adapter = New-Object -TypeName System.Data.OleDb.OleDbDataAdapter -ArgumentList $command
        $sheet = New-Object -TypeName System.Data.DataTable
        [void]$adapter.Fill($sheet)
    } finally {
        $conn.Close()
    }
    $sheet
    

    Please note that using IMEX=0 may have some limitations, particularly when dealing with mixed data types in a column. If you encounter any issues with data type mismatches, you may need to specify specific data types for the columns in your query or adjust the IMEX value accordingly to handle the data appropriately.

    If you have any other questions or need assistance with anything, please don't hesitate to let me know. I'm here to help.

    If the reply was helpful, please don’t forget to upvote or accept as answer, thank you.