다음을 통해 공유


Microsoft/Azure Databases: Writing a BCP API based ODBC Application

Introduction

For mass data ingestion, various database vendors provide different native tool/utility/APIs for application developers (or even end users) to choose from. For example, Oracle support Direct Path Load (DPL) APIs which can be written in their OCI applications. MySQL supports mysqldump utility to import/export the data to/from the MySQL database. Likewise, Microsoft provides Bulk Copy Program technology which is available in (at least) two forms:

1. bcp Utility – command line utility for the end user consumption [1]

2. bcp APIs – extension to Microsoft SQL Server ODBC driver [2]

This article focuses on #2 form of BCP; i.e. using extension functions to Microsoft SQL Server ODBC driver.

Pre-requisites

  • - Microsoft ODBC driver is supported on two major platforms: Windows (32 & 64-bit) and Linux (64-bit). Make sure to have either of these supported platforms to begin with.
  • - Microsoft ODBC driver 13.1 (or any available) for SQL Server [3]
  • - Connection access to any of Microsoft database: SQL Server, Azure SQL Database, Azure SQL Data Warehouse, Parallel Data Warehouse
  • - C++ compiler: On Windows – Visual Studio 2013 (or any available), On Linux – g++ compiler

Setting up solution/project in Visual Studio 2013

On Windows platform, Visual Studio can be used to create a new BCP project. Note that project can be created in any available Visual Studio; however, author has Visual Studio 2013 installed and it will be used throughout this article.

Creating a new solution

After opening Visual Studio 2013, choose “New Project …” option and fill in necessary details (example shown in the below screenshot), complete rest of the wizard with default values.

Initial screen with new project

Once a successful project is created (“MicrosoftBCP”), Visual Studio will show screen something like this:

Setting compiler/linker options

Open “Properties” option after right clicking project from the “Solution Explorer”:

If “Property page” dialog box is not showing “Platform” as x64 (for 64-bit compilation), click on “Configuration Manager”:

Choose <New…> from “Active solution platform:” drop down menu:

Select “x64” from “Type or select the new platform:” drop down menu:

Click “OK” and close “Configuration Manager…” dialog box.

Inside “Property Pages” dialog box, choose “Configuration Properties” à “VC++ Directories” and under “Include Directories” property, prefix include path of the “Microsoft ODBC driver for SQL Server” installation. Typically, this location is C:\Program Files\Microsoft SQL Server\Client SDK\ODBC\130\SDK\Include:

Similarly, under “Linker” à “General”’s “Additional Library Directories” property need to have Lib/x64 path of the “Microsoft ODBC driver for SQL Server” installation. Typically, this location is C:\Program Files\Microsoft SQL Server\Client SDK\ODBC\130\SDK\Lib\x64:

Also, under “Linker” à “Input”’s “Additional Dependencies” property, add “Microsoft ODBC driver for SQL Server”s lib file viz. msodbcsql13.lib:

Now try building the solution by first ensuring “Debug” for platform “x64” is selected as highlighted in the screenshot below. Right click on the Solution inside a “Solution Explorer” and choose “Build Solution” or press F7:

Make sure this build is successful in the “Output” window of Visual Studio 2013 and our dummy (for now; as we’ve not added any logical code in the source!) executable is generated:

In case any errors reported, review the steps again and fix those before proceed to the next section.

High-level steps for using BCP APIs

Make connection to the database (ODBC-way):

  • - SQLAllocHandle(SQL_HANDLE_ENV,…)
  • - SQLAllocHandle(SQL_HANDLE_DBC,…)
  • - SQLSetConnectAttr(hdbc, SQL_COPT_SS_BCP, (void *)SQL_BCP_ON,…)
  • - SQLDriverConnect(hdbc, NULL, sConnectString,…)

Initializing BCP environment:

  • - bcp_init(hdbc, "sampleTable",…)

Binding program variables for each columns:

  • - bcp_bind(hdbc, (LPCBYTE)&idCompany, 0, sizeof(DBINT),…)
  • - bcp_bind(hdbc, (LPCBYTE)name, 0, SQL_VARLEN_DATA, …)

Sending data from application to BCP client:

  • - bcp_sendrow(hdbc)

Commit/flush data from BCP client to the database/server:

  • - bcp_batch(hdbc)

Close BCP once all data is transferred:

  • - bcp_done(hdbc)

Disconnect from the database (ODBC-way):

  • - SQLDisconnect(hdbc)
  • - SQLFreeHandle(SQL_HANDLE_DBC,…)
  • - SQLFreeHandle(SQL_HANDLE_ENV,…)

Complete BCP ODBC program

/* MicrosoftBCP.cpp : Defines the entry point for the console application.

 * Author: Rahul Dhuvad (rdhuvad@informatica.com). Date: 02-Aug-2017

 *

 * Make sure to create "employeeTable" table in the target database with

 * below schema before running this application:

 * CREATE TABLE employeeTable (

 * empId INTEGER,

 * empName VARCHAR(50),

 * d esignation VARCHAR(15)

 * );

 *

 * For any failures from either ODBC or BCP APIs, driver provides detailed

 * error information by calling SQLGetDiagRec(). To avoid cluttering too much

 * in the demonstration program, error details part is skipped. However, in

 * production ready, one need to ensure to get appropriate diagnostic details.

 */

 

#ifdef _WIN32

#include "stdafx.h"

#include <windows.h>

#else // _WIN32

#include <stdio.h>

#include <stdlib.h>

#endif // _WIN32

 

/* Include header files for ODBC/BCP API usage */

#include <sqlext.h>

#include <msodbcsql.h>

 

/*

 * Check the return status code of last ODBC API and if it's unsuccessful

 * then call SQLGetDiagRec() in a loop to extract all available error

 * codes and messages for further diagnosis.

 */

static int checkRC(RETCODE retCode, SQLHANDLE henv, SQLHANDLE hdbc, SQLHANDLE hstmt) {

RETCODE rc = SQL_SUCCESS;

SQLSMALLINT handleType = (hstmt != NULL ? SQL_HANDLE_STMT : (hdbc != NULL ? SQL_HANDLE_DBC : SQL_HANDLE_ENV));

SQLHANDLE handle = (hstmt != NULL ? hstmt : (hdbc != NULL ? hdbc : henv));

SQLCHAR sqlState[6], msg[SQL_MAX_MESSAGE_LENGTH];

SQLINTEGER nativeError;

SQLSMALLINT recNum = 1, msgLen;

 

if (retCode == SQL_SUCCESS) { return rc; }

 

if (handle == NULL) { rc = SQL_ERROR; return rc; }

 

/* Loop thru to get all error messages for last unsuccessful SQL execution */

while ((rc = SQLGetDiagRec(handleType, handle, recNum, sqlState, &nativeError, msg, sizeof(msg), &msgLen)) != SQL_NO_DATA) {

printf("SQLGetDiagRec: NativeError: %d, SQLSTATE: %s, errorMsg: %s\n", nativeError, sqlState, msg);

recNum++;

}

 

return retCode;

}

 

#ifdef _WIN32

int _tmain(int argc, _TCHAR* argv[])

#else // _WIN32

int main(int argc, char* argv[])

#endif // _WIN32

{

/* Return code from various ODBC/BCP APIs */

RETCODE rc = SQL_SUCCESS;

 

/* ODBC handles for environment & database */

HDBC henv = NULL, hdbc = NULL;

 

/* Application variable buffers holding data to be loaded to the table */

DBINT vEmpId;

SQLCHAR vEmpName[50 + 1];

SQLCHAR vDesignation[15 + 1];

/* Return statistic from BCP on number of rows loaded since last bcp_batch() */

DBINT nRowsProcessed, nTotalRowsProcessed = 0;

 

/* Allocate ODBC environment handle */

rc = SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &henv);

if (rc != SQL_SUCCESS) {

printf("Status: SQLAllocHandle(SQL_HANDLE_ENV) failed\n");

exit(-1);

}

 

/* Set ODBC Version to V3 */

rc = SQLSetEnvAttr(henv, SQL_ATTR_ODBC_VERSION, (SQLPOINTER*)SQL_OV_ODBC3, 0);

if (rc != SQL_SUCCESS) {

printf("Status: SQLSetEnvAttr(SQL_ATTR_ODBC_VERSION) failed\n");

exit(-1);

}

 

/* Allocate ODBC database handle */

rc = SQLAllocHandle(SQL_HANDLE_DBC, henv, &hdbc);

if (rc != SQL_SUCCESS) {

printf("Status: SQLAllocHandle(SQL_HANDLE_DBC) failed\n");

exit(-1);

}

 

/* Enable bulk copy mode, prior to connecting to the database */

rc = SQLSetConnectAttr(hdbc, SQL_COPT_SS_BCP, (SQLPOINTER)SQL_BCP_ON, SQL_IS_INTEGER);

if (rc != SQL_SUCCESS) {

printf("Status: SQLSetConnectAttr(SQL_COPT_SS_BCP) failed\n");

exit(-1);

}

 

/* Connect to the database */

rc = SQLConnect(hdbc, (SQLCHAR *)"DSN_SQLServer", SQL_NTS, (SQLCHAR *)"adapter", SQL_NTS, (SQLCHAR *)"adapter", SQL_NTS);

if (rc == SQL_ERROR) {

checkRC(rc, henv, hdbc, NULL);

printf("Status: SQLConnect() failed\n");

exit(-1);

}

 

/* Initialize BCP by providing target table name on which data is to be loaded */

rc = bcp_init(hdbc, "employeeTable", NULL, NULL, DB_IN); // "employeeTable" - table name

if (rc == FAIL) { // note that bcp_init() returns FAIL(0) for an error!

printf("Status: bcp_init() failed\n");

exit(-1);

}

 

/* Bind program variables to each of the table columns */

rc = bcp_bind(hdbc, (LPCBYTE)&vEmpId, 0, sizeof(DBINT), NULL, 0, SQLINT4, 1); // column-1

if (rc == FAIL) {

printf("Status: bcp_bind(1) failed\n");

exit(-1);

}

 

rc = bcp_bind(hdbc, (LPCBYTE)vEmpName, 0, SQL_VARLEN_DATA, (LPCBYTE)"", sizeof(SQLCHAR), SQLCHARACTER, 2); // column-2

if (rc == FAIL) {

printf("Status: bcp_bind(2) failed\n");

exit(-1);

}

 

rc = bcp_bind(hdbc, (LPCBYTE)vDesignation, 0, SQL_VARLEN_DATA, (LPCBYTE)"", sizeof(SQLCHAR), SQLCHARACTER, 3); // column-3

if (rc == FAIL) {

printf("Status: bcp_bind(3) failed\n");

exit(-1);

}

 

/* Loop thru 1000 times to send/load that many rows to the table */

for (int i = 0; i < 1000; i++) {

 

/* Set/Populate data to the program variables which needs to be loaded to the table.

* For this example, we are demonstrating having some dummy values to the data; in

* reality, proper data needs to be ensured! */

vEmpId = i;

#ifdef _WIN32

sprintf_s((char *)vEmpName, sizeof(vEmpName), "Name_%d", i);

sprintf_s((char *)vDesignation, sizeof(vDesignation), "Designation_%d", i);

#else // _WIN32

sprintf((char *)vEmpName, "Name_%d", i);

sprintf((char *)vDesignation, "Designation_%d", i);

#endif // _WIN32

 

/* Send the variable data from application to BCP client (note that server will still not receive it here!) */

rc = bcp_sendrow(hdbc);

if (rc == FAIL) {

printf("Status: bcp_sendrow() failed\n");

exit(-1);

}

 

/* At the interval of 100, commit the current batch by flushing the batch from BCP client to the database */

if ((i % 100) == 0) {

nRowsProcessed = bcp_batch(hdbc);

if (nRowsProcessed == -1) {

printf("Status: bcp_batch() failed to flush the data!\n");

exit(-1);

}

printf("Status: %ld rows flushed and committed!\n", nRowsProcessed);

nTotalRowsProcessed += nRowsProcessed;

}

 

}

 

/* Finally terminate the bulk copy operation; commit/flush any remaining rows */

nRowsProcessed = bcp_done(hdbc);

if (nRowsProcessed == -1) {

 printf("Status: bcp_done() failed to flush the remaining data!\n");

exit(-1);

}

nTotalRowsProcessed += nRowsProcessed;

printf("Status: Final %ld rows flushed and committed (total %ld rows loaded overall)!\n", nRowsProcessed, nTotalRowsProcessed);

 

/* Disconnect from the database */

rc = SQLDisconnect(hdbc);

if (rc != SQL_SUCCESS) {

printf("Status: SQLDisconnect() failed\n");

exit(-1);

}

 

/* Deallocate ODBC database handle */

rc = SQLFreeHandle(SQL_HANDLE_DBC, hdbc);

if (rc != SQL_SUCCESS) {

printf("Status: SQLFreeHandle(SQL_HANDLE_DBC) failed\n");

exit(-1);

}

 

/* Deallocate ODBC environment handle */

rc = SQLFreeHandle(SQL_HANDLE_ENV, henv);

if (rc != SQL_SUCCESS) {

printf("Status: SQLFreeHandle(SQL_HANDLE_ENV) failed\n");

exit(-1);

}

 

exit:

return 0;

}

Creating ODBC DSN (Windows)

Before above BCP application is executed, make sure to create ODBC DSN to be used in the application. This section briefly explains how to create such ODBC DSN on Windows platform (64-bit).

Open ODBC Administrator (C:\Windows\System32\odbcad32.exe), open “System DSN” tab, and click on “Add…” button in it:

Choose “ODBC Driver 13 for SQL Server”:

Fill in the database information in the “”Create a New Data Source to SQL Server” dialog box:

Depending on what authentication mechanism is supported or to be preferred, fill-in next wizard dialog accordingly, one possible sample shown below:

In case non-default database is being used, change next wizard dialog appropriately; like the one shown below:

Finally ensure that Test connection is successful; in case it’s not successful, please resolve the issue before proceeding further:

 

 

Executing ODBC application on Windows

In case ODBC application needs to be compiled from the command line, below steps will be useful:

C:\Users\rdhuvad\Documents\Visual Studio 2013\Projects\MicrosoftBCP\MicrosoftBCP>"C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\x86_amd64\vcvarsx86_amd64.bat"

 

C:\Users\rdhuvad\Documents\Visual Studio 2013\Projects\MicrosoftBCP\MicrosoftBCP>"C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\x86_amd64\CL.exe" /c /Zi /nologo /W1 /WX- /Od /Gm- /EHsc /MD /GS /fp:precise /Zc:wchar_t /Zc:forScope /Fo"x64\Debug\\" /Fd"x64\Debug\vc120.pdb" /I"C:\Program Files (x86)\Microsoft SQL Server\Client SDK\ODBC\130\SDK\Include" /Gd /TP /errorReport:prompt MicrosoftBCP.cpp stdafx.cpp

MicrosoftBCP.cpp

stdafx.cpp

Generating Code...

 

C:\Users\rdhuvad\Documents\Visual Studio 2013\Projects\MicrosoftBCP\MicrosoftBCP>"C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\x86_amd64\link.exe" /ERRORREPORT:PROMPT /OUT:"C:\Users\rdhuvad\documents\visual studio 2013\Projects\MicrosoftBCP\x64\Debug\MicrosoftBCP.exe" /NOLOGO /LIBPATH:"C:\Program Files\Microsoft SQL Server\Client SDK\ODBC\130\SDK\Lib\x64" msodbcsql13.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /MANIFEST /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /manifest:embed /DEBUG /PDB:"C:\Users\rdhuvad\documents\visual studio 2013\Projects\MicrosoftBCP\x64\Debug\MicrosoftBCP.pdb" /TLBID:1 /DYNAMICBASE /NXCOMPAT /IMPLIB:"C:\Users\rdhuvad\documents\visual studio 2013\Projects\MicrosoftBCP\x64\Debug\MicrosoftBCP.lib" /MACHINE:X64 x64\Debug\MicrosoftBCP.obj

C:\Users\rdhuvad\Documents\Visual Studio 2013\Projects\MicrosoftBCP\MicrosoftBCP>"C:\Users\rdhuvad\documents\visual studio 2013\Projects\MicrosoftBCP\x64\Debug\MicrosoftBCP.exe"

Status: 1 rows flushed and committed!

Status: 100 rows flushed and committed!

Status: 100 rows flushed and committed!

Status: 100 rows flushed and committed!

...

Status: 100 rows flushed and committed!

Status: Final 99 rows flushed and committed (total 1000 rows loaded overall)!

 

Working on Linux platform

Downloading Microsoft ODBC driver for SQL Server

Download PDF file Linux_Install_Instructions.pdf from [1] or [4]. Depending on the flavor of Linux being used, follow the instructions accordingly. For example, author had RedHat-6 and followed below steps provided in this PDF or at [4]:

sudo su

curl https://packages.microsoft.com/config/rhel/6/prod.repo > /etc/yum.repos.d/mssql-release.repo

exit

sudo yum remove unixODBC-utf16 unixODBC-utf16-devel #to avoid conflicts

sudo ACCEPT_EULA=Y yum install msodbcsql-13.1.4.0-1

# optional: for bcp and sqlcmd

sudo ACCEPT_EULA=Y yum install mssql-tools

echo 'export PATH="$PATH:/opt/mssql-tools/bin"' >> ~/.bash_profile

echo 'export PATH="$PATH:/opt/mssql-tools/bin"' >> ~/.bashrc

source ~/.bashrc

# optional: for unixODBC development headers

sudo yum install unixODBC-devel

Creating libmsodbcsql-13.1.so

Unfortunately, it seems installation of Microsoft ODBC driver for SQL Server missed creating soft-link from its run-time library to generic .so. Due to this, linking phase of g++ fails as it is unable to find symbols for bcp APIs. To fix this, author has manually created the soft link in the install directory of this driver:

[rdhuvad@myhost lib64]$ pwd

/opt/microsoft/msodbcsql/lib64

 

[rdhuvad@myhost lib64]$ ls -l

total 16100

-rwxr-xr-x. 1 root root 16483820 Jan 25 2017 libmsodbcsql-13.1.so.4.0

[rdhuvad@myhost lib64]$ sudo ln -s libmsodbcsql-13.1.so.4.0 libmsodbcsql-13.1.so

 

[rdhuvad@myhost lib64]$ ls -l

total 16100

lrwxrwxrwx. 1 root root 24 Aug 2 10:31 libmsodbcsql-13.1.so -> libmsodbcsql-13.1.so.4.0

-rwxr-xr-x. 1 root root 16483820 Jan 25 2017 libmsodbcsql-13.1.so.4.0

 

Fixing missing “SQL Server Data Type defines”

It appears that msodbcsql.h shipped for Linux does not have any SQL Server Data Type defines in it? Due to which all binding calls using bcp_bind() fails. As a workaround, author has copied those from msodbcsql.h available on Windows to the Linux’s version. For example, author has added highlighted code in Linux’s msodbcsql.h file (from Window’s equivalent file):

Compiling/Linking ODBC application on Linux

By minimal, one would need to compile using UnixODBC and MSODBC header files; and link using their libraries. If ODBC application uses any other third-party APIs, compilation step needs to appropriately take care of the same.

Below is how author’s makefile on Linux looks like (note that -g is for debug symbol generation; should be removed from final production ready application):

MicrosoftBCP: MicrosoftBCP.cpp

g++ -g -o MicrosoftBCP -I/usr/include -I/opt/microsoft/msodbcsql/include -L/usr/lib64 -L/opt/microsoft/msodbcsql/lib64 -lodbc -lmsodbcsql-13.1 MicrosoftBCP.cpp

Creating ODBC entries on Linux

During installation of Microsoft ODBC driver for SQL Server on Linux, by default, two files will be created under /etc directory:

odbcinst.ini & odbc.ini

Installation has already taken care of updating odbcinst.ini with following details:

[ODBC Driver 13 for SQL Server]

Description=Microsoft ODBC Driver 13 for SQL Server

Driver=/opt/microsoft/msodbcsql/lib64/libmsodbcsql-13.1.so.4.0

UsageCount=1

 

To recognize the DSN that’s being used in the ODBC application, odbc.ini file needs to be updated like this:

[DSN_SQLServer]

Driver=ODBC Driver 13 for SQL Server

Database=sqlServerDatabaseName

Server=sqlServerHostOrIp[,sqlServerDatabasePort]

 

Before proceeding with executing the ODBC application, UnixODBC provides easy way to test the DSN connection created using “isql” command line utility:

 

[rdhuvad@myhost etc]$ isql DSN_SQLServer sqlServerUser sqlServerPassword

+---------------------------------------+

| Connected! |

|  |

| sql-statement |

| help [tablename]  |

| quit  |

|  |

+---------------------------------------+

 

Executing ODBC application on Linux

Make sure LD_LIBRARY_PATH is set to point to MS ODBC’s and unixODBC library:

export LD_LIBRARY_PATH=/opt/microsoft/msodbcsql/lib64:/usr/lib64:$LD_LIBRARY_PATH

 

Using “ldd”, one can make sure no other missing run-time library exists on the executable:

[rdhuvad@myhost sqlserver]$ ldd MicrosoftBCP

linux-vdso.so.1 =>  (0x00007fff58cab000)

libodbc.so.2 => /usr/lib64/libodbc.so.2 (0x0000003cc9000000)

libmsodbcsql-13.1.so => /opt/microsoft/msodbcsql/lib64/libmsodbcsql-13.1.so (0x00007f2e8ead5000)

libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x0000003443e00000)

libm.so.6 => /lib64/libm.so.6 (0x0000003442200000)

libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x0000003444a00000)

libc.so.6 => /lib64/libc.so.6 (0x0000003441200000)

libltdl.so.7 => /usr/lib64/libltdl.so.7 (0x00000036bfc00000)

...

Finally, application can be executed:

[rdhuvad@myhost sqlserver]$ ./MicrosoftBCP

Status: 1 rows flushed and committed!

Status: 100 rows flushed and committed!

Status: 100 rows flushed and committed!

Status: 100 rows flushed and committed!

...

Status: 100 rows flushed and committed!

Status: Final 99 rows flushed and committed (total 1000 rows loaded overall)!

Performance using BCP API

BCP provides significant performance improvement over regular ODBC row-by-row or even array insert operations. However, be aware that for a smaller batch size (bcp_batch() calls), BCP performance degrades; so it’s advisable to have better batch size.

Limitations using BCP API

  • BCP API expects to load all columns of the table; that means one cannot load partial list of columns. As a workaround load remaining columns with null indicator value (SQL_NULL_DATA).
  • BCP performance degrades significantly for a smaller batch size (i.e. calling bcp_batch() too frequently); say less than 10. “What is the good batch size” depends on factors like row-size of the record, network latencies, etc. So be careful while making a batch size decision.
  • Unlike regular ODBC statement level support, BCP API works at database handle level only. That means one cannot have multiple BCP targets in a single database handle. Workaround is to either serialize such operations or use multiple database handles, one for each BCP target.
  • Be aware that few BCP API’s return codes are confusing and contradicts usual ODBC API return codes. For example, bcp_init(), bcp_bind(), bcp_sendrow() returns FAIL (defines to 0) for any unsuccessful execution of these API. In standard ODBC API failure is SQL_ERROR (which defines to -1). For other BCP APIs like bcp_batch(), bcp_done() returns -1 for any failures.

References

Back to Top