Driver capability settings for the Databricks ODBC Driver
This article describes how to configure special and advanced driver capability settings for the Databricks ODBC Driver.
The Databricks ODBC Driver provides the following special and advanced driver capability settings.
- Set the initial schema in ODBC
- ANSI SQL-92 query support in ODBC
- Extract large query results in ODBC
- Arrow serialization in ODBC
- Cloud Fetch in ODBC
- Enable logging
Set the initial schema in ODBC
The ODBC driver allows you to specify the schema by setting Schema=<schema-name>
as a connection configuration. This is equivalent to running USE <schema-name>
.
ANSI SQL-92 query support in ODBC
The ODBC driver accepts SQL queries in ANSI SQL-92 dialect and translates the queries to the Databricks SQL dialect. However, if your application generates Databricks SQL directly or your application uses any non-ANSI SQL-92 standard SQL syntax specific to Azure Databricks, Databricks recommends that you set UseNativeQuery=1
as a connection configuration. With that setting, the driver passes the SQL queries verbatim to Azure Databricks.
Extract large query results in ODBC
To achieve the best performance when you extract large query results, use the latest version of the ODBC driver that includes the following optimizations.
Arrow serialization in ODBC
ODBC driver version 2.6.15 and above supports an optimized query results serialization format that uses Apache Arrow.
Cloud Fetch in ODBC
ODBC Driver version 2.6.17 and above support Cloud Fetch, a capability that fetches query results through the cloud storage set up in your Azure Databricks deployment.
Query results are uploaded to an internal DBFS storage location as Arrow-serialized files of up to 20 MB. When the driver sends fetch requests after query completion, Azure Databricks generates and returns shared access signatures to the uploaded files. The ODBC driver then uses the URLs to download the results directly from DBFS.
Cloud Fetch is only used for query results larger than 1 MB. Smaller results are retrieved directly from Azure Databricks.
Azure Databricks automatically garbage collects the accumulated files which are marked for deletion after 24 hours. These marked files are completely deleted after an additional 24 hours.
To learn more about the Cloud Fetch architecture, see How We Achieved High-bandwidth Connectivity With BI Tools.
Enable logging
To enable logging in the ODBC driver for Windows, set the following fields in the ODBC Data Source Administrator for the related DSN:
- Set the Log Level field from FATAL to log only severe events through TRACE to log all driver activity.
- Set the Log Path field to the full path to the folder where you want to save log files.
- Set the Max Number Files field to the maximum number of log files to keep.
- Set the Max File Size field to the maximum size of each log file in megabytes.
To enable logging in the ODBC driver for a non-Windows machine, set the following properties in the related DSN or DSN-less connection string:
- Set the
LogLevel
property from1
to log only severe events through6
to log all driver activity. - Set the
LogPath
property to the full path to the folder where you want to save log files. - Set the
LogFileCount
property to the maximum number of log files to keep. - Set the
LogFileSize
property to the maximum size of each log file in bytes.
For more information, see the sections Configuring Logging Options on Windows
and Configuring Logging Options on a Non-Windows Machine
in the Databricks JDBC Driver Guide.