Driver capability settings for the Databricks JDBC Driver

08/27/2024

This article describes how to configure special and advanced driver capability settings for the Databricks JDBC Driver.

The Databricks JDBC Driver provides the following special and advanced driver capability settings.

ANSI SQL-92 query support in JDBC
Default catalog and schema
Extract large query results in JDBC
Arrow serialization in JDBC
Cloud Fetch in JDBC
Enable logging

ANSI SQL-92 query support in JDBC

Legacy Spark JDBC drivers accept SQL queries in the ANSI SQL-92 dialect and translate them to the Databricks SQL dialect before sending them to the server. However, if your application generates Databricks SQL directly, or uses SQL syntax that is specific to Azure Databricks and not part of the ANSI SQL-92 standard, Databricks recommends that you set UseNativeQuery=1 as a connection configuration. With that setting, the driver passes the SQL queries verbatim to Azure Databricks.
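As a minimal sketch, one way to apply this setting is to append it to the JDBC connection URL as a semicolon-separated key=value pair. The class name, the helper method, and the placeholder base URL below are assumptions made for illustration; only the UseNativeQuery=1 setting comes from this article. The sketch builds the URL string only and does not open a connection.

```java
// Hypothetical helper: appends UseNativeQuery=1 to an existing JDBC URL.
final class NativeQueryUrl {

    // Adds the setting, inserting a ';' separator only if one is missing.
    static String withNativeQuery(String baseUrl) {
        String separator = baseUrl.endsWith(";") ? "" : ";";
        return baseUrl + separator + "UseNativeQuery=1";
    }

    public static void main(String[] args) {
        // Placeholder URL (assumption): substitute your own host and HTTP path.
        String baseUrl = "jdbc:databricks://<server-hostname>:443;httpPath=<http-path>";
        System.out.println(withNativeQuery(baseUrl));
    }
}
```

The resulting string can then be passed to java.sql.DriverManager.getConnection in the usual way.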

Default catalog and schema

To specify the default catalog and schema, add ConnCatalog=<catalog-name>;ConnSchema=<schema-name> to the JDBC connection URL.
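The following sketch shows how the two settings might be appended to a connection URL in application code. The class name, method name, and the sample catalog, schema, and base URL values are hypothetical; only the ConnCatalog and ConnSchema setting names come from this article.

```java
// Hypothetical helper: appends default catalog and schema settings to a JDBC URL.
final class DefaultCatalogUrl {

    static String withDefaults(String baseUrl, String catalog, String schema) {
        StringBuilder url = new StringBuilder(baseUrl);
        // Settings in the URL are separated by semicolons.
        if (!baseUrl.endsWith(";")) {
            url.append(';');
        }
        url.append("ConnCatalog=").append(catalog)
           .append(";ConnSchema=").append(schema);
        return url.toString();
    }

    public static void main(String[] args) {
        // Placeholder URL and names (assumptions): substitute your own values.
        String baseUrl = "jdbc:databricks://<server-hostname>:443;httpPath=<http-path>";
        System.out.println(withDefaults(baseUrl, "my_catalog", "my_schema"));
    }
}
```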

Extract large query results in JDBC

To achieve the best performance when you extract large query results, use the latest version of the JDBC driver, which includes the following optimizations.

Arrow serialization in JDBC

JDBC driver version 2.6.16 and above supports an optimized query results serialization format that uses Apache Arrow.

Cloud Fetch in JDBC

JDBC driver version 2.6.19 and above supports Cloud Fetch, a capability that fetches query results through the cloud storage that is set up in your Azure Databricks deployment.

Query results are uploaded to an internal DBFS storage location as Arrow-serialized files of up to 20 MB each. When the driver sends fetch requests after query completion, Azure Databricks generates and returns shared access signature URLs for the uploaded files. The JDBC driver then uses these URLs to download the results directly from DBFS.

Cloud Fetch is only used for query results larger than 1 MB. Smaller results are retrieved directly from Azure Databricks.

Azure Databricks automatically garbage-collects the accumulated files. Files are marked for deletion after 24 hours, and the marked files are permanently deleted after an additional 24 hours.

To learn more about the Cloud Fetch architecture, see How We Achieved High-bandwidth Connectivity With BI Tools.

Enable logging

To enable logging in the JDBC driver, set the LogLevel property to a value from 1 (log only severe events) through 6 (log all driver activity). Set the LogPath property to the full path of the folder where you want to save log files.
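As an illustration, the sketch below builds the logging portion of a connection string and rejects levels outside the range 1 through 6. It assumes the two properties are supplied as semicolon-separated key=value pairs, in the same form as other driver settings; the class name, method name, and sample log folder are hypothetical.

```java
// Hypothetical helper: builds the LogLevel/LogPath settings fragment.
final class LoggingSettings {

    static String loggingFragment(int logLevel, String logPath) {
        // The valid range runs from 1 (severe events only) to 6 (all driver activity).
        if (logLevel < 1 || logLevel > 6) {
            throw new IllegalArgumentException("LogLevel must be between 1 and 6: " + logLevel);
        }
        return "LogLevel=" + logLevel + ";LogPath=" + logPath;
    }

    public static void main(String[] args) {
        // Sample folder (assumption): use the full path to a folder that exists on your machine.
        System.out.println(loggingFragment(6, "/tmp/databricks-jdbc-logs"));
    }
}
```

Validating the level in application code is a design choice of this sketch; it surfaces a misconfigured value before the connection attempt.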

For more information, see the Configuring Logging section in the Databricks JDBC Driver Guide.
