Azure Databricks provides a Snowflake connector in the Databricks Runtime to support reading and writing data from Snowflake.
Important
The configurations described in this article are Experimental. Experimental features are provided as-is and are not supported by Databricks through customer technical support. To get full query federation support, you should instead use Lakehouse Federation, which enables your Azure Databricks users to take advantage of Unity Catalog syntax and data governance tools.
Query a Snowflake table in Azure Databricks
You can configure a connection to Snowflake and then query data. Before you begin, check which version of Databricks Runtime your cluster runs on. The following code provides example syntax in Python, SQL, and Scala.
Python
# The following example applies to Databricks Runtime 11.3 LTS and above.
snowflake_table = (spark.read
.format("snowflake")
.option("host", "hostname")
.option("port", "port") # Optional - will use default port 443 if not specified.
.option("user", "username")
.option("password", "password")
.option("sfWarehouse", "warehouse_name")
.option("database", "database_name")
.option("schema", "schema_name") # Optional - will use default schema "public" if not specified.
.option("dbtable", "table_name")
.load()
)
# The following example applies to Databricks Runtime 10.4 and below.
snowflake_table = (spark.read
.format("snowflake")
.option("dbtable", table_name)
.option("sfUrl", database_host_url)
.option("sfUser", username)
.option("sfPassword", password)
.option("sfDatabase", database_name)
.option("sfSchema", schema_name)
.option("sfWarehouse", warehouse_name)
.load()
)
SQL
/* The following example applies to Databricks Runtime 11.3 LTS and above. */
DROP TABLE IF EXISTS snowflake_table;
CREATE TABLE snowflake_table
USING snowflake
OPTIONS (
host '<hostname>',
port '<port>', /* Optional - will use default port 443 if not specified. */
user '<username>',
password '<password>',
sfWarehouse '<warehouse_name>',
database '<database-name>',
schema '<schema-name>', /* Optional - will use default schema "public" if not specified. */
dbtable '<table-name>'
);
SELECT * FROM snowflake_table;
/* The following example applies to Databricks Runtime 10.4 LTS and below. */
DROP TABLE IF EXISTS snowflake_table;
CREATE TABLE snowflake_table
USING snowflake
OPTIONS (
dbtable '<table-name>',
sfUrl '<database-host-url>',
sfUser '<username>',
sfPassword '<password>',
sfDatabase '<database-name>',
sfSchema '<schema-name>',
sfWarehouse '<warehouse-name>'
);
SELECT * FROM snowflake_table;
Scala
# The following example applies to Databricks Runtime 11.3 LTS and above.
val snowflake_table = spark.read
.format("snowflake")
.option("host", "hostname")
.option("port", "port") /* Optional - will use default port 443 if not specified. */
.option("user", "username")
.option("password", "password")
.option("sfWarehouse", "warehouse_name")
.option("database", "database_name")
.option("schema", "schema_name") /* Optional - will use default schema "public" if not specified. */
.option("dbtable", "table_name")
.load()
# The following example applies to Databricks Runtime 10.4 and below.
val snowflake_table = spark.read
.format("snowflake")
.option("dbtable", table_name)
.option("sfUrl", database_host_url)
.option("sfUser", username)
.option("sfPassword", password)
.option("sfDatabase", database_name)
.option("sfSchema", schema_name)
.option("sfWarehouse", warehouse_name)
.load()
Notebook example: Snowflake Connector for Spark
The following notebooks provide simple examples of how to write data to and read data from Snowflake. See Snowflake Connector for Spark for more details.
Tip
Avoid exposing your Snowflake username and password in notebooks by using Secrets, which are demonstrated in the notebooks.
Notebook example: Save model training results to Snowflake
The following notebook walks through best practices for using the Snowflake Connector for Spark. It writes data to Snowflake, uses Snowflake for some basic data manipulation, trains a machine learning model in Azure Databricks, and writes the results back to Snowflake.
Why don’t my Spark DataFrame columns appear in the same order in Snowflake?
The Snowflake Connector for Spark doesn’t respect the order of the columns in the table being written to; you must explicitly specify the mapping between DataFrame and Snowflake columns. To specify this mapping, use the columnmap parameter.
Why is INTEGER data written to Snowflake read back as DECIMAL?
Snowflake represents all INTEGER types as NUMBER, which can cause a change in data type when you write data to and read data from Snowflake. For example, INTEGER data can be converted to DECIMAL when writing to Snowflake, because INTEGER and DECIMAL are semantically equivalent in Snowflake (see Snowflake Numeric Data Types).
Why are the fields in my Snowflake table schema always uppercase?
Snowflake uses uppercase fields by default, which means that the table schema is converted to uppercase.
Tables in a Microsoft Fabric lakehouse are based on the Delta Lake technology commonly used in Apache Spark. By using the enhanced capabilities of delta tables, you can create advanced analytics solutions.
Demonstrate understanding of common data engineering tasks to implement and manage data engineering workloads on Microsoft Azure, using a number of Azure services.