Use SQLAlchemy with Azure Databricks
Azure Databricks provides a SQLAlchemy dialect (the system SQLAlchemy uses to communicate with various types of database API implementations and databases) for Azure Databricks. SQLAlchemy is a Python SQL toolkit and Object Relational Mapper (ORM). SQLAlchemy provides a suite of well known enterprise-level persistence patterns, designed for efficient and high-performing database access, adapted into a simple and Pythonic domain language. See Features and Philosophy.
The SQLAlchemy dialect for Azure Databricks is included with the Databricks SQL Connector for Python. This article covers SQLAlchemy dialect for Azure Databricks version 2.0, which requires Databricks SQL Connector for Python version 3.0.0 or above.
Requirements
- A development machine running Python >=3.8 and <=3.11.
- Databricks recommends that you use Python virtual environments, such as those provided by venv that are included with Python. Virtual environments help to ensure that you are using the correct versions of Python and the Databricks SQL Connector for Python together. Setting up and using virtual environments is outside of the scope of this article. For more information, see Creating virtual environments.
- An existing cluster or SQL warehouse.
Get started
Install the Databricks SQL Connector for Python library version 3.0.0 or above on your development machine by running
pip install "databricks-sql-connector[sqlalchemy]"
orpython -m pip install "databricks-sql-connector[sqlalchemy]"
. For version information, see the databricks-sql-connector release history.Gather the following information for the cluster or SQL warehouse that you want to use:
Cluster
- The server hostname of the cluster. You can get this from the Server Hostname value in the Advanced Options > JDBC/ODBC tab for your cluster.
- The HTTP path of the cluster. You can get this from the HTTP Path value in the Advanced Options > JDBC/ODBC tab for your cluster.
SQL warehouse
- The server hostname of the SQL warehouse. You can get this from the Server Hostname value in the Connection Details tab for your SQL warehouse.
- The HTTP path of the SQL warehouse. You can get this from the HTTP Path value in the Connection Details tab for your SQL warehouse.
Authentication
The SQLAlchemy dialect for Azure Databricks supports Azure Databricks personal access token authentication.
To create an Azure Databricks personal access token, do the following:
- In your Azure Databricks workspace, click your Azure Databricks username in the top bar, and then select Settings from the drop down.
- Click Developer.
- Next to Access tokens, click Manage.
- Click Generate new token.
- (Optional) Enter a comment that helps you to identify this token in the future, and change the token’s default lifetime of 90 days. To create a token with no lifetime (not recommended), leave the Lifetime (days) box empty (blank).
- Click Generate.
- Copy the displayed token to a secure location, and then click Done.
Note
Be sure to save the copied token in a secure location. Do not share your copied token with others. If you lose the copied token, you cannot regenerate that exact same token. Instead, you must repeat this procedure to create a new token. If you lose the copied token, or you believe that the token has been compromised, Databricks strongly recommends that you immediately delete that token from your workspace by clicking the trash can (Revoke) icon next to the token on the Access tokens page.
If you are not able to create or use tokens in your workspace, this might be because your workspace administrator has disabled tokens or has not given you permission to create or use tokens. See your workspace administrator or the following topics:
To authenticate the SQLAlchemy dialect, use the following code snippet. This snippet assumes that you have set the following environment variables:
DATABRICKS_TOKEN
, set to the Azure Databricks personal access token.DATABRICKS_SERVER_HOSTNAME
set to the Server Hostname value for your cluster or SQL warehouse.DATABRICKS_HTTP_PATH
, set to HTTP Path value for your cluster or SQL warehouse.DATABRICKS_CATALOG
, set to the target catalog in Unity Catalog.DATABRICKS_SCHEMA
, set to the target schema (also known as a database) in Unity Catalog.
To set environment variables, see your operating system’s documentation.
import os
from sqlalchemy import create_engine
access_token = os.getenv("DATABRICKS_TOKEN")
server_hostname = os.getenv("DATABRICKS_SERVER_HOSTNAME")
http_path = os.getenv("DATABRICKS_HTTP_PATH")
catalog = os.getenv("DATABRICKS_CATALOG")
schema = os.getenv("DATABRICKS_SCHEMA")
engine = create_engine(
url = f"databricks://token:{access_token}@{server_hostname}?" +
f"http_path={http_path}&catalog={catalog}&schema={schema}"
)
# ...
You use the preceding engine
variable to connect to your specified catalog and schema through your Azure Databricks compute resource. For connection examples, see the following section and the sqlalchemy.py file in GitHub.
Example
See the sqlalchemy.py file in GitHub.
DBAPI reference
- See the README.sqlalchemy.md file in GitHub.
- See also the sqlalchemy source code directory in GitHub.