Hi Anonymous,
Thank you for using Microsoft Q&A forum and thank you for posting your query.
To connect to an on-premises Impala system from Azure Databricks using Python/PySpark code, you can use a JDBC connection through Spark. The pyodbc library is not used by spark.read.jdbc(); you only need it if you also want to connect over ODBC, in which case you can install it by running %pip install pyodbc in a Databricks notebook cell.
You will also need the Impala JDBC driver installed on the Databricks cluster. Download the Impala JDBC driver from the Cloudera website and install the driver JAR on your Databricks cluster as a library. Once the driver is installed, you can use PySpark's spark.read.jdbc() method to connect to the Impala server.
Below is a sample of how you can use the spark.read.jdbc() method to connect to an Impala server and read data from a table.
Note: Please make sure to replace the placeholders <hostname>, <port>, <database>, <username>, <password>, <table_name> with your on-premises Impala system details. Impala typically listens for JDBC connections on port 21050, but confirm the port with your Impala administrator.
from pyspark.sql import SparkSession

# In a Databricks notebook a SparkSession named `spark` already exists; getOrCreate() simply returns it
spark = SparkSession.builder.appName("Impala connection").getOrCreate()

# Create connection properties
connection_url = "jdbc:impala://<hostname>:<port>/<database>"
properties = {
    "user": "<username>",
    "password": "<password>",
    # Driver class for the JDBC 4.1 build of the Cloudera Impala driver; adjust it if your driver version differs
    "driver": "com.cloudera.impala.jdbc41.Driver"
}

# Read data from Impala table
dataframe = spark.read.jdbc(url=connection_url, table="<table_name>", properties=properties)

# Show dataframe
dataframe.show()
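If you connect to more than one database or table, it can help to build the URL and properties in one place. The sketch below is only an illustration: build_impala_jdbc_config is a hypothetical helper name, not part of PySpark or the Cloudera driver, and the default driver class is the one from the sample above, so change it if your driver version uses a different class.

```python
def build_impala_jdbc_config(host, port, database, user, password,
                             driver="com.cloudera.impala.jdbc41.Driver"):
    """Return a (url, properties) pair suitable for spark.read.jdbc().

    Hypothetical helper for illustration; the default driver class is the
    one used in the sample above and may differ for your driver version.
    """
    if not host or not database:
        raise ValueError("host and database are required")
    # int(port) rejects values such as "<port>" that were never replaced
    url = f"jdbc:impala://{host}:{int(port)}/{database}"
    properties = {"user": user, "password": password, "driver": driver}
    return url, properties


# Usage in a Databricks notebook (commented out because it needs a live cluster).
# The secret scope and key names are placeholders, not real values:
# url, props = build_impala_jdbc_config(
#     "<hostname>", 21050, "<database>",
#     dbutils.secrets.get(scope="<scope>", key="<user_key>"),
#     dbutils.secrets.get(scope="<scope>", key="<password_key>"),
# )
# dataframe = spark.read.jdbc(url=url, table="<table_name>", properties=props)
```

Reading the credentials from a Databricks secret scope, as in the commented usage, avoids hard-coding the password in the notebook.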
Important Note: Unless your Impala server is reachable from the network your Databricks cluster runs in, the connection will fail. You may need to deploy your Azure Databricks workspace into a virtual network (VNet injection) that has VPN or ExpressRoute connectivity to your on-premises site, with the correct routing in place. For more information, please refer to this Databricks documentation: Connect your Azure Databricks workspace to your on-premises network
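Before troubleshooting the JDBC settings, you can check whether the network path is open at all. The following is a minimal sketch, assuming you run it in a notebook cell so it executes on the driver node; can_reach is a hypothetical helper name, and it only tests that a TCP connection can be opened, not that Impala accepts your credentials.

```python
import socket


def can_reach(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host name and attempts a TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts
        return False


# Example (replace the placeholders with your Impala host and port):
# print(can_reach("<hostname>", 21050))
```

If this returns False, the problem is likely in routing, DNS, or firewall rules rather than in the Spark code. Note that the check runs on the driver only; the worker nodes need the same connectivity for the read to succeed.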
Hope this info helps.
Thank you
Please don't forget to Accept Answer and Up-Vote wherever the information provided helps you; this can be beneficial to other community members.