Hi Prasad Sandu
To connect your on-premises Hadoop Hive to Azure Databricks, you can follow one of these approaches depending on your setup and security constraints:
Option 1: Connect via JDBC (recommended for read access)
Install the Hive JDBC driver on your Databricks cluster:
- Upload the Hive JDBC JAR (e.g., hive-jdbc-uber.jar) to DBFS or install it using %sh or cluster libraries.
- Example install using %sh (downloads the standalone driver to the driver node):
%sh
wget https://repo1.maven.org/maven2/org/apache/hive/hive-jdbc/3.1.2/hive-jdbc-3.1.2-standalone.jar -P /databricks/jars/
Establish JDBC Connection: Use the following Spark read command in a notebook:
jdbc_url = "jdbc:hive2://<onprem-host>:10000/default"
# Read a Hive table over JDBC; add user/password options if your
# HiveServer2 requires authentication
df = spark.read.format("jdbc") \
    .option("url", jdbc_url) \
    .option("driver", "org.apache.hive.jdbc.HiveDriver") \
    .option("dbtable", "<your_table>") \
    .load()

df.show()
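Once the read works, you may want to persist the result so downstream jobs don't hit the on-prem server repeatedly; a minimal sketch (the table name is illustrative):

```python
# Persist the JDBC pull as a managed Delta table so downstream queries
# hit Databricks storage instead of the on-prem Hive server
# (the table name "bronze_hive_copy" is just an example)
df.write.format("delta").mode("overwrite").saveAsTable("bronze_hive_copy")
```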
Network Requirements:
- Ensure that Azure Databricks can reach your on-prem Hive server.
- This typically requires deploying Azure Databricks in your own virtual network (VNet injection) and connecting that VNet to your on-premises network via Azure VPN Gateway or ExpressRoute; a quick reachability check is sketched below.
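Before debugging JDBC errors, it helps to confirm basic network reachability from the cluster; a quick check you can run in a notebook cell (host and port are placeholders for your environment):

```python
import socket

# Quick TCP reachability test from the Databricks driver to the
# on-prem HiveServer2 endpoint (replace the placeholders with your values)
try:
    socket.create_connection(("<onprem-host>", 10000), timeout=5).close()
    print("HiveServer2 port is reachable")
except OSError as e:
    print(f"Cannot reach HiveServer2: {e}")
```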
Option 2: Access via Azure Data Factory + SHIR
Use ADF with a Self-Hosted Integration Runtime (SHIR) on your on-prem environment to:
- Ingest data from Hive to Azure (e.g., Blob or ADLS).
- Then load into Databricks for processing, as sketched below.
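After the ADF copy activity lands the Hive extract in storage, reading it from Databricks is straightforward; a sketch assuming Parquet output and a hypothetical ADLS Gen2 path:

```python
# Read the Hive extract that the ADF copy activity staged in ADLS Gen2
# (storage account, container, and path are placeholders)
staged_path = "abfss://staging@<storage-account>.dfs.core.windows.net/hive/<your_table>/"
df = spark.read.parquet(staged_path)
df.show()
```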
Option 3: Use DistCp or Hive export
If JDBC is not ideal, you can:
- Export Hive data to HDFS as CSV or Parquet files.
- Use DistCp or AzCopy to move them to Azure Data Lake Storage.
- Load them into Databricks via Spark read and Delta Lake, as shown in the sketch below.
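For example, if the export landed as CSV, a hedged sketch of converting it into a Delta table (path, schema options, and table name are placeholders):

```python
# Load the staged CSV export and convert it to a Delta table
# (path and table name are placeholders for your environment)
raw = (spark.read
       .option("header", "true")
       .option("inferSchema", "true")
       .csv("abfss://landing@<storage-account>.dfs.core.windows.net/hive_export/"))
raw.write.format("delta").mode("overwrite").saveAsTable("hive_export_delta")
```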
Notes:
- JDBC is best for small to medium data pulls or live lookups.
- For large-scale data, it's more efficient to stage in storage (ADLS) and process via Databricks.
For more information, you can refer to the articles below:
- https://learn.microsoft.com/en-us/azure/databricks/security/network/classic/on-prem-network
- https://learn.microsoft.com/en-us/azure/databricks/security/network/classic/vnet-inject
Hope this helps. Do let us know if you have any further queries.
If this answers your query, do click Accept Answer and Yes for "Was this answer helpful".