Can't connect to HBase instance from Azure Data Factory to use it as Source

mateusz.peczek 16 Reputation points
2023-06-21T14:15:08.3366667+00:00

I'm trying to connect to an Apache HBase instance (self-hosted, not Azure HDInsight) and I'm out of ideas. I've already tested a few different options.

I've created an Azure VM and installed an HBase instance on it. I've opened the required ports to allow communication from outside the VM. All the necessary requests work when called from my local PC.

[Image: rules]

The approaches I've taken are:

  1. Linked service based on the built-in HBase connector, pointed at the Thrift service.

I've set up a Thrift server on the HBase instance hosted on the Azure VM. I'm able to see the REST control panel for this service from outside. Here's the config window:

[Image: thrift]

This returns the error:
ERROR [HY000] [Microsoft][DriverSupport] (1110) Unexpected response received from server. Please ensure the server host and port specified for the connection are correct and confirm if SSL should be enabled for the connection.

  2. Linked service based on the built-in HBase connector, pointed at the REST API service.

[Image: rest]

This configuration works from the linked service's point of view. After creating a dataset, all the tables are listed too, but no data can be retrieved.

[Image: data from rest]

The error from the Preview data option is:

ERROR [HY000] [Microsoft][HBase] (40) Error with HTTP request, response code: 404
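One way to narrow down a 404 like this is to call the HBase REST endpoints directly and see which paths the server actually answers. The sketch below is only an illustration: the paths `/version/cluster`, `/`, and `/<table>/schema` are the ones described in the HBase REST documentation, while the function names, the host, and the port are placeholders. Which path the ADF connector requests for Preview data is not known here, so this only helps rule out the REST server itself.

```python
import urllib.error
import urllib.request


def build_probe_urls(host, port, table):
    """Return a few HBase REST URLs worth checking by hand.

    host, port, and table are placeholders supplied by the caller.
    """
    base = f"http://{host}:{port}"
    return [
        f"{base}/version/cluster",  # HBase version reported by the cluster
        f"{base}/",                 # list of tables
        f"{base}/{table}/schema",   # schema of a single table
    ]


def probe(url, timeout=5.0):
    """Return the HTTP status code for a GET on url, or None if unreachable."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or name resolution failure.
        return None
```

Calling `probe(url)` for each URL from `build_probe_urls("<vm-address>", 8080, "test")` shows whether the table-level paths respond when requested from outside the VM. Port 8080 is an assumption here; use whatever port the REST service was started on.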

  3. Linked service based on an ODBC driver and a self-hosted integration runtime.

As a last resort, I installed a self-hosted integration runtime on my machine, along with the CData ODBC Driver for Apache HBase. I'm able to retrieve all the tables and data through the driver itself:

[Image: odbc driver]

Both the self-hosted IR and the ODBC linked service based on that IR work as expected and show a successful connection status.

[Image: integration]

In this case, the dataset cannot list any of the tables that exist in the HBase cluster.

[Image: dataset]

The error received after selecting Preview data:
ERROR [42000] Invalid URI: The hostname could not be parsed.

I've tried multiple ways of providing the host address in the connection string: with http:// and the IP address, and with the IP address alone. The result is the same in each case. I've also tried entering the known value "test" as the table name using the Edit option, but that didn't work either.

I have a simple C# application with a Thrift client, and it works without any issues on port 9090, so the problem doesn't seem to be related to networking or authorization (which is disabled on my instance). This suggests that the issue lies in the HBase or ODBC connector used by ADF, since the alternatives run from my local machine work without any issues.
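Since the "Unexpected response received from server" error could mean the client and server disagree on the protocol, a quick check of what a given port speaks may help. The sketch below sends a minimal HTTP request to a port and reports whether the reply looks like HTTP. It assumes the Thrift server runs in its default non-HTTP mode, in which case port 9090 would be expected to report `not-http` while the REST port reports `http`. Whether the ADF connector expects HTTP on the configured port is an assumption, not something confirmed here, and the function names are illustrative.

```python
import socket


def looks_like_http(reply: bytes) -> bool:
    """True if the first bytes of a reply look like an HTTP status line."""
    return reply.startswith(b"HTTP/")


def probe_port(host, port, timeout=5.0):
    """Send a minimal HTTP request and classify the reply.

    Returns "http", "not-http", or "unreachable".
    """
    request = f"GET / HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Nothing is listening, or a firewall blocks the port.
        return "unreachable"
    with conn:
        try:
            conn.sendall(request)
            reply = conn.recv(64)
        except OSError:
            # Timeout or reset: the peer did not answer as an HTTP server would.
            reply = b""
    return "http" if looks_like_http(reply) else "not-http"
```

Running `probe_port` against both the Thrift port and the REST port from a machine outside the VM gives a rough picture of which endpoint an HTTP-based client could talk to.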

Has anyone been able to set up a working connection to a self-hosted HBase as a source using ADF?

Azure Data Factory