Can't connect to a self-hosted HBase instance from Azure Data Factory to use it as a source
I'm trying to connect Azure Data Factory (ADF) to a self-hosted Apache HBase instance (not HDInsight on Azure), and I'm out of ideas. I've already tested a few different options.
I've created an Azure VM and installed HBase on it. I've opened the required ports to allow communication from outside the VM. All the necessary requests work when I call them from my local PC.
The approaches I've taken so far are:
- Linked service based on the built-in HBase connector, pointed at the Thrift service.
I've set up a Thrift server on the HBase instance hosted on the Azure VM. I'm able to see the REST control panel for this service from outside. Here's the config window:
This returns the error:
ERROR [HY000] [Microsoft][DriverSupport] (1110) Unexpected response received from server. Please ensure the server host and port specified for the connection are correct and confirm if SSL should be enabled for the connection.
- Linked service based on the built-in HBase connector, pointed at the REST API service.
This configuration works as far as the linked service is concerned. After I create a dataset, all the tables are listed as well, but no data can be retrieved.
The error from the Preview data option is:
ERROR [HY000] [Microsoft][HBase] (40) Error with HTTP request, response code: 404
- Linked service based on an ODBC driver and a self-hosted integration runtime.
As a last resort, I've installed a self-hosted integration runtime (IR) on my machine, along with the CData ODBC Driver for Apache HBase. I'm able to retrieve all the tables and data through the driver itself:
Both the self-hosted IR and the ODBC linked service based on that IR work as expected and show a successful connection status.
In this case, however, the dataset cannot list any of the tables that exist in the HBase cluster.
The error received after selecting Preview data is:
ERROR [42000] Invalid URI: The hostname could not be parsed.
I've tried multiple ways of providing the host address in the connection string, using http:// with the IP address and the IP address alone; the result is the same in each case. I've also tried entering a known table name, "test", using the Edit option, but that didn't work either.
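To narrow down where the 404 in the REST approach comes from, the same REST service can be queried outside ADF. The following is a minimal sketch, not the request sequence the ADF connector actually sends (that is not known here). It assumes the HBase REST server listens on port 8080 (the HBase default, which may differ on this VM); the host 203.0.113.10 is a placeholder, "test" is the known table mentioned above, and the function names `rest_urls` and `probe` are hypothetical.

```python
import sys
import urllib.error
import urllib.request
from urllib.parse import quote


def rest_urls(host, port, table):
    """Build a few standard HBase REST paths for a given table."""
    base = f"http://{host}:{port}"
    name = quote(table, safe="")  # percent-encode the table name
    return {
        "version": f"{base}/version/cluster",  # HBase version of the cluster
        "tables": f"{base}/",                  # list of tables
        "schema": f"{base}/{name}/schema",     # column families of the table
        "regions": f"{base}/{name}/regions",   # region metadata of the table
    }


def probe(url, timeout=5):
    """Return the HTTP status code for a GET request to url."""
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 404 for a path the server does not know


# Only touches the network when run as: script.py <host> <port> <table>
if __name__ == "__main__" and len(sys.argv) == 4:
    for label, url in rest_urls(sys.argv[1], int(sys.argv[2]), sys.argv[3]).items():
        print(label, url, probe(url))
```

If the table list answers with 200 while one of the table-specific paths returns 404, that would point at the table name or path rather than at connectivity.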
I have a simple C# application with a Thrift client, and it works without any issues on port 9090, so the problem doesn't seem to be related to networking or authorization (which is disabled for my instance). This suggests that the issue lies in the HBase or ODBC connector used by ADF, since the alternatives run from my local machine work without any issues.
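Since the first error mentions an "unexpected response" from the server, it may also help to confirm which protocol each open port speaks. The sketch below sends a plain HTTP request and checks whether the reply looks like HTTP; a port serving binary Thrift would be expected to answer with something else or with nothing. The names `classify_reply` and `probe_port` are hypothetical, and this is only a rough check under those assumptions, not a Thrift handshake.

```python
import socket


def classify_reply(data: bytes) -> str:
    """Classify the first bytes a server sent back after an HTTP GET."""
    if not data:
        return "no reply"
    if data.startswith(b"HTTP/"):
        return "http"
    return "non-http"


def probe_port(host, port, timeout=5.0):
    """Send a minimal HTTP request to host:port and classify the answer."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"GET / HTTP/1.0\r\n\r\n")
            try:
                data = sock.recv(64)
            except socket.timeout:
                data = b""  # server accepted the bytes but stayed silent
    except OSError as exc:
        return f"socket error: {exc}"
    return classify_reply(data)
```

For example, `probe_port("203.0.113.10", 9090)` (placeholder host) returning anything other than "http" would be consistent with a binary Thrift listener on that port.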
Has anyone been able to set up a working connection to a self-hosted HBase instance as a source using ADF?