Hello Prasad!
Thank you for posting on Microsoft Learn.
You can use Azure Data Factory (ADF) with ADLS or Blob Storage. In ADF, the Get Metadata activity retrieves metadata for each file, and a ForEach activity then iterates over the results (use pagination or batching for large containers).
Then you can build reports in Power BI or Azure Synapse Analytics.
For the destination, you can use Azure SQL Database, or CSV or Parquet files in the Data Lake.
If you are comfortable with Azure Functions or Python scripting with the Azure SDK, you can list blobs and their metadata directly:
from azure.storage.blob import BlobServiceClient

# Connect to the storage account and the target container
service = BlobServiceClient.from_connection_string("<your-connection-string>")
container_client = service.get_container_client("<your-container-name>")

# include=["metadata"] is needed for blob.metadata to be populated
for blob in container_client.list_blobs(include=["metadata"]):
    print(blob.name, blob.size, blob.last_modified, blob.metadata)
Then you can export the results to CSV or push them to a SQL database, and automate this via an Azure Function or an ADF custom activity.
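If you go the scripting route, here is a minimal sketch of the CSV export step. The output file name blob_inventory.csv and the column names are just illustrative; adjust them to your own naming.

import csv
from azure.storage.blob import BlobServiceClient

# Same connection setup as above (placeholders for your account and container)
service = BlobServiceClient.from_connection_string("<your-connection-string>")
container_client = service.get_container_client("<your-container-name>")

# Write one row per blob to a local CSV file
with open("blob_inventory.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["name", "size_bytes", "last_modified", "metadata"])
    for blob in container_client.list_blobs(include=["metadata"]):
        writer.writerow([blob.name, blob.size, blob.last_modified, blob.metadata])

The same loop can run inside a timer-triggered Azure Function if you want it on a schedule.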
When it comes to connecting Hive to Databricks and querying Hive tables:
- Export Delta Lake data to a Hive-compatible format (Parquet or ORC); a rough PySpark sketch of this step follows the list.
- Mount ADLS Gen2 or Blob Storage in Hive using ABFS or WASB protocol.
- Create external Hive tables on top of those files.
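As a rough illustration of the export step from a Databricks notebook (the table name default.my_delta_table and the ABFS path are placeholders, and spark is the session Databricks provides):

# Export a Delta table to plain Parquet files in ADLS Gen2 (placeholder names and path)
df = spark.table("default.my_delta_table")
(df.write
   .mode("overwrite")
   .parquet("abfss://<container>@<storage-account>.dfs.core.windows.net/export/my_delta_table/"))

On the Hive side you would then create an external table (STORED AS PARQUET) with its LOCATION pointing at that folder.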
Or, from Databricks to Hive:
- Option 1: Use Unity Catalog (recommended for governance).
- Option 2: Use an external Hive metastore (for example, from HDInsight or Cloudera).
# In Databricks, hive.metastore.uris is usually set in the cluster's Spark config
# (spark.hadoop.hive.metastore.uris) before startup rather than at runtime
spark.conf.set("hive.metastore.uris", "thrift://<hive-host>:9083")
spark.sql("SHOW TABLES IN default").show()  # verify by listing tables in the default database