How to Report File Metadata for 2000+ Files Uploaded by Multiple Users to Azure Blob Storage

Prasad Sandu 0 Reputation points
2025-07-03T02:41:39.2233333+00:00

Hi Team,

I would like to ask the following questions. Please do the needful.

1. How to report file metadata for 2000+ files uploaded by multiple users to Azure Blob Storage?

2. How to connect Hadoop Hive to Databricks?

Regards,

Prasad

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

1 answer

  1. Amira Bedhiafi 34,101 Reputation points Volunteer Moderator
    2025-07-03T08:32:44.51+00:00

    Hello Prasad!

    Thank you for posting on Microsoft Learn.

    You can use ADF with either ADLS Gen2 or Blob Storage. In ADF, the Get Metadata activity (with the Child Items field) lists the files in a container or folder, and a ForEach activity then calls Get Metadata on each file to capture properties such as size and last modified; batching in the ForEach keeps 2000+ files manageable.

    For the destination, you can write to Azure SQL DB, or to CSV or Parquet files in the data lake, and then build reports on top in Power BI or Azure Synapse Analytics.

    If you are comfortable with Azure Functions or Python scripting, the Azure SDK makes this straightforward:

    from azure.storage.blob import BlobServiceClient

    # Connect with the storage account connection string
    service = BlobServiceClient.from_connection_string("<your-connection-string>")
    container_client = service.get_container_client("<your-container-name>")

    # include=['metadata'] is required, otherwise blob.metadata comes back empty
    for blob in container_client.list_blobs(include=['metadata']):
        print(blob.name, blob.size, blob.last_modified, blob.metadata)
    

    Then you can export the results to CSV or push them into a SQL database, and automate the whole job with an Azure Function or an ADF Custom activity.
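    For the export step, here is a minimal sketch using Python's csv module. The output file name metadata.csv and the uploaded_by metadata key are my assumptions, not from the question; the key only exists if uploaders set it at upload time.

    import csv
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient.from_connection_string("<your-connection-string>")
    container_client = service.get_container_client("<your-container-name>")

    with open("metadata.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["name", "size_bytes", "last_modified", "uploaded_by"])
        # list_blobs() pages through results automatically, so 2000+ blobs is fine
        for blob in container_client.list_blobs(include=["metadata"]):
            # 'uploaded_by' is a hypothetical user-defined metadata key
            uploader = (blob.metadata or {}).get("uploaded_by", "")
            writer.writerow([blob.name, blob.size, blob.last_modified, uploader])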

    To connect Hive to Databricks data, so that Hive can query what Databricks writes:

    • Export Delta Lake data to a Hive-compatible format (Parquet or ORC).
    • Mount ADLS Gen2 or Blob Storage in Hive using ABFS or WASB protocol.
    • Create external Hive tables on top of those files (see the sketch below).
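    A rough sketch of that flow from the Databricks side; the abfss:// path, database, and table names are placeholders, and the Hive DDL is shown as a comment:

    # On Databricks: export a Delta table as plain Parquet files that Hive can read
    df = spark.table("my_database.my_delta_table")
    df.write.mode("overwrite").parquet(
        "abfss://export@<storage-account>.dfs.core.windows.net/hive/my_table"
    )

    # Then, on the Hive side, create an external table over those files:
    #   CREATE EXTERNAL TABLE my_table (<columns>)
    #   STORED AS PARQUET
    #   LOCATION 'abfss://export@<storage-account>.dfs.core.windows.net/hive/my_table';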

    Or, in the other direction, from Databricks to Hive:

    • Option 1: Use Unity Catalog (recommended for governance).
    • Option 2: Use an external Hive metastore (for example, from HDInsight or Cloudera):
    # The metastore URI is normally set in the cluster's Spark config
    # (spark.hadoop.hive.metastore.uris); shown inline here for illustration
    spark.conf.set("hive.metastore.uris", "thrift://<hive-host>:9083")
    spark.sql("SHOW TABLES IN default").show()
    
