Load multiple JSON files through an API URL

Shambhu Rai
2023-07-07T08:16:47.0866667+00:00

Hi Expert,

How can I load multiple JSON files using an API URL into Blob storage or Databricks?


1 answer

  1. PRADEEPCHEEKATLA (Moderator)
    2023-07-07T08:46:08.9033333+00:00

    @Shambhu Rai - Thanks for the question and for using the MS Q&A platform.

    To load multiple JSON files using an API URL in Azure Databricks, you can use the dbutils.fs.mount command to mount the Blob storage container that contains the JSON files, and then use the spark.read.json command to read the files into a DataFrame.

    Here's an example code snippet that demonstrates how to do this:

    # Mount the Blob storage container
    storage_account_name = "<your-storage-account-name>"
    container_name = "<your-container-name>"
    storage_account_access_key = "<your-storage-account-access-key>"
    mount_point = "/mnt/<your-mount-point>"
    dbutils.fs.mount(
      source=f"wasbs://{container_name}@{storage_account_name}.blob.core.windows.net",
      mount_point=mount_point,
      extra_configs={
        f"fs.azure.account.key.{storage_account_name}.blob.core.windows.net": storage_account_access_key
      }
    )
    
    # Read the JSON files into a DataFrame
    json_files_path = f"{mount_point}/path/to/json/files/*.json"
    df = spark.read.json(json_files_path)
    

    In this example, you'll need to replace the placeholders <your-storage-account-name>, <your-container-name>, <your-storage-account-access-key>, <your-mount-point>, and /path/to/json/files/ with your own values.
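
    If the notebook may run more than once, note that dbutils.fs.mount raises an error when the mount point is already in use. A minimal sketch of an idempotent mount check, reusing the placeholder names above:

    # Mount only if the mount point is not already in use;
    # dbutils.fs.mount raises an error when it is
    if not any(m.mountPoint == mount_point for m in dbutils.fs.mounts()):
        dbutils.fs.mount(
            source=f"wasbs://{container_name}@{storage_account_name}.blob.core.windows.net",
            mount_point=mount_point,
            extra_configs={
                f"fs.azure.account.key.{storage_account_name}.blob.core.windows.net": storage_account_access_key
            }
        )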

    Alternatively, if you don't want to mount the Blob storage container, you can pass a storage URL that Spark supports (such as a wasbs:// or abfss:// path) directly to spark.read.json. Note that spark.read.json reads from filesystems Spark can access; it can't fetch from a plain HTTP endpoint. Here's an example code snippet that demonstrates how to do this:

    # Read the JSON files into a DataFrame directly from a storage URL
    json_files_url = "<your-storage-url>"
    df = spark.read.json(json_files_url)


    In this example, you'll need to replace the placeholder <your-storage-url> with the URL of the storage location that contains the JSON files.
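
    If the JSON really does come from an HTTP API endpoint, one option is to fetch the payload on the driver and let Spark parse it. A minimal sketch, assuming the placeholder endpoint <your-api-url> returns a JSON array of records:

    import requests

    # Fetch the JSON payload from the API endpoint (placeholder URL)
    response = requests.get("<your-api-url>")
    response.raise_for_status()

    # spark.read.json also accepts an RDD of JSON strings, so the
    # response body can be parsed without writing it to storage first
    df = spark.read.json(spark.sparkContext.parallelize([response.text]))
    df.show()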

    As a repro, I tested this in my environment.

    I have 3 JSON blob files inside a subfolder of my container in the storage account, and I am able to read all of them into a single DataFrame.

    You can use the code below to load all the JSON files from the subfolder into a single DataFrame and display it:

    df = spark.read.json("wasbs://container_name@blob_storage_account.blob.core.windows.net/sub_folder/*.json")
    df.show()
    
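    Note that reading the wasbs:// path directly (without a mount) requires the storage account key to be set in the Spark session configuration first. A minimal sketch, reusing the placeholder names from the mount example above:

    # Provide the account key so Spark can read the wasbs:// path directly
    spark.conf.set(
        f"fs.azure.account.key.{storage_account_name}.blob.core.windows.net",
        storage_account_access_key
    )
    df = spark.read.json(f"wasbs://{container_name}@{storage_account_name}.blob.core.windows.net/sub_folder/*.json")
    df.show()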

    For more details, refer to Azure Databricks - JSON File.

    Hope this helps. Do let us know if you have any further queries.


    If this answers your query, do click Accept Answer and Yes for "Was this answer helpful?".

