@Shambhu Rai - Thanks for the question and for using the MS Q&A platform.
To load multiple JSON files into Azure Databricks, you can use the dbutils.fs.mount command to mount the Blob storage container that contains the JSON files, and then use the spark.read.json command to read the files into a DataFrame.
- First, you need to mount the storage account to Databricks. You can do this by following the instructions in the Databricks documentation: Connect to Azure Data Lake Storage Gen2 and Blob Storage.
Here's an example code snippet that demonstrates how to do this:
# Mount the Blob storage container
storage_account_name = "<your-storage-account-name>"
container_name = "<your-container-name>"
storage_account_access_key = "<your-storage-account-access-key>"
mount_point = "/mnt/<your-mount-point>"

dbutils.fs.mount(
    source=f"wasbs://{container_name}@{storage_account_name}.blob.core.windows.net",
    mount_point=mount_point,
    extra_configs={
        f"fs.azure.account.key.{storage_account_name}.blob.core.windows.net": storage_account_access_key
    }
)
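One caveat worth noting (standard dbutils behavior, not specific to this example): dbutils.fs.mount raises an error if the mount point is already in use, so if you re-run the notebook you may want to unmount first:

# Unmount if the mount point is already in use (e.g. on notebook re-runs)
if any(m.mountPoint == mount_point for m in dbutils.fs.mounts()):
    dbutils.fs.unmount(mount_point)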
# Read the JSON files into a DataFrame
json_files_path = f"{mount_point}/path/to/json/files/*.json"
df = spark.read.json(json_files_path)
In this example, you'll need to replace the placeholders <your-storage-account-name>, <your-container-name>, <your-storage-account-access-key>, <your-mount-point>, and /path/to/json/files/ with your own values.
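By default, spark.read.json expects one JSON object per line in each file. If each of your files instead contains a single multi-line JSON document (or a top-level array), you can enable Spark's multiLine reader option. A minimal sketch, reusing json_files_path from above:

# Read files that each contain one multi-line JSON document
df = spark.read.option("multiLine", "true").json(json_files_path)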
Alternatively, if you don't want to mount the Blob storage container, you can read the JSON directly, but note that spark.read.json expects a path on a storage system Spark can access (such as wasbs://, abfss://, or dbfs:/) rather than a plain HTTP API URL. To load JSON returned by a REST API, fetch the response on the driver first and then build a DataFrame from it; you'll need to replace the placeholder <your-api-url> with the URL of the API that returns the JSON.
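Here's a minimal sketch of that approach. It assumes the requests library is available on the cluster (it ships with the Databricks runtime) and that the endpoint returns a single JSON document, either an object or an array of objects; the URL is a placeholder:

import requests

# Hypothetical endpoint; replace with your API URL
json_files_url = "<your-api-url>"

# Fetch the JSON payload over HTTP on the driver
response = requests.get(json_files_url)
response.raise_for_status()

# Build a DataFrame by distributing the JSON text as an RDD of strings
df = spark.read.json(spark.sparkContext.parallelize([response.text]))
df.show()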
To repro this, I tested it in my environment. I have 3 JSON blob files inside a subfolder of my container in the storage account, and I am able to read all of them into a single DataFrame.
You can use the below code to display all the JSON files from the subfolder in a single DataFrame:
df = spark.read.json("wasbs://container_name@blob_storage_account.blob.core.windows.net/sub_folder/*.json")
df.show()
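Note that for this direct wasbs:// read to authenticate, the cluster needs credentials for the storage account, for example by setting the account key on the Spark session first. A minimal sketch, with the account name and key as placeholders (in practice, a secret scope is preferable to a hard-coded key):

# Register the storage account key for direct wasbs:// access
spark.conf.set(
    "fs.azure.account.key.blob_storage_account.blob.core.windows.net",
    "<your-storage-account-access-key>"
)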
For more details, refer to Azure Databricks - JSON File.
Hope this helps. Do let us know if you have any further queries.
If this answers your query, do click Accept Answer and Yes for "Was this answer helpful".