How to Process JSON with Schema in Streaming

Sarvesh Pandey 71 Reputation points
2023-04-23T14:58:05.71+00:00

Hi. I have a lot of JSON files and I am trying to learn streaming in PySpark. When using readStream, the schema of the data has to be specified. The JSON is quite complex, and providing the schema directly is difficult. Is there any way to provide the schema of an unflattened (nested) JSON file? The image below shows the schema of the JSON file (BIPL JSON).

Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.

1 answer

  1. Sina Salam 4,056 Reputation points
    2023-04-24T02:28:57.5633333+00:00

    @sarvesh pandey Welcome to Microsoft Q&A, and thank you for posting your question here!

    If you are looking for a method or command to flatten and unflatten JSON in bash, npx flat works, and so does jq. For an example, see "Simple command to flat & unflatten JSON files" on Stack Overflow. You can also check the Microsoft documentation on the JSON format in Azure Data Factory and Azure Synapse Analytics.

    I hope this helps! Let me know if you have any other questions.

    Regards,
    Sina
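To make "flatten" and "unflatten" concrete, here is a minimal Python sketch of the idea using only the standard library. It is not how npx flat or jq work internally; the function names `flatten` and `unflatten`, the `.` separator, and the sample record are all assumptions chosen for illustration. The sketch also assumes that no key contains the separator and that no real object uses keys like "0", "1", ... (those are treated as list indices when unflattening).

```python
import json


def flatten(obj, sep="."):
    """Collapse nested dicts/lists into one dict keyed by the path to each leaf."""
    flat = {}

    def walk(value, path):
        if isinstance(value, dict) and value:
            for key, child in value.items():
                walk(child, f"{path}{sep}{key}" if path else str(key))
        elif isinstance(value, list) and value:
            for index, child in enumerate(value):
                walk(child, f"{path}{sep}{index}" if path else str(index))
        else:
            # Scalars (and empty containers) are stored as leaves.
            flat[path] = value

    walk(obj, "")
    return flat


def _dicts_to_lists(node):
    """Turn dicts whose keys are exactly "0".."n-1" back into lists."""
    if not isinstance(node, dict):
        return node
    converted = {key: _dicts_to_lists(child) for key, child in node.items()}
    if converted and all(key.isdigit() for key in converted):
        ordered = sorted(converted, key=int)
        if [int(key) for key in ordered] == list(range(len(ordered))):
            return [converted[key] for key in ordered]
    return converted


def unflatten(flat, sep="."):
    """Rebuild the nested structure from a dict produced by flatten()."""
    root = {}
    for path, value in flat.items():
        keys = path.split(sep)
        node = root
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        node[keys[-1]] = value
    return _dicts_to_lists(root)


if __name__ == "__main__":
    # Hypothetical record, not taken from the JSON in the question.
    record = {"id": 7, "user": {"name": "a", "tags": ["x", "y"]}}
    flat = flatten(record)
    print(json.dumps(flat, indent=2))
    print(json.dumps(unflatten(flat), indent=2))
```

Flattening this way only reshapes individual records; it does not by itself produce the schema that readStream asks for.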