question

BirajdarSujata-6762 asked PRADEEPCHEEKATLA-MSFT answered

50 million records in Databricks using json files

Hi All,


I need to infer the schema of JSON files and read 10 JSON files containing close to 50 million records in total.

Will Databricks support 50 million records using PySpark?

What do we need to consider for good performance?




Thanks & Regards,
Sujata

azure-databricks

1 Answer

PRADEEPCHEEKATLA-MSFT answered

Hello @BirajdarSujata-6762,

Welcome to the Microsoft Q&A platform.

Yes, Azure Databricks supports 50 million records using PySpark. Spark parallelizes the read across the cluster's executors, so a data set of this size is well within normal usage.
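As a minimal sketch of the read itself (the mount path is hypothetical), you can point spark.read.json at a glob pattern so all 10 files load into a single DataFrame and the schema is inferred in one pass:

```python
from pyspark.sql import SparkSession

# On Databricks a SparkSession already exists as `spark`;
# getOrCreate() simply returns that session.
spark = SparkSession.builder.getOrCreate()

# Hypothetical location: a glob pattern picks up all 10 JSON files at once.
# Schema inference scans the data to work out column names and types.
df = spark.read.json("/mnt/data/events/*.json")

df.printSchema()   # inspect the inferred schema
print(df.count())  # should be close to 50 million rows
```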

For more details on optimizing for good performance, you may check out the articles below:
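As a supplement to those articles, a few levers tend to matter most at this scale. The biggest one is skipping the inference scan by supplying an explicit schema, since Spark otherwise reads the JSON once just to work out the types. A hedged sketch, with hypothetical column names and paths:

```python
from pyspark.sql.types import StructType, StructField, LongType, StringType, TimestampType

# Hypothetical schema; replace the fields with the ones in your JSON.
schema = StructType([
    StructField("id", LongType(), True),
    StructField("name", StringType(), True),
    StructField("created_at", TimestampType(), True),
])

# Supplying the schema avoids the full pass over ~50 million records
# that spark.read.json would otherwise spend on inference.
df = spark.read.schema(schema).json("/mnt/data/events/*.json")

# Persist to a columnar format (Delta here) so downstream jobs
# read compact, splittable files instead of re-parsing raw JSON.
(df.repartition(200)            # tune to roughly 2-4x total cluster cores
   .write.format("delta")
   .mode("overwrite")
   .save("/mnt/data/events_delta"))
```

If you do want inference, a middle ground is the samplingRatio option (for example, .option("samplingRatio", 0.1)) so Spark only scans a fraction of the rows to infer types. Beyond that, the usual considerations are right-sizing the cluster, avoiding multiLine JSON where possible (single-line JSON splits across tasks; multiLine does not), and caching only DataFrames you actually reuse.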

Hope this helps. Please let us know if you have any further queries.


  • Please don't forget to click on Accept Answer or the upvote button whenever the information provided helps you. Original posters help the community find answers faster by identifying the correct answer.

