Do pandas and pyspark.pandas autoscale in Spark?

Question

Johnson, Matthew [DISYS] 20

I'd like to know: does the native pandas package autoscale the way the pyspark.pandas package does in Azure Synapse?

Accepted answer

Bhargava-MSFT 31,261 Microsoft Employee Moderator

Hello Johnson, Matthew [DISYS],

Welcome to the MS Q&A platform.

No, the native pandas package does not autoscale in Azure Synapse. pandas is a Python library designed to run on a single machine and has no built-in support for distributed computing, so in a Synapse notebook your pandas code runs only in the Spark driver process, regardless of how many nodes the Spark pool has.
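
A quick way to see the single-machine constraint: a pandas DataFrame is an ordinary in-memory Python object, so its entire contents must fit in the memory of the process that created it (in a Synapse notebook, the driver). The sketch below builds a small frame and reports its footprint with `DataFrame.memory_usage(deep=True)`; the row count and column names are arbitrary, chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative size only; real workloads that outgrow pandas are far larger.
n_rows = 1_000_000

# The whole frame is materialized in this single Python process's memory.
pdf = pd.DataFrame({
    "id": np.arange(n_rows, dtype=np.int64),
    "value": np.ones(n_rows, dtype=np.float64),
})

# Bytes held locally for the index plus both columns.
footprint_bytes = int(pdf.memory_usage(deep=True).sum())
print(f"{footprint_bytes / 1024**2:.1f} MiB held in this process")
```

Doubling `n_rows` roughly doubles the footprint, and adding nodes to the Spark pool does nothing to relieve it, because none of this data is distributed.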

With pyspark.pandas (the pandas API on Spark), you can take advantage of Apache Spark's distributed computing: a pyspark.pandas DataFrame is backed by a Spark DataFrame whose partitions are processed in parallel by executors on multiple nodes, so it scales horizontally. This lets it handle datasets too large to fit into memory on a single machine. The number of nodes itself is governed by the Spark pool's autoscale setting; pyspark.pandas workloads can use the extra nodes, whereas native pandas code stays on the driver.

If you need to work with large datasets in Azure Synapse, it is recommended that you use pyspark.pandas, or another distributed data processing platform such as Azure Databricks or Azure HDInsight.
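
Because pyspark.pandas mirrors much of the pandas API, moving code over is often a matter of changing the import. The sketch below is illustrative only: `total_by_key` is a hypothetical helper that takes the library module as a parameter so the same code can run under either one. It is executed here with plain pandas; the commented lines show how it might be called with `pyspark.pandas` in a Synapse notebook attached to a Spark pool.

```python
import pandas as pd


def total_by_key(pandas_like, data):
    """Sum 'value' per 'key' using only calls shared by pandas and pyspark.pandas.

    `pandas_like` is expected to be either the `pandas` module or the
    `pyspark.pandas` module; the body does not change between them.
    """
    df = pandas_like.DataFrame(data)
    return df.groupby("key")["value"].sum().sort_index()


data = {"key": ["a", "b", "a", "c", "b"], "value": [1, 2, 3, 4, 5]}

# Runs entirely in one process with plain pandas.
local_result = total_by_key(pd, data)
print(local_result)

# In a Synapse notebook with a Spark pool, the same helper could receive
# pyspark.pandas instead, so the groupby runs on Spark executors:
#   import pyspark.pandas as ps
#   distributed_result = total_by_key(ps, data)
#   distributed_result.to_pandas()  # collects the (small) result to the driver
```

Building a frame from a Python dict, as above, still creates the data on the driver first; for genuinely large inputs, reading directly with a pyspark.pandas reader such as `ps.read_parquet(path)` lets Spark load the data in parallel. Not every pandas method is implemented in pyspark.pandas, so it is worth testing existing code before switching.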

References:

https://sparkbyexamples.com/pyspark/pandas-vs-pyspark-dataframe-with-examples/

https://medium.com/geekculture/pandas-vs-pyspark-fe110c266e5c

I hope this helps. Please let us know if you have any further questions.

Bhargava-MSFT 31,261 Reputation points Microsoft Employee Moderator

2023-06-03T00:08:18.76+00:00

Hello Johnson, Matthew [DISYS],

I am checking to see if you got a chance to look into my above response. Please let us know if you have any further questions.

If this answers your question, please consider clicking "Accept answer" and upvoting, as this helps the community find answers to similar questions.

Johnson, Matthew [DISYS] 20 Reputation points

2023-06-05T11:48:18.07+00:00

Yes, this did give me the perspective I needed to work on my solution.
Bhargava-MSFT 31,261 Reputation points Microsoft Employee Moderator

2023-06-05T16:05:10.11+00:00

Thank you, Johnson, Matthew [DISYS].
