Do pandas and pyspark.pandas autoscale in Spark?

Johnson, Matthew [DISYS] 20 Reputation points

Does the native pandas package auto-scale in Azure Synapse the way the pyspark.pandas package does?

Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

Accepted answer
  1. Bhargava-MSFT 28,951 Reputation points Microsoft Employee

    Hello Johnson, Matthew [DISYS],

    Welcome to the MS Q&A platform.

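    No, the native pandas package does not auto-scale in Azure Synapse. pandas is a Python library designed to run on a single machine, and it has no built-in support for distributed computing. In a Synapse Spark notebook, pandas code runs only on the driver node, so adding nodes to the Spark pool neither speeds it up nor gives it more memory.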

    The pandas API on Spark (pyspark.pandas) runs on Apache Spark's distributed engine, so it can process a dataset in parallel across multiple nodes and scale horizontally across machines. This lets it handle datasets that are too large to fit into memory on a single machine.
    If you need to work with large datasets in Azure Synapse, use pyspark.pandas or Spark DataFrames rather than native pandas. Outside Synapse, Azure Databricks and HDInsight are other Azure services that run Apache Spark.
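
As a rough illustration of the difference, here is a minimal sketch that runs the same aggregation both ways. The function names, the `region` and `sales` columns, and the sample data are hypothetical, chosen only for the example. The second function assumes PySpark 3.2 or later, where `pyspark.pandas` is available, and an environment in which a Spark session can be started.

```python
import pandas as pd


def total_sales_pandas(pdf: pd.DataFrame) -> pd.Series:
    # Native pandas: the whole computation runs in a single Python process
    # (the driver node in a Spark notebook), so the data must fit in its memory.
    return pdf.groupby("region")["sales"].sum()


def total_sales_pandas_on_spark(pdf: pd.DataFrame) -> pd.Series:
    # Imported here so the pandas-only function above works without PySpark.
    import pyspark.pandas as ps  # assumption: PySpark 3.2 or later

    # Convert to a pandas-on-Spark DataFrame; operations on it are executed
    # as Spark jobs that can be spread across the nodes of the pool.
    psdf = ps.from_pandas(pdf)
    totals = psdf.groupby("region")["sales"].sum()

    # Bring the small aggregated result back to the driver as a pandas Series.
    return totals.to_pandas()


# Hypothetical sample data, small enough for either approach.
sample = pd.DataFrame(
    {"region": ["east", "west", "east"], "sales": [10, 20, 5]}
)
print(total_sales_pandas(sample))
```

With a genuinely large dataset you would normally skip `ps.from_pandas` and load the data with a pandas-on-Spark reader such as `ps.read_parquet`, so that the full dataset never has to pass through the driver's memory.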


    I hope this helps. Please let us know if you have any further questions.

    2 people found this answer helpful.
