High concurrency mode in Apache Spark for Fabric
High concurrency mode allows users to share the same Spark sessions in Apache Spark for Fabric for data engineering and data science workloads. An item like a notebook uses a standard Spark session for its execution. In high concurrency mode, a single Spark session can support independent execution of multiple items within individual read-eval-print loop (REPL) cores that exist within the Spark application. These REPL cores provide isolation for each item and prevent local notebook variables from being overwritten by variables of the same name from other notebooks sharing the same session.
Because the session is already running, reusing it across multiple notebooks gives users an instant session start experience.
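The isolation between REPL cores can be sketched conceptually as follows. This is a plain-Python illustration, not the Fabric or Spark API: the `ReplCore` class is a hypothetical stand-in that gives each notebook its own private namespace, mirroring how same-named variables in different notebooks don't collide.

```python
# Conceptual illustration only (not a Fabric API): each REPL core keeps its
# own variable namespace, so two notebooks sharing one Spark application can
# assign the same variable name without interfering with each other.
class ReplCore:
    """Hypothetical stand-in for an isolated REPL core."""

    def __init__(self):
        self.namespace = {}

    def run(self, code):
        # Execute the snippet against this core's private namespace only.
        exec(code, self.namespace)


notebook_a = ReplCore()
notebook_b = ReplCore()

notebook_a.run("df_name = 'sales'")
notebook_b.run("df_name = 'inventory'")

# Each notebook still sees its own value of df_name.
print(notebook_a.namespace["df_name"])  # sales
print(notebook_b.namespace["df_name"])  # inventory
```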
Note
With custom pools in high concurrency mode, users get a session start experience up to 36X faster than starting a standard Spark session.
Important
Session sharing conditions include:
- Sessions should be within a single user boundary.
- Sessions should have the same default lakehouse configuration.
- Sessions should have the same Spark compute properties.
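The three conditions above can be modeled as a simple equality check. The helper below is hypothetical (not a Fabric API); the dictionary keys `user`, `default_lakehouse`, and `compute_properties` are illustrative names for the user boundary, default lakehouse configuration, and Spark compute properties listed above.

```python
# Hypothetical sketch (not a Fabric API) of the session sharing conditions:
# a session can be shared only when the user, the default lakehouse, and the
# Spark compute properties all match.
def can_share_session(a: dict, b: dict) -> bool:
    return (
        a["user"] == b["user"]
        and a["default_lakehouse"] == b["default_lakehouse"]
        and a["compute_properties"] == b["compute_properties"]
    )


session = {
    "user": "alice",
    "default_lakehouse": "lh_sales",
    "compute_properties": {"executors": 4, "node_size": "Medium"},
}
candidate = {
    "user": "alice",
    "default_lakehouse": "lh_sales",
    "compute_properties": {"executors": 4, "node_size": "Medium"},
}
other_user = {**candidate, "user": "bob"}

print(can_share_session(session, candidate))   # True
print(can_share_session(session, other_user))  # False
```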
As part of Spark session initialization, a REPL core is created. Every time a new item starts sharing the same session, another REPL core is created inside the Spark application, and executors are allocated in a FAIR-based manner across the notebooks running in those REPL cores, preventing starvation scenarios.
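The FAIR-based allocation described above corresponds to Spark's fair scheduler, which shares executor resources across concurrent jobs within one application. As a sketch, the equivalent standard Spark property (assuming a plain Spark deployment; in Fabric this is managed for you) is:

```
spark.scheduler.mode  FAIR
```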
Related content
- To get started with high concurrency mode in notebooks, see Configure high concurrency mode for Fabric notebooks.