High concurrency mode in Apache Spark for Fabric
High concurrency mode allows users to share the same Spark sessions in Apache Spark for Fabric for data engineering and data science workloads. An item like a notebook uses a standard Spark session for its execution. In high concurrency mode, a single Spark session can support independent execution of multiple items within individual read-eval-print loop (REPL) cores that exist within the Spark application. These REPL cores provide isolation for each item and prevent local notebook variables from being overwritten by variables of the same name from other notebooks sharing the same session.
Because the session is already running, reusing it across multiple notebooks gives users an instant session start experience.
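The isolation between REPL cores can be sketched conceptually as follows. This is a plain-Python illustration, not the Fabric or Spark API: the `ReplCore` class is a hypothetical stand-in that gives each notebook its own private namespace, mirroring how same-named variables in different notebooks don't collide.

```python
# Conceptual illustration only (not a Fabric API): each REPL core keeps its
# own variable namespace, so two notebooks sharing one Spark application can
# assign the same variable name without interfering with each other.
class ReplCore:
    """Hypothetical stand-in for an isolated REPL core."""

    def __init__(self):
        self.namespace = {}

    def run(self, code):
        # Execute the snippet against this core's private namespace only.
        exec(code, self.namespace)


notebook_a = ReplCore()
notebook_b = ReplCore()

notebook_a.run("df_name = 'sales'")
notebook_b.run("df_name = 'inventory'")

# Each notebook still sees its own value of df_name.
print(notebook_a.namespace["df_name"])  # sales
print(notebook_b.namespace["df_name"])  # inventory
```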
Note
With custom pools in high concurrency mode, users get a session start experience up to 36X faster than starting a standard Spark session.
Important
Session sharing conditions include:
- Sessions should be within a single user boundary.
- Sessions should have the same default lakehouse configuration.
- Sessions should have the same Spark compute properties.
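The three conditions above can be modeled as a simple equality check. The helper below is hypothetical (not a Fabric API); the dictionary keys `user`, `default_lakehouse`, and `compute_properties` are illustrative names for the user boundary, default lakehouse configuration, and Spark compute properties listed above.

```python
# Hypothetical sketch (not a Fabric API) of the session sharing conditions:
# a session can be shared only when the user, the default lakehouse, and the
# Spark compute properties all match.
def can_share_session(a: dict, b: dict) -> bool:
    return (
        a["user"] == b["user"]
        and a["default_lakehouse"] == b["default_lakehouse"]
        and a["compute_properties"] == b["compute_properties"]
    )


session = {
    "user": "alice",
    "default_lakehouse": "lh_sales",
    "compute_properties": {"executors": 4, "node_size": "Medium"},
}
candidate = {
    "user": "alice",
    "default_lakehouse": "lh_sales",
    "compute_properties": {"executors": 4, "node_size": "Medium"},
}
other_user = {**candidate, "user": "bob"}

print(can_share_session(session, candidate))   # True
print(can_share_session(session, other_user))  # False
```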
As part of Spark session initialization, a REPL core is created. Every time a new item starts sharing the same session, another REPL core is created inside the Spark application, and executors are allocated in a FAIR-based manner across the notebooks running in those REPL cores, preventing starvation scenarios.
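The FAIR-based allocation described above corresponds to Spark's fair scheduler, which shares executor resources across concurrent jobs within one application. As a sketch, the equivalent standard Spark property (assuming a plain Spark deployment; in Fabric this is managed for you) is:

```
spark.scheduler.mode  FAIR
```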
Related content
- To get started with high concurrency mode in notebooks, see Configure high concurrency mode for Fabric notebooks.