High concurrency mode in Apache Spark for Fabric

High concurrency mode allows users to share the same Spark session in Spark for Fabric for data engineering and data science workloads. An item like a notebook uses a standard Spark session for its execution. In high concurrency mode, the Spark session can support independent execution of multiple items within individual read-eval-print loop (REPL) cores that exist within the Spark application. These REPL cores provide isolation for each item and prevent local notebook variables from being overwritten by variables with the same name from other notebooks sharing the same session.
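The isolation described above can be pictured with a toy model in plain Python. This is only a sketch under stated assumptions, not how Fabric implements REPL cores: the `SharedSession` class and the notebook names are hypothetical, and each "REPL core" is reduced to a private variable namespace.

```python
# Toy model: one shared "session" object, one private namespace per notebook.
# SharedSession is a hypothetical stand-in, not a Fabric or Spark API.
class SharedSession:
    def __init__(self):
        # notebook name -> private variable namespace (plays the role of a REPL core)
        self.repl_cores = {}

    def attach(self, notebook):
        # Each attached notebook gets its own empty namespace.
        self.repl_cores[notebook] = {}

    def run(self, notebook, source):
        # Code runs against that notebook's namespace only.
        exec(source, self.repl_cores[notebook])

    def get(self, notebook, name):
        return self.repl_cores[notebook][name]


session = SharedSession()
session.attach("notebook_a")
session.attach("notebook_b")

# Both notebooks assign a variable with the same name.
session.run("notebook_a", "df_name = 'sales'")
session.run("notebook_b", "df_name = 'inventory'")

print(session.get("notebook_a", "df_name"))  # sales
print(session.get("notebook_b", "df_name"))  # inventory
```

Both notebooks keep their own value of `df_name` even though they run inside the same session object, which is the behavior the REPL cores are described as providing.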

As the session is already running, this provides users with an instant run experience when reusing the session across multiple notebooks.

Note

With custom pools in high concurrency mode, users get a session start that is 36X faster than with a standard Spark session.

Diagram showing the working of high concurrency mode in Fabric.

Important

Session sharing conditions include:

  • Sessions should be within a single user boundary.
  • Sessions should have the same default lakehouse configuration.
  • Sessions should have the same Spark compute properties.
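The three conditions above amount to an equality check between the existing session and the item that wants to join it. A minimal sketch, assuming a hypothetical `SessionRequest` record; the field names and the example property values are illustrative and not part of any Fabric API:

```python
from dataclasses import dataclass, field


@dataclass
class SessionRequest:
    # Hypothetical record of what an item needs from a session.
    user: str
    default_lakehouse: str
    compute_properties: dict = field(default_factory=dict)


def can_share(existing: SessionRequest, incoming: SessionRequest) -> bool:
    """Return True only if all three sharing conditions hold."""
    return (
        existing.user == incoming.user                                  # single user boundary
        and existing.default_lakehouse == incoming.default_lakehouse    # same default lakehouse
        and existing.compute_properties == incoming.compute_properties  # same Spark compute properties
    )


running = SessionRequest("alice", "lakehouse_sales", {"spark.executor.cores": "4"})
same = SessionRequest("alice", "lakehouse_sales", {"spark.executor.cores": "4"})
other_lakehouse = SessionRequest("alice", "lakehouse_hr", {"spark.executor.cores": "4"})
other_user = SessionRequest("bob", "lakehouse_sales", {"spark.executor.cores": "4"})

print(can_share(running, same))             # True
print(can_share(running, other_lakehouse))  # False
print(can_share(running, other_user))       # False
```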

As part of Spark session initialization, a REPL core is created. Every time a new item starts sharing the same session, a new REPL core is created for it. Executors are allocated in a FAIR manner to the notebooks running in these REPL cores inside the Spark application, which prevents starvation scenarios.
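To make the idea of fair allocation concrete, here is a deliberately simplified round-robin sketch. It is an assumption-laden illustration, not the scheduler Fabric or Spark actually uses: the `fair_allocate` function is hypothetical, and the real allocation is likely more involved than handing out executors one at a time.

```python
def fair_allocate(total_executors, notebooks):
    """Hand out executors one at a time in round-robin order (toy model)."""
    if not notebooks:
        return {}
    allocation = {name: 0 for name in notebooks}
    for i in range(total_executors):
        # Cycle through the notebooks so no one is skipped.
        allocation[notebooks[i % len(notebooks)]] += 1
    return allocation


shares = fair_allocate(8, ["notebook_a", "notebook_b", "notebook_c"])
print(shares)  # {'notebook_a': 3, 'notebook_b': 3, 'notebook_c': 2}
```

Because every notebook receives a share before any notebook receives a second one, no notebook is left without executors as long as there are at least as many executors as notebooks, which is the starvation-avoidance property the text refers to.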