Start a Ray cluster on Azure Databricks

07/02/2024

Azure Databricks simplifies starting a Ray cluster by handling cluster and job configuration the same way it does for any Apache Spark job. This works because the Ray cluster is started on top of the managed Apache Spark cluster.

Run Ray on a local machine

import ray

ray.init()

Run Ray on Azure Databricks

from ray.util.spark import setup_ray_cluster
import ray

# Example: a cluster with four workers, each with 8 CPUs
setup_ray_cluster(num_worker_nodes=4, num_cpus_per_worker=8)

# Pass any custom configuration to ray.init
ray.init(ignore_reinit_error=True)

This approach works at any cluster scale, from a few to hundreds of nodes. Ray clusters on Azure Databricks also support autoscaling.

After creating the Ray cluster, you can run any Ray application code in an Azure Databricks notebook.

Important

Databricks recommends installing any libraries your application needs with %pip install <your-library-dependency> so that they are available to both your Ray cluster and your application. Specifying dependencies in the ray.init function call installs them in a location inaccessible to the Apache Spark worker nodes, which results in version incompatibilities and import errors.
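One way to confirm that an install took effect is to look up the installed version from Python. The following is a minimal sketch that uses only the standard library; the helper name installed_version and the package list are illustrative assumptions, not part of Ray or Azure Databricks.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of package_name, or None if it is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Hypothetical list of packages the application depends on.
required_packages = ["ray", "numpy"]

# Map each package name to its installed version (None means not installed).
report = {name: installed_version(name) for name in required_packages}
for name, version in report.items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

This checks only the Python process that runs the notebook cell. Calling the same helper from inside a @ray.remote task should, under the same assumptions, report what a Ray worker process sees.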

For example, you can run a simple Ray application in an Azure Databricks notebook as follows:

import ray
import random
import time
from fractions import Fraction

ray.init()

@ray.remote
def pi4_sample(sample_count):
    """Run sample_count random trials and return the fraction of
    points that landed inside the unit quarter circle.
    """
    in_count = 0
    for i in range(sample_count):
        x = random.random()
        y = random.random()
        if x*x + y*y <= 1:
            in_count += 1
    return Fraction(in_count, sample_count)

SAMPLE_COUNT = 1000 * 1000
start = time.time()
future = pi4_sample.remote(sample_count=SAMPLE_COUNT)
pi4 = ray.get(future)
end = time.time()
dur = end - start
print(f'Running {SAMPLE_COUNT} tests took {dur} seconds')

pi = pi4 * 4
print(float(pi))

Shut down a Ray cluster

Ray clusters automatically shut down under the following circumstances:

You detach your interactive notebook from your Azure Databricks cluster.
Your Azure Databricks job is completed.
Your Azure Databricks cluster is restarted or terminated.
There’s no activity for the specified idle time.

To shut down a Ray cluster running on Azure Databricks manually, you can call the ray.util.spark.shutdown_ray_cluster API.

from ray.util.spark import shutdown_ray_cluster
import ray

shutdown_ray_cluster()
ray.shutdown()

Next steps

Scale Ray clusters on Azure Databricks
