
Set and use environment variables with init scripts

Init scripts have access to all environment variables present on a cluster.

Default environment variables

Azure Databricks sets many default variables that can be useful in init script logic. Cluster-scoped and global init scripts support the following environment variables:

  • DB_CLUSTER_ID: the ID of the cluster on which the script is running. See the Clusters API.
  • DB_CONTAINER_IP: the private IP address of the container in which Spark runs. The init script runs inside this container.
  • DB_IS_DRIVER: whether the script is running on a driver node.
  • DB_DRIVER_IP: the IP address of the driver node.
  • DB_INSTANCE_TYPE: the instance type of the host VM.
  • DB_CLUSTER_NAME: the name of the cluster the script is executing on.
  • DB_IS_JOB_CLUSTER: whether the cluster was created to run a job. See Configure compute for jobs.

You cannot override these predefined environment variables.
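As a sketch of how these defaults can drive init-script logic, the snippet below tags log output with the node's role. The `node_role` helper is hypothetical, not a documented function; only the `DB_*` variables it reads come from the list above, and Databricks sets them at runtime.

```shell
#!/bin/bash
# Illustrative only: node_role is a hypothetical helper, not part of
# the Databricks environment. DB_IS_DRIVER and DB_CLUSTER_ID are the
# default variables Databricks sets when the init script runs.

node_role() {
  # Databricks sets DB_IS_DRIVER to "TRUE" on the driver node.
  if [[ "${DB_IS_DRIVER}" = "TRUE" ]]; then
    echo "driver"
  else
    echo "worker"
  fi
}

# Example: prefix a log line with the cluster ID and node role.
echo "[${DB_CLUSTER_ID}] starting init script on $(node_role) node"
```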

Set custom environment variables

You can set custom environment variables in the Spark config; init scripts running on the compute resource can then access them. See Environment variables.

You can also set environment variables using the spark_env_vars field in the Create cluster API or Update cluster API.
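As a minimal sketch, a cluster spec sent to the Create cluster API could include a `spark_env_vars` block like the following. The variable names and values here are hypothetical; only the `spark_env_vars` field name comes from the API.

```json
{
  "cluster_name": "my-cluster",
  "spark_env_vars": {
    "MY_ENV_VAR": "my-value",
    "DATA_ROOT": "/mnt/data"
  }
}
```

Init scripts on the cluster then see these as ordinary environment variables (for example, `$MY_ENV_VAR`).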

Use environment variables

The following example uses a default environment variable to run part of a script only on a driver node:

echo $DB_IS_DRIVER
if [[ $DB_IS_DRIVER = "TRUE" ]]; then
  <run this part only on driver>
else
  <run this part only on workers>
fi
<run this part on both driver and workers>

Secrets in init scripts

You can use any valid variable name when you reference a secret. Access to secrets referenced in environment variables is determined by the permissions of the user who configured the cluster. Secrets stored in environment variables are accessible by all users of the cluster, but are redacted from plaintext display.

See Use a secret in a Spark configuration property or environment variable.
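For example, an environment-variable entry that references a secret uses the `{{secrets/<scope-name>/<secret-name>}}` syntax described in that article. The scope, key, and variable names below are hypothetical:

```
MY_API_KEY={{secrets/my-scope/my-api-key}}
```

An init script on the cluster can then read the resolved secret as `$MY_API_KEY`, subject to the permissions of the user who configured the cluster.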