Azure Databricks Cluster Nodes Encryption in Transit

Akash Verma 26 Reputation points
2020-10-02T11:35:51.33+00:00

I want to encrypt traffic between cluster worker nodes, although this feature is in private preview. The documentation is here:
https://learn.microsoft.com/en-us/azure/databricks/security/encryption/encrypt-otw
Can someone share commands or API calls to create a Databricks cluster that uses an init script to implement encryption? I don't want to do it via the console/GUI.

Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.

2 answers

  1. PRADEEPCHEEKATLA-MSFT 88,716 Reputation points Microsoft Employee
    2020-10-05T06:09:40.087+00:00

    Hello @Akash Verma ,

    You can configure cluster-scoped init scripts using the UI, the CLI, or the Clusters API.

    The examples below use the REST API to configure a cluster to run the init script.

    First create an init script and upload to the DBFS folder.
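
The upload step might look like the following sketch, assuming the legacy Databricks CLI is installed and already configured with a token. The local file name and the `DRY_RUN` switch are illustrative assumptions, not part of the original answer; the DBFS path matches the one used in the commands in this answer.

```shell
#!/bin/bash
# Sketch: upload a local init script to DBFS with the legacy Databricks CLI.
# LOCAL_SCRIPT and DRY_RUN are hypothetical names used for illustration.
LOCAL_SCRIPT="${LOCAL_SCRIPT:-encrypt-traffic.sh}"
DBFS_TARGET="dbfs:/databricks/scripts/encrypt-traffic.sh"
DRY_RUN="${DRY_RUN:-1}"

upload_init_script() {
    if [ "$DRY_RUN" = "1" ]; then
        # Print the command instead of running it.
        echo "databricks fs cp --overwrite $LOCAL_SCRIPT $DBFS_TARGET"
    else
        databricks fs cp --overwrite "$LOCAL_SCRIPT" "$DBFS_TARGET"
    fi
}

upload_init_script
```

With `DRY_RUN=0` the function runs the copy for real; the default only prints the command so it can be reviewed first.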

    For an existing cluster: to configure the cluster with ID 1202-211320-brick1 to run the init script through the Clusters API, run the following command:

    curl -n -X POST -H 'Content-Type: application/json' -d '{  
      "cluster_id": "1202-211320-brick1",  
      "num_workers": 1,  
      "spark_version": "6.4.x-scala2.11",  
      "node_type_id": "Standard_D3_v2",  
      "cluster_log_conf": {  
        "dbfs" : {  
          "destination": "dbfs:/cluster-logs"  
        }  
      },  
      "init_scripts": [ {  
        "dbfs": {  
          "destination": "dbfs:/databricks/scripts/encrypt-traffic.sh"  
        }  
      } ]  
    }' https://<databricks-instance>/api/2.0/clusters/edit  
    

    Reference: Cluster-scoped init scripts


    For a new cluster, you can use the following command:

    curl -n -X POST -H 'Content-Type: application/json' -d '{  
        "num_workers": null,  
        "autoscale": {  
            "min_workers": 2,  
            "max_workers": 8  
        },  
        "cluster_name": "my-cluster",  
        "spark_version": "6.2.x-scala2.11",  
        "spark_conf": {},  
        "node_type_id": "Standard_D3_v2",  
        "custom_tags": {},  
        "spark_env_vars": {  
            "PYSPARK_PYTHON": "/databricks/python3/bin/python3"  
        },  
        "autotermination_minutes": 120,  
        "init_scripts": [{
            "dbfs": {
                "destination": "dbfs:/databricks/scripts/encrypt-traffic.sh"
            }
        }]
    }' https://<databricks-instance>/api/2.0/clusters/create
    

    Reference: Azure Databricks REST API - create

    Hope this helps. Do let us know if you have any further queries.

    ----------------------------------------------------------------------------------------

    Do click on "Accept Answer" and Upvote on the post that helps you, this can be beneficial to other community members.


  2. PRADEEPCHEEKATLA-MSFT 88,716 Reputation points Microsoft Employee
    2020-10-12T16:11:29.723+00:00

    Hello @Akash Verma ,

    You asked: "Can you please confirm what I need to write in the init script to encrypt traffic? What will be the exact content of the init script?"

    Here is the init script:

    #!/bin/bash  
      
    keystore_file="$DB_HOME/keys/jetty_ssl_driver_keystore.jks"  
    keystore_password="gb1gQqZ9ZIHS"  
      
    # Use the SHA256 of the JKS keystore file as a SASL authentication secret string  
    sasl_secret=$(sha256sum $keystore_file | cut -d' ' -f1)  
      
    spark_defaults_conf="$DB_HOME/spark/conf/spark-defaults.conf"  
    driver_conf="$DB_HOME/driver/conf/config.conf"  
      
    if [ ! -e $spark_defaults_conf ] ; then  
        touch $spark_defaults_conf  
    fi  
    if [ ! -e $driver_conf ] ; then  
        touch $driver_conf  
    fi  
      
    # Authenticate  
    echo "spark.authenticate true" >> $spark_defaults_conf  
    echo "spark.authenticate.secret $sasl_secret" >> $spark_defaults_conf  
      
    # Configure AES encryption  
    echo "spark.network.crypto.enabled true" >> $spark_defaults_conf  
    echo "spark.network.crypto.saslFallback false" >> $spark_defaults_conf  
      
    # Configure SSL  
    echo "spark.ssl.enabled true" >> $spark_defaults_conf  
    echo "spark.ssl.keyPassword $keystore_password" >> $spark_defaults_conf  
    echo "spark.ssl.keyStore $keystore_file" >> $spark_defaults_conf  
    echo "spark.ssl.keyStorePassword $keystore_password" >> $spark_defaults_conf  
    echo "spark.ssl.protocol TLSv1.2" >> $spark_defaults_conf  
    echo "spark.ssl.standalone.enabled true" >> $spark_defaults_conf  
    echo "spark.ssl.ui.enabled true" >> $spark_defaults_conf  
      
    cat ${DB_HOME}/driver/conf/spark-branch.conf > $driver_conf  
      
    # Authenticate  
    echo '"spark.authenticate"' = true >> $driver_conf  
    echo '"spark.authenticate.secret"' = \"$sasl_secret\" >> $driver_conf  
      
    # Configure AES encryption  
    echo '"spark.network.crypto.enabled"' = true >> $driver_conf  
    echo '"spark.network.crypto.saslFallback"' = false >> $driver_conf  
      
    # Configure SSL  
    echo '"spark.ssl.enabled"' = true >> $driver_conf  
    echo '"spark.ssl.keyPassword"' = \"$keystore_password\" >> $driver_conf  
    echo '"spark.ssl.keyStore"' = \"$keystore_file\" >> $driver_conf  
    echo '"spark.ssl.keyStorePassword"' = \"$keystore_password\" >> $driver_conf  
    echo '"spark.ssl.protocol"' = '"TLSv1.2"' >> $driver_conf  
    echo '"spark.ssl.standalone.enabled"' = true >> $driver_conf  
    echo '"spark.ssl.ui.enabled"' = true >> $driver_conf  
      
    mv $driver_conf ${DB_HOME}/driver/conf/spark-branch.conf  
    

    Copy and save it as encrypt-traffic.sh and upload to dbfs:/databricks/scripts
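
After the cluster starts, one way to confirm that the script ran is to look for the appended keys in `spark-defaults.conf`, for example from a notebook `%sh` cell. The sketch below is only an assumption about how such a check could be written: the function name `check_encryption_conf` is hypothetical, and the key list is taken from the script above.

```shell
#!/bin/bash
# Sketch: report whether the keys written by the init script are present
# in a Spark config file. check_encryption_conf is a hypothetical helper.
check_encryption_conf() {
    local conf_file="$1"
    local missing=0
    local key
    for key in spark.authenticate spark.network.crypto.enabled \
               spark.network.crypto.saslFallback spark.ssl.enabled \
               spark.ssl.protocol; do
        # Match the key at line start followed by a space, e.g. "spark.ssl.enabled true".
        if ! grep -q "^${key} " "$conf_file" 2>/dev/null; then
            echo "missing: $key"
            missing=1
        fi
    done
    return "$missing"
}

# Example: check the path used by the init script, but only when DB_HOME is set.
if [ -n "${DB_HOME:-}" ]; then
    check_encryption_conf "$DB_HOME/spark/conf/spark-defaults.conf" || true
fi
```

The function returns 0 when every key is found and 1 otherwise, printing each missing key, so it can also be used in a larger health check.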

    You asked: "As per the link, the JKS file password needs to be hard-coded. Is there any way to store that password in a vault or elsewhere? I don't want to hard-code the JKS file password."

    Note: The JKS keystore file used for enabling SSL/HTTPS is dynamically generated for each workspace. The password of the JKS keystore file is hardcoded and not intended to protect the confidentiality of the keystore. Do not assume that the keystore file itself is protected.
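
Given the note above, moving the password elsewhere would not add protection to the keystore. If the goal is only to keep the literal out of the script file, the value could be read from an environment variable instead. This is a sketch under assumptions: `KEYSTORE_PASSWORD` is a hypothetical variable name that you would have to supply yourself (for example through the cluster's `spark_env_vars` setting), and the value must still match the password of the generated keystore.

```shell
#!/bin/bash
# Sketch: take the keystore password from the environment instead of a literal.
# KEYSTORE_PASSWORD is a hypothetical variable; nothing sets it by default.
resolve_keystore_password() {
    if [ -z "${KEYSTORE_PASSWORD:-}" ]; then
        echo "KEYSTORE_PASSWORD is not set" >&2
        return 1
    fi
    printf '%s' "$KEYSTORE_PASSWORD"
}

# In the init script, this would replace the hard-coded assignment:
#   keystore_password="$(resolve_keystore_password)" || exit 1
```

Failing early when the variable is missing avoids writing an empty password into the Spark configuration.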

    You asked: "Can you please confirm whether this can be done via the clusters CLI? If yes, please share the CLI commands for a new cluster and an existing cluster."

    As the documentation states, you can configure cluster-scoped init scripts using the UI, the CLI, or the Clusters API; for the non-UI methods, see the Databricks CLI and Clusters API documentation.

    Reference: Cluster-scoped init scripts

    The answer above shows how to configure cluster-scoped init scripts using the REST API.

    You can save the same JSON payload, including the advanced parameters, to a file and pass that file to the CLI:

    databricks clusters create --json-file /path/to/my/cluster_config.json  
    

    You run Databricks clusters CLI subcommands by appending them to databricks clusters. The payload can also be passed inline with --json:

        databricks clusters create --json '{  
         "num_workers": null,  
         "autoscale": {  
             "min_workers": 2,  
             "max_workers": 8  
         },  
         "cluster_name": "my-cluster",  
         "spark_version": "6.2.x-scala2.11",  
         "spark_conf": {},  
         "node_type_id": "Standard_D3_v2",  
         "custom_tags": {},  
         "spark_env_vars": {  
             "PYSPARK_PYTHON": "/databricks/python3/bin/python3"  
         },  
         "autotermination_minutes": 120,  
         "init_scripts": [{
             "dbfs": {
                 "destination": "dbfs:/databricks/scripts/encrypt-traffic.sh"
             }
         }]
     }'  
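
For an existing cluster, the legacy CLI also has a `databricks clusters edit` subcommand that accepts the same `--json-file` option. The sketch below writes an edit payload based on the REST example earlier in this thread and prints the command to run. The file path and the `DRY_RUN` switch are illustrative assumptions, and the cluster ID is the placeholder from that example.

```shell
#!/bin/bash
# Sketch: build an edit payload and apply it with the legacy Databricks CLI.
# CONFIG_FILE and DRY_RUN are hypothetical names used for illustration.
CONFIG_FILE="${CONFIG_FILE:-/tmp/cluster_edit.json}"
DRY_RUN="${DRY_RUN:-1}"

write_edit_config() {
    # Fields taken from the clusters/edit REST example, with the init script attached.
    printf '%s\n' \
        '{' \
        '  "cluster_id": "1202-211320-brick1",' \
        '  "num_workers": 1,' \
        '  "spark_version": "6.4.x-scala2.11",' \
        '  "node_type_id": "Standard_D3_v2",' \
        '  "init_scripts": [ {' \
        '    "dbfs": { "destination": "dbfs:/databricks/scripts/encrypt-traffic.sh" }' \
        '  } ]' \
        '}' > "$CONFIG_FILE"
}

edit_cluster() {
    if [ "$DRY_RUN" = "1" ]; then
        # Print the command instead of running it.
        echo "databricks clusters edit --json-file $CONFIG_FILE"
    else
        databricks clusters edit --json-file "$CONFIG_FILE"
    fi
}

write_edit_config
edit_cluster
```

Set `DRY_RUN=0` to apply the change; editing a running cluster typically restarts it, so reviewing the generated file first is worthwhile.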
    

    Hope this helps. Do let us know if you have any further queries.


