How to add an SSL certificate to a Databricks cluster for use with PySpark

Sypula, Aleksandra 0 Reputation points


We would like to use functions written in PySpark to call an external service that requires an SSL certificate on the cluster. Currently we use an init script similar to the one explained in the documentation; in our case it fetches a secret from our key vault via a Spark environment variable ($SERVICE_CERT) and installs it on the cluster:
```shell
echo "Adding certificate"

echo "-----BEGIN CERTIFICATE-----" > /usr/local/share/ca-certificates/new_cert.crt
# Quote the variable so line breaks in the certificate body are preserved
echo "$SERVICE_CERT" >> /usr/local/share/ca-certificates/new_cert.crt
echo "-----END CERTIFICATE-----" >> /usr/local/share/ca-certificates/new_cert.crt

# Rebuild the system CA bundle so the new certificate is included
update-ca-certificates

echo "export REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt" >> /databricks/spark/conf/

echo "Certificate added successfully"
```
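For reference, the $SERVICE_CERT environment variable itself can be populated from a secret in the cluster's configuration rather than hard-coded, using the secret-reference syntax for cluster environment variables (the scope and key names below are illustrative placeholders):

```
# Cluster environment variable (cluster config > Advanced options),
# referencing a secret scope and key:
SERVICE_CERT={{secrets/<scope-name>/<service-cert-key>}}
```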

We would like to move away from the init script, so we need to add the custom certificate in a different way. We tried running the script above as a regular bash script on an already running cluster, but it did not work (and in any case it would not solve our problem).

Also running the script from a Python notebook, as below, did not work:

```python
from subprocess import call

call('echo "-----BEGIN CERTIFICATE-----" > /usr/local/share/ca-certificates/new_skid_cert.crt', shell=True)
call('echo $SKID_SERVICE_CERT >> /usr/local/share/ca-certificates/new_skid_cert.crt', shell=True)
call('echo "-----END CERTIFICATE-----" >> /usr/local/share/ca-certificates/new_skid_cert.crt', shell=True)
call('update-ca-certificates', shell=True)
call('echo "export REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt" >> /databricks/spark/conf/', shell=True)
```
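One likely reason the last step has no effect is that exporting an environment variable from a spawned shell cannot change a Python process that is already running; for the driver node alone, a hedged alternative is to do the same work directly in Python and set REQUESTS_CA_BUNDLE in the notebook's own process. A minimal sketch, assuming the certificate body is available as a string (function name, paths, and the `refresh` flag are illustrative, and `update-ca-certificates` requires root):

```python
import os
import pathlib
import subprocess

def install_ca_cert(cert_body: str,
                    cert_path="/usr/local/share/ca-certificates/new_skid_cert.crt",
                    bundle="/etc/ssl/certs/ca-certificates.crt",
                    refresh=True):
    """Write a PEM-wrapped certificate and point requests at the system bundle.

    Illustrative sketch: paths are the Ubuntu defaults used above.
    """
    # Wrap the bare certificate body in PEM BEGIN/END markers,
    # mirroring what the echo commands do
    pem = ("-----BEGIN CERTIFICATE-----\n"
           f"{cert_body.strip()}\n"
           "-----END CERTIFICATE-----\n")
    pathlib.Path(cert_path).write_text(pem)
    if refresh:
        # Rebuild the system CA bundle so the new certificate is included
        subprocess.run(["update-ca-certificates"], check=True)
    # Takes effect in this Python process, unlike exporting from a subshell
    os.environ["REQUESTS_CA_BUNDLE"] = bundle
```

Note this still only affects the driver; code running on executors would not pick it up, which is part of why cluster-scoped configuration (an init script or policy) is the usual approach.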

The Spark documentation does not list any configuration parameters for including additional certificates.

Could you please advise on the above? Is there a way to do this without an init script?

Many thanks in advance,


Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.

1 answer

  1. PRADEEPCHEEKATLA-MSFT 76,436 Reputation points Microsoft Employee

    @Sypula, Aleksandra - Here is the update from the product team:

    We would like to move away from the init script. One way to move off legacy init scripts is to use compute policies. I believe you will still use your existing init script; it is just that the way you enable it is different with this approach, which is the one recommended by Databricks.
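    A hedged sketch of what such a compute (cluster) policy could look like, assuming the init script has been uploaded to the workspace at a path like /Shared/init/add-cert.sh (the path and policy name are illustrative, and the attribute-path syntax follows the Databricks cluster-policy format):

    ```python
    import json

    # Policy definition pinning the init script so every cluster created
    # under this policy runs it; submitted as JSON via the Policies API or UI.
    policy_definition = {
        "init_scripts.0.workspace.destination": {
            "type": "fixed",
            "value": "/Shared/init/add-cert.sh",
        }
    }

    policy_json = json.dumps(policy_definition, indent=2)
    ```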

    Hope this helps. Do let us know if you have any further queries.
