Hi,
We would like to call an external service from functions written in PySpark; the service requires an SSL certificate to be present on the cluster. Currently we use an init script similar to the one described in the documentation (https://kb.databricks.com/python/import-custom-ca-cert), which in our case reads the certificate body from the Spark environment variable $SERVICE_CERT (populated from our Key Vault) and installs it on the cluster:
echo "Adding certificate"
echo "-----BEGIN CERTIFICATE-----" > /usr/local/share/ca-certificates/new_cert.crt
echo $SERVICE_CERT >> /usr/local/share/ca-certificates/new_cert.crt
echo "-----END CERTIFICATE-----" >> /usr/local/share/ca-certificates/new_cert.crt
update-ca-certificates
echo "export REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt" >> /databricks/spark/conf/spark-env.sh
echo "Certificate added successfully"
We would like to move away from the init script, so we need another way to add the custom certificate. We tried running the script above as a regular bash script on an already running cluster, but it did not work (and in any case it would not solve our problem).
Running the equivalent commands from a Python notebook, as below, did not work either:
from subprocess import call
call('echo "-----BEGIN CERTIFICATE-----" > /usr/local/share/ca-certificates/new_skid_cert.crt', shell=True)
call('echo $SKID_SERVICE_CERT >> /usr/local/share/ca-certificates/new_skid_cert.crt', shell=True)
call('echo "-----END CERTIFICATE-----" >> /usr/local/share/ca-certificates/new_skid_cert.crt', shell=True)
call('update-ca-certificates', shell=True)
call('echo "export REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt" >> /databricks/spark/conf/spark-env.sh', shell=True)
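To clarify what we are after, below is a per-session sketch of the kind of approach we imagine, staying entirely inside the Python notebook. It assumes the certificate body is available in the SERVICE_CERT environment variable (as in our init script; the placeholder fallback here is purely illustrative) and relies on the fact that the requests library reads REQUESTS_CA_BUNDLE at call time, so no cluster restart should be needed:

```python
import os
import tempfile

# Build a PEM file from the certificate body held in $SERVICE_CERT.
# The fallback value is a placeholder for illustration only.
cert_body = os.environ.get("SERVICE_CERT", "MIIB...placeholder...")
pem_path = os.path.join(tempfile.gettempdir(), "service_cert.pem")
with open(pem_path, "w") as f:
    f.write("-----BEGIN CERTIFICATE-----\n")
    f.write(cert_body.strip() + "\n")
    f.write("-----END CERTIFICATE-----\n")

# requests consults REQUESTS_CA_BUNDLE on each call, so this takes effect
# immediately in the current Python process.
# Note: this REPLACES the system CA bundle for requests; if the service
# chain also needs the default CAs, the file would have to contain the
# system bundle plus this certificate appended.
os.environ["REQUESTS_CA_BUNDLE"] = pem_path
```

We are not sure whether this survives across notebook sessions or reaches the executors, which is part of what we would like advice on.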
The Spark documentation does not list any configuration parameters for adding custom certificates either: https://spark.apache.org/docs/2.2.2/configuration.html#tls--ssl.
Could you please advise? Is there a way to do this without an init script?
Many thanks in advance,
Ola