How to copy files to the working directory in Azure Databricks?
I'm using the Databricks API to create/submit a job (payload below) and then running the job manually from the UI. The job runs a JAR that talks to SAP HANA Data Lake.
{
  "name": "sharath-sparkconf-{{$timestamp}}",
  "existing_cluster_id": "0602-xxxxx-yyyy",
  "libraries": [
    {
      "jar": "<path to jar>-order-core.jar"
    }
  ],
  "spark_jar_task": {
    "main_class_name": "streaming.xxx",
    "parameters": [
      "fs.azure.enable.flush=false",
      "fs.azure.account.key=",
      "fs.azure.account.name=",
      "spark.executorEnv.CLUSTER=v",
      "spark.executorEnv.NAME_SPACE=",
      "spark.executorEnv.AZURE_ACCOUNT_NAME=",
      "spark.executorEnv.KAFKA_HOST=",
      "spark.executorEnv.KAFKA_PORT=",
      "spark.executorEnv.KAFKA_USER=",
      "spark.executorEnv.KAFKA_PASSWD=",
      "spark.executorEnv.HANA_DATA_LAKE_FILE_SYSTEM_URI=hdlfs://ondemand.com",
      "spark.yarn.submit.waitAppCompletion=false",
      "spark.hadoop.fs.azure.enable.flush=false",
      "spark.sql.legacy.parquet.int96RebaseModeInRead=LEGACY",
      "spark.sql.legacy.parquet.int96RebaseModeInWrite=LEGACY",
      "spark.sql.legacy.parquet.datetimeRebaseModeInRead=LEGACY",
      "spark.sql.legacy.parquet.datetimeRebaseModeInWrite=LEGACY",
      "spark.sql.legacy.timeParserPolicy=LEGACY",
      "spark.executorEnv.HANA_DATA_LAKE_PASSWORD= !",
      "spark.executorEnv.HANA_DATA_LAKE_PK12_LOCATION=/dbfs/Filestore/tables/.p12"
    ]
  }
}
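For reference, this is roughly how I create the job from the payload above. It is a minimal sketch assuming the Jobs API 2.1 create endpoint; the workspace URL, token, and trimmed parameter list are placeholders, not the real values.

import requests

# Placeholders - the real workspace URL and token are supplied outside this snippet
WORKSPACE_URL = "https://<databricks-instance>.azuredatabricks.net"
TOKEN = "<personal-access-token>"

payload = {
    # the real request uses a Postman-style {{$timestamp}} suffix in the name
    "name": "sharath-sparkconf-test",
    "existing_cluster_id": "0602-xxxxx-yyyy",
    "libraries": [{"jar": "<path to jar>-order-core.jar"}],
    "spark_jar_task": {
        "main_class_name": "streaming.xxx",
        "parameters": [
            "spark.executorEnv.HANA_DATA_LAKE_PK12_LOCATION=/dbfs/Filestore/tables/.p12",
            # ... remaining parameters exactly as in the payload above
        ],
    },
}

# Create the job definition; the job is then triggered manually from the UI
resp = requests.post(
    f"{WORKSPACE_URL}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=payload,
)
resp.raise_for_status()
print(resp.json())  # returns the new job_id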
HANA Data Lake uses a .p12 (PKCS#12) keystore to establish the connection. My question is: how can we upload the .p12 certificate to Spark's working directory? I tried uploading the certificate to DBFS and providing that file path, but ended up with the error shown below.
Caused by: java.io.IOException: java.nio.file.NoSuchFileException: ./client-keystore.p12
at com.sap.hana.datalake.files.HdlfsConnectionConfigurator.<init>(HdlfsConnectionConfigurator.java:71)
... 72 more
Caused by: java.nio.file.NoSuchFileException: ./client-keystore.p12
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
at java.nio.file.Files.newByteChannel(Files.java:361)
at java.nio.file.Files.newByteChannel(Files.java:407)
at java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:384)
at java.nio.file.Files.newInputStream(Files.java:152)
at org.apache.hadoop.security.ssl.ReloadingX509KeystoreManager.loadKeyManager(ReloadingX509KeystoreManager.java:139)
at org.apache.hadoop.security.ssl.ReloadingX509KeystoreManager.<init>(ReloadingX509KeystoreManager.java:76)
Earlier I was using the Livy API, where I could pass the dependency certificate as a parameter using --files, which made it available in Spark's working directory.

How can we achieve the same using the Databricks API?
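For reference, the earlier Livy submission looked roughly like this. It is a minimal sketch against the Livy batches endpoint; the host, JAR path, and certificate path are placeholders.

import requests

# Placeholder - the real Livy host and artifact paths are environment-specific
LIVY_URL = "http://<livy-host>:8998"

batch = {
    "file": "<path to jar>-order-core.jar",
    "className": "streaming.xxx",
    # "files" behaves like spark-submit --files: each entry is shipped to the
    # application's working directory, so ./client-keystore.p12 resolves at runtime
    "files": ["<path to cert>/client-keystore.p12"],
    "args": [],  # same key=value parameters as in the Databricks payload above
}

resp = requests.post(f"{LIVY_URL}/batches", json=batch)
resp.raise_for_status()
print(resp.json())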