How to copy files to the working directory in Azure Databricks?

Sharath 20 Reputation points
2023-06-05T01:11:47.82+00:00

I'm using the Databricks API to create a job (payload below) and then running the job manually from the UI. The job is a JAR that talks to HANA Data Lake.

{
    "name": "sharath-sparkconf-{{$timestamp}}",
    "existing_cluster_id": "0602-xxxxx-yyyy",
    "libraries": [
        {
            "jar": "<path to jar>-order-core.jar"
        }
    ],
    "spark_jar_task": {
        "main_class_name": "streaming.xxx",
        "parameters": [
            
            "fs.azure.enable.flush=false",
            "fs.azure.account.key=,
            "fs.azure.account.name=",
            "spark.executorEnv.CLUSTER=v",
            "spark.executorEnv.NAME_SPACE=",
            "spark.executorEnv.AZURE_ACCOUNT_NAME=",
            "spark.executorEnv.KAFKA_HOST= ",
            "spark.executorEnv.KAFKA_PORT= ",
            "spark.executorEnv.KAFKA_USER= ",
            "spark.executorEnv.KAFKA_PASSWD=  ",
            "spark.executorEnv.HANA_DATA_LAKE_FILE_SYSTEM_URI=hdlfs://ondemand.com",
            "spark.yarn.submit.waitAppCompletion=false",
            "spark.hadoop.fs.azure.enable.flush=false",
            "spark.sql.legacy.parquet.int96RebaseModeInRead=LEGACY",
            "spark.sql.legacy.parquet.int96RebaseModeInWrite=LEGACY",
            "spark.sql.legacy.parquet.datetimeRebaseModeInRead=LEGACY",
            "spark.sql.legacy.parquet.datetimeRebaseModeInWrite=LEGACY",
            "spark.sql.legacy.timeParserPolicy=LEGACY",
            "spark.executorEnv.HANA_DATA_LAKE_PASSWORD= !",
            "spark.executorEnv.HANA_DATA_LAKE_PK12_LOCATION=/dbfs/Filestore/tables/.p12"
        ]
    }
    
}
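
For completeness, here is roughly the call I use to create the job from that payload via the Jobs API; the workspace URL, token, and payload.json filename are placeholders:

# Minimal sketch: POST the payload above to the Databricks Jobs API.
# Workspace URL, token, and payload.json are placeholders.
import json
import requests

DATABRICKS_HOST = "https://<databricks-instance>"   # e.g. https://adb-xxxx.azuredatabricks.net
TOKEN = "<personal-access-token>"

with open("payload.json") as f:   # the JSON payload shown above
    payload = json.load(f)

resp = requests.post(
    f"{DATABRICKS_HOST}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=payload,
)
resp.raise_for_status()
print(resp.json())   # returns the new job_id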
HANA Data Lake uses a .p12 keystore to establish the connection. My question is: how can we upload the .p12 certificate to the Spark working directory? I tried uploading the certificate to DBFS and provided that file path, but ended up with the error shown below (a rough sketch of the upload step follows the stack trace).

Caused by: java.io.IOException: java.nio.file.NoSuchFileException: ./client-keystore.p12
    at com.sap.hana.datalake.files.HdlfsConnectionConfigurator.<init>(HdlfsConnectionConfigurator.java:71)
    ... 72 more
Caused by: java.nio.file.NoSuchFileException: ./client-keystore.p12
    at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
    at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
    at java.nio.file.Files.newByteChannel(Files.java:361)
    at java.nio.file.Files.newByteChannel(Files.java:407)
    at java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:384)
    at java.nio.file.Files.newInputStream(Files.java:152)
    at org.apache.hadoop.security.ssl.ReloadingX509KeystoreManager.loadKeyManager(ReloadingX509KeystoreManager.java:139)
    at org.apache.hadoop.security.ssl.ReloadingX509KeystoreManager.<init>(ReloadingX509KeystoreManager.java:76)
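
For context, this is roughly how I put the certificate onto DBFS before referencing it; the local filename, DBFS path, workspace URL, and token are placeholders:

# Sketch of uploading the .p12 to DBFS via the DBFS REST API (all values are placeholders).
import base64
import requests

DATABRICKS_HOST = "https://<databricks-instance>"
TOKEN = "<personal-access-token>"

with open("client-keystore.p12", "rb") as f:
    contents = base64.b64encode(f.read()).decode("ascii")

resp = requests.post(
    f"{DATABRICKS_HOST}/api/2.0/dbfs/put",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        # visible on the cluster as /dbfs/FileStore/tables/client-keystore.p12
        "path": "/FileStore/tables/client-keystore.p12",
        "contents": contents,   # inline contents are limited to about 1 MB, enough for a keystore
        "overwrite": True,
    },
)
resp.raise_for_status()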

Earlier I was using the Livy API, where I could pass the dependency certificate as a parameter via the files list (the equivalent of spark-submit --files), which made it available in the Spark working directory.
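
For comparison, the earlier Livy batch submission looked roughly like this (host, jar path, and class name are placeholders); the files list is what landed the certificate in the Spark working directory:

# Sketch of the earlier Livy batch submission; "files" ships the certificate
# to the working directory of the driver and executors (all values are placeholders).
import requests

LIVY_HOST = "http://<livy-host>:8998"

batch = {
    "file": "<path to jar>-order-core.jar",
    "className": "streaming.xxx",
    "files": ["<path to>/client-keystore.p12"],   # equivalent of spark-submit --files
    "conf": {"spark.hadoop.fs.azure.enable.flush": "false"},
}

resp = requests.post(f"{LIVY_HOST}/batches", json=batch)
resp.raise_for_status()
print(resp.json())   # batch id and state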

How can we achieve the same thing using the Databricks API?
