Hello @Steve Homer ,
Currently, the product team is working on public documentation and a tutorial covering how to parameterize Spark jobs.
For now, you can use the job definition JSON file to parameterize the Spark job. Attached is a sample file:
{
    "targetBigDataPool": {
        "referenceName": "yifso-1019",
        "type": "SparkComputeReference"
    },
    "requiredSparkVersion": "2.4",
    "jobProperties": {
        "name": "job definition sample",
        "file": "wasbs://ContainerName@StorageName.blob.core.windows.net/SparkSubmission/artifact/default_artifact.jar",
        "className": "sample.LogQuery",
        "args": [],
        "jars": [],
        "pyFiles": [],
        "archives": [],
        "files": [],
        "conf": {
            "spark.hadoop.fs.azure.account.key.StorageName.blob.core.windows.net": "StorageAccessKey"
        },
        "numExecutors": 2,
        "executorCores": 4,
        "executorMemory": "14g",
        "driverCores": 4,
        "driverMemory": "14g"
    }
}
The job definition JSON can be modified, imported, and run directly.
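For example, the "args" array under "jobProperties" carries the command-line arguments passed to the job. A minimal sketch of generating a copy of the definition with different arguments (file names and argument values here are purely illustrative):

# Minimal sketch: load the sample job definition shown above and write out
# a copy with different arguments. File names and args are examples only.
import json

with open("job_definition.json") as f:
    definition = json.load(f)

# "args" holds the command-line arguments passed to the main class.
definition["jobProperties"]["args"] = ["2021-01-01", "--mode", "incremental"]

with open("job_definition_20210101.json", "w") as f:
    json.dump(definition, f, indent=2)

The modified file can then be imported into the workspace and run as described above.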
Hope this helps. Do let us know if you have any further queries.
Hello @Steve Homer ,
We are still waiting for a response from the product team and will let you know as soon as we hear back.
Stay Tuned!
In our case, we are trying to parameterize the arguments to the Spark job when it is executed from the REST API:
(sparkJobDefinitions/myjob001/execute)
Any help would be much appreciated. The only alternative seems to be creating many job definitions that are identical apart from their arguments.
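For illustration, a minimal sketch of the execute call referenced above, assuming a Synapse workspace dev endpoint, api-version, and bearer token (all placeholders); as described above, this call appears to run whatever arguments are already stored in the job definition rather than accepting them per request:

# Sketch only: trigger a run of an existing Spark job definition over REST.
# Endpoint, api-version, and token below are placeholder assumptions.
import requests

WORKSPACE = "https://<workspace-name>.dev.azuresynapse.net"  # placeholder
TOKEN = "<bearer token for the Synapse workspace>"           # placeholder

response = requests.post(
    f"{WORKSPACE}/sparkJobDefinitions/myjob001/execute",
    params={"api-version": "2020-12-01"},                    # assumed api-version
    headers={"Authorization": f"Bearer {TOKEN}"},
)
response.raise_for_status()
print(response.json())  # run details returned by the service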