Use a JAR in an Azure Databricks job

The Java archive, or [JAR](https://en.wikipedia.org/wiki/JAR_(file_format)), file format is based on the popular ZIP file format and is used to aggregate many Java or Scala files into one. Using the JAR task, you can ensure fast and reliable installation of Java or Scala code in your Azure Databricks jobs. This article provides an example of creating a JAR and a job that runs the application packaged in the JAR. In this example, you will:

  • Create the JAR project defining an example application.
  • Bundle the example files into a JAR.
  • Create a job to run the JAR.
  • Run the job and view the results.
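Because the JAR format is ZIP-based, you can build and inspect one programmatically with the JDK's `java.util.jar` and `java.util.zip` APIs. The following is a minimal sketch, not part of the steps below; the class name `JarDemo` and the placeholder entry are illustrative only:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class JarDemo {
    // Build a tiny JAR in memory and return the names of its entries.
    static List<String> entryNames() throws Exception {
        Manifest manifest = new Manifest();
        manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        manifest.getMainAttributes().put(Attributes.Name.MAIN_CLASS, "PrintArgs");

        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (JarOutputStream jar = new JarOutputStream(buffer, manifest)) {
            // Placeholder entry standing in for a compiled class file.
            jar.putNextEntry(new ZipEntry("PrintArgs.class"));
            jar.closeEntry();
        }

        // Because a JAR is ZIP-based, plain ZIP tooling can read it back.
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(
                new ByteArrayInputStream(buffer.toByteArray()))) {
            for (ZipEntry entry; (entry = zip.getNextEntry()) != null; ) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        // The manifest is always written first, followed by the other entries.
        System.out.println(JarDemo.entryNames());
    }
}
```

In practice you will use the `jar` command as shown in the steps below; this sketch only illustrates that a JAR is a ZIP archive whose first entry is the manifest.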

Before you begin

You need the following to complete this example:

  • For Java JARs, the Java Development Kit (JDK).
  • For Scala JARs, the JDK and sbt.

Step 1: Create a local directory for the example

Create a local directory to hold the example code and generated artifacts, for example, databricks_jar_test.

Step 2: Create the JAR

Complete the following instructions to use Java or Scala to create the JAR.

Create a Java JAR

  1. From the databricks_jar_test folder, create a file named PrintArgs.java with the following contents:

    import java.util.Arrays;
    
    public class PrintArgs {
      public static void main(String[] args) {
        System.out.println(Arrays.toString(args));
      }
    }
    
  2. Compile the PrintArgs.java file, which creates the file PrintArgs.class:

    javac PrintArgs.java
    
  3. (Optional) Run the compiled program:

    java PrintArgs Hello World!
    
    # [Hello, World!]
    
  4. In the same folder as the PrintArgs.java and PrintArgs.class files, create a folder named META-INF.

  5. In the META-INF folder, create a file named MANIFEST.MF with the following contents. Be sure to add a newline at the end of this file:

    Main-Class: PrintArgs
    
  6. From the root of the databricks_jar_test folder, create a JAR named PrintArgs.jar (the c, v, f, and m options create a new archive, print verbose output, name the output file, and include the specified manifest, respectively):

    jar cvfm PrintArgs.jar META-INF/MANIFEST.MF *.class
    
  7. (Optional) From the root of the databricks_jar_test folder, run the JAR:

    java -jar PrintArgs.jar Hello World!
    
    # [Hello, World!]
    

    Note

    If you get the error no main manifest attribute, in PrintArgs.jar, be sure to add a newline to the end of the MANIFEST.MF file, and then try creating and running the JAR again.
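The Main-Class attribute in the manifest is what tells `java -jar` which class's main method to invoke. As a minimal illustration of how the JVM parses this file, you can feed manifest text to `java.util.jar.Manifest` directly; the class name `ManifestCheck` here is illustrative only:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

public class ManifestCheck {
    // Parse manifest text and return its Main-Class attribute.
    static String mainClass(String manifestText) throws Exception {
        Manifest m = new Manifest(
            new ByteArrayInputStream(manifestText.getBytes(StandardCharsets.UTF_8)));
        return m.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
    }

    public static void main(String[] args) throws Exception {
        // Note the trailing newline, as required by the manifest format.
        String text = "Manifest-Version: 1.0\nMain-Class: PrintArgs\n";
        System.out.println(mainClass(text)); // PrintArgs
    }
}
```

This is only a sketch for understanding the format; for the example in this article, the MANIFEST.MF file you created above is all you need.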

Create a Scala JAR

  1. From the databricks_jar_test folder, create a file named build.sbt with the following contents:

    ThisBuild / scalaVersion := "2.12.14"
    ThisBuild / organization := "com.example"
    
    lazy val PrintArgs = (project in file("."))
      .settings(
        name := "PrintArgs"
      )
    
  2. From the databricks_jar_test folder, create the folder structure src/main/scala/example.

  3. In the example folder, create a file named PrintArgs.scala with the following contents:

    package example
    
    object PrintArgs {
      def main(args: Array[String]): Unit = {
        println(args.mkString(", "))
      }
    }
    
  4. Compile the program:

    sbt compile
    
  5. (Optional) Run the compiled program:

    sbt "run Hello World\!"
    
    # Hello, World!
    
  6. In the databricks_jar_test/project folder, create a file named assembly.sbt with the following contents:

    addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "2.0.0")
    
  7. From the root of the databricks_jar_test folder, create a JAR named PrintArgs-assembly-0.1.0-SNAPSHOT.jar in the target/scala-2.12 folder:

    sbt assembly
    
  8. (Optional) From the root of the databricks_jar_test folder, run the JAR:

    java -jar target/scala-2.12/PrintArgs-assembly-0.1.0-SNAPSHOT.jar Hello World!
    
    # Hello, World!
    

Step 3: Create an Azure Databricks job to run the JAR

  1. Go to your Azure Databricks landing page and do one of the following:
    • In the sidebar, click Workflows, and then click Create Job.
    • In the sidebar, click New and select Job from the menu.
  2. In the task dialog box that appears on the Tasks tab, replace Add a name for your job… with your job name, for example JAR example.
  3. For Task name, enter a name for the task, for example java_jar_task for Java, or scala_jar_task for Scala.
  4. For Type, select JAR.
  5. For Main class, for this example, enter PrintArgs for Java, or example.PrintArgs for Scala.
  6. For Dependent libraries, click + Add.
  7. In the Add dependent library dialog, with Upload and JAR selected, drag your JAR (for this example, PrintArgs.jar for Java, or PrintArgs-assembly-0.1.0-SNAPSHOT.jar for Scala) into the dialog’s Drop JAR here area.
  8. Click Add.
  9. For Parameters, for this example, enter ["Hello", "World!"].
  10. Click Add.
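The Parameters field takes a JSON array of strings, and each element is delivered to the JAR's main class as one element of the `String[]` argument to `main`. A minimal illustration of that mapping (the class name `ParamsDemo` is ours; `render` mirrors what PrintArgs does):

```java
import java.util.Arrays;

public class ParamsDemo {
    // Render arguments the same way the article's PrintArgs class does.
    static String render(String[] args) {
        return Arrays.toString(args);
    }

    public static void main(String[] args) {
        // The Parameters value ["Hello", "World!"] reaches the JAR's main
        // class as an ordinary String[] with two elements.
        String[] jobArgs = {"Hello", "World!"};
        System.out.println(render(jobArgs)); // [Hello, World!]
    }
}
```

This is why the job output in the next step matches what you saw when running the JAR locally with `java -jar PrintArgs.jar Hello World!`.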

Step 4: Run the job and view the job run details

Click Run Now to run the workflow. To view details for the run, click View run in the Triggered run pop-up or click the link in the Start time column for the run in the job runs view.

When the run completes, the output displays in the Output panel, including the arguments passed to the task.

Next steps

To learn more about creating and running Azure Databricks jobs, see Create, run, and manage Azure Databricks Jobs.