Applies to:
SQL Server 2019 (15.x)
Important
The Microsoft SQL Server 2019 Big Data Clusters add-on will be retired. Support for SQL Server 2019 Big Data Clusters will end on February 28, 2025. All existing users of SQL Server 2019 with Software Assurance will be fully supported on the platform and the software will continue to be maintained through SQL Server cumulative updates until that time. For more information, see the announcement blog post and Big data options on the Microsoft SQL Server platform.
One of the key scenarios for SQL Server Big Data Clusters is the ability to submit Spark jobs. The Spark job submission feature allows you to submit local JAR or Python files with references to SQL Server Big Data Clusters. It also enables you to execute JAR or Python files that are already located in the HDFS file system.
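As context for what gets submitted, the following is a minimal sketch of a Spark Scala application that could be packaged as a JAR; the object name and the HDFS path are placeholders for illustration only, not part of the product.

```scala
import org.apache.spark.sql.SparkSession

// Minimal illustrative application; MyApp and the HDFS path are placeholders.
object MyApp {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("MyApp").getOrCreate()

    // Read a text file from HDFS and report the number of lines.
    val lines = spark.read.textFile("hdfs:///tmp/sample.txt")
    println(s"Line count: ${lines.count()}")

    spark.stop()
  }
}
```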
Prerequisites
- SQL Server big data cluster.
- Java Development Kit.
- IntelliJ IDEA. You can install it from the JetBrains website.
- Azure Toolkit for IntelliJ extension. For installation instructions, see Install Azure Toolkit for IntelliJ.
Link SQL Server big data cluster
Open the IntelliJ IDEA tool.
If you are using a self-signed certificate, disable TLS/SSL certificate validation: from the Tools menu, select Azure > Validate Spark Cluster SSL Certificate, and then select Disable.
Open Azure Explorer: from the View menu, select Tool Windows, and then select Azure Explorer.
Right-click on SQL Server big data cluster, select Link SQL Server big data cluster. Enter the Server, User Name, and Password, then click OK.
When the untrusted server's certificate dialog appears, click Accept. You can manage the certificate later; see Server Certificates.
The linked cluster is listed under SQL Server big data cluster. You can monitor Spark jobs by opening the Spark history UI and the YARN UI, and you can unlink the cluster by right-clicking it.
Create a Spark Scala application from Spark template
Start IntelliJ IDEA, and then create a project. In the New Project dialog box, follow these steps:
a. Select Azure Spark/HDInsight > Spark Project with Samples (Scala).
b. In the Build tool list, select either of the following, according to your need:
- Maven, for Scala project-creation wizard support
- SBT, for managing the dependencies and building for the Scala project
Select Next.
The Scala project-creation wizard automatically detects whether the Scala plug-in is installed. If it is not installed, select Install.
To download the Scala plug-in, select OK. Follow the instructions to restart IntelliJ IDEA.
In the New Project window, do the following steps:
a. Enter a project name and location.
b. In the Project SDK drop-down list, select Java 1.8 for the Spark 2.x cluster, or select Java 1.7 for the Spark 1.x cluster.
c. In the Spark version drop-down list, the Scala project-creation wizard integrates the proper versions of the Spark SDK and Scala SDK. If the Spark cluster version is earlier than 2.0, select Spark 1.x. Otherwise, select Spark 2.x. This example uses Spark 2.0.2 (Scala 2.11.8).
Select Finish.
The Spark project automatically creates an artifact for you. To view the artifact, do the following steps:
a. On the File menu, select Project Structure.
b. In the Project Structure dialog box, select Artifacts to view the default artifact that is created. You can also create your own artifact by selecting the plus sign (+).
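If you selected SBT as the build tool, a minimal build.sbt for the Spark 2.0.2 / Scala 2.11.8 combination used in this example might look like the following sketch; the versions and dependency list are assumptions, so adjust them to match your cluster.

```scala
// Minimal build.sbt sketch (versions are assumptions; match them to your cluster).
name := "myApp"
version := "0.1"
scalaVersion := "2.11.8"

// Marked "provided" because the cluster supplies the Spark runtime at execution time.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.0.2" % "provided",
  "org.apache.spark" %% "spark-sql"  % "2.0.2" % "provided"
)
```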
Submit application to SQL Server big data cluster
After you link a SQL Server big data cluster, you can submit applications to it.
Set up the configuration in the Run/Debug Configurations window: click +, select Apache Spark on SQL Server, select the Remotely Run in Cluster tab, set the parameters as follows, and then click OK.
- Spark clusters (Linux only): Select the cluster on which you want to run your application.
- Select an artifact from the IntelliJ project, or select one from the hard drive.
- Main class name field: The default value is the main class from the selected file. You can change the class by selecting the ellipsis (...) and choosing another class.
- Job Configurations field: Default values are provided. You can change the values or add new key/value pairs for your job submission. For more information, see Apache Livy REST API.
- Command line arguments field: You can enter argument values for the main class, separated by spaces, if needed (see the example after this procedure).
- Referenced Jars and Referenced Files fields: You can enter the paths for the referenced JARs and files, if any. For more information, see Apache Spark Configuration.
Note
To upload your Referenced JARs and Referenced Files, refer to: How to upload resources to cluster
- Upload Path: You can indicate the storage location for uploading the JAR or Scala project resources. Several storage types are supported: Use Spark interactive session to upload and Use WebHDFS to upload.
Click SparkJobRun to submit your project to the selected cluster. The Remote Spark Job in Cluster tab displays the job execution progress at the bottom. You can stop the application by clicking the red button.
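As a minimal sketch only, the following hypothetical main class shows how the space-separated values from the Command line arguments field might be consumed; the object name, argument names, and paths are assumptions, not part of the product.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical main class; args holds the space-separated values entered in the
// Command line arguments field, for example: /tmp/input.txt /tmp/output
object MyApp {
  def main(args: Array[String]): Unit = {
    require(args.length >= 2, "Usage: MyApp <inputPath> <outputPath>")
    val inputPath = args(0)
    val outputPath = args(1)

    val spark = SparkSession.builder().appName("MyApp").getOrCreate()
    spark.read.textFile(inputPath).write.text(outputPath)
    spark.stop()
  }
}
```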
Spark Console
You can run the Spark Local Console(Scala) or the Spark Livy Interactive Session Console(Scala).
Spark Local Console(Scala)
Ensure you have satisfied the WINUTILS.EXE prerequisite.
From the menu bar, navigate to Run > Edit Configurations....
From the Run/Debug Configurations window, in the left pane, navigate to Apache Spark on SQL Server big data cluster > [Spark on SQL] myApp.
From the main window, select the Locally Run tab.
Provide the following values, and then select OK:
- Job main class: The default value is the main class from the selected file. You can change the class by selecting the ellipsis (...) and choosing another class.
- Environment variables: Ensure the value for HADOOP_HOME is correct.
- WINUTILS.exe location: Ensure the path is correct.
From Project, navigate to myApp > src > main > scala > myApp.
From the menu bar, navigate to Tools > Spark Console > Run Spark Local Console(Scala).
Two dialog boxes may be displayed asking whether you want to auto-fix dependencies. If so, select Auto Fix.
The console should look similar to the picture below. In the console window, type sc.appName, and then press Ctrl+Enter. The result is shown. You can terminate the local console by clicking the red button.
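Beyond sc.appName, any Scala expression can be evaluated in the console against the preconfigured SparkContext; the following lines are illustrative only.

```scala
// Illustrative console expressions; sc is the preconfigured SparkContext.
sc.appName
sc.master
sc.parallelize(1 to 100).filter(_ % 2 == 0).count()
```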
Spark Livy Interactive Session Console(Scala)
The Spark Livy Interactive Session Console(Scala) is only supported on IntelliJ 2018.2 and 2018.3.
From the menu bar, navigate to Run > Edit Configurations....
From the Run/Debug Configurations window, in the left pane, navigate to Apache Spark on SQL Server big data cluster > [Spark on SQL] myApp.
From the main window, select the Remotely Run in Cluster tab.
Provide the following values, and then select OK:
- Spark clusters (Linux only): Select the SQL Server big data cluster on which you want to run your application.
- Main class name: The default value is the main class from the selected file. You can change the class by selecting the ellipsis (...) and choosing another class.
From Project, navigate to myApp > src > main > scala > myApp.
From the menu bar, navigate to Tools > Spark Console > Run Spark Livy Interactive Session Console(Scala).
The console should look similar to the picture below. In the console window, type sc.appName, and then press Ctrl+Enter. The result is shown. You can terminate the console by clicking the red button.
Send Selection To Spark Console
For convenience, you can see the script result by sending some code to the Local Console or the Livy Interactive Session Console(Scala). Highlight some code in the Scala file, then right-click Send Selection To Spark Console. The selected code is sent to the console and executed. The result is displayed after the code in the console, and the console also reports any errors.
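For example, you might highlight a small block such as the following in your Scala file and send it to the console; the values are illustrative only.

```scala
// Illustrative selection to send to the Spark console.
val numbers = sc.parallelize(Seq(1, 2, 3, 4, 5))
println(s"Sum: ${numbers.sum()}")
```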
Next steps
For more information on SQL Server big data cluster and related scenarios, see Introducing SQL Server 2019 Big Data Clusters.