Install .NET for Apache Spark on Jupyter Notebooks on Azure HDInsight Spark clusters

This article teaches you how to install .NET for Apache Spark on Jupyter Notebooks on Azure HDInsight Spark clusters. You can deploy .NET for Apache Spark on Azure HDInsight clusters through a combination of the command line and the Azure portal (for more information, see how to deploy a .NET for Apache Spark application to Azure HDInsight), but notebooks provide a more interactive and iterative experience.

Azure HDInsight clusters already come with Jupyter Notebooks, so all you have to do is configure the Jupyter Notebooks to run .NET for Apache Spark. To use .NET for Apache Spark in your Jupyter Notebooks, a C# REPL is needed to execute your C# code line-by-line and to preserve execution state when necessary. Try .NET has been integrated as the official .NET REPL.

To enable .NET for Apache Spark through the Jupyter Notebooks experience, you need to follow a few manual steps through Ambari and submit script actions on the HDInsight Spark cluster.

Note

This feature is experimental and is not supported by the HDInsight Spark team.

Warning

.NET for Apache Spark targets an out-of-support version of .NET (.NET Core 3.1). For more information, see the .NET Support Policy.

Prerequisites

If you don't already have one, create an Azure HDInsight Spark cluster.

  1. Visit the Azure portal and select + Create a Resource.

  2. Create a new Azure HDInsight cluster resource. Select Spark 2.4 and HDI 4.0 during cluster creation.

Install .NET for Apache Spark

In the Azure portal, select the HDInsight Spark cluster you created in the previous step.

Stop the Livy server

  1. From the portal, select Overview, and then select Ambari home. If prompted, enter the login credentials for the cluster.

    Select Ambari home under Cluster dashboards

  2. Select Spark2 from the left navigation menu, and select LIVY FOR SPARK2 SERVER.

    Select Livy for Spark2 Server

  3. Select the hn0... host.

    Hosts showing "hn0..." selected

  4. Select the ellipsis next to Livy for Spark2 Server and select Stop. When prompted, select OK to proceed.

    Stop Livy for Spark2 Server. Select the ellipsis and then Stop

  5. Repeat the previous steps for the hn1... host.

Submit an HDInsight script action

  1. The install-interactive-notebook.sh script installs .NET for Apache Spark and makes changes to Apache Livy and sparkmagic. Before you submit a script action to HDInsight, you need to create and upload install-interactive-notebook.sh.

    Create a new file named install-interactive-notebook.sh on your local computer and paste in the contents of install-interactive-notebook.sh.

    Upload the script to a URI that's accessible from the HDInsight cluster. For example, https://<my storage account>.blob.core.windows.net/<my container>/<some dir>/install-interactive-notebook.sh.

  2. Run install-interactive-notebook.sh on the cluster using HDInsight Script Actions.

    Return to your HDI cluster in the Azure portal, and select Script actions from the options on the left. You submit one script action to deploy the .NET for Apache Spark REPL on your HDInsight Spark cluster. Use the following settings:

    Property           Description
    Script type        Custom
    Name               Install .NET for Apache Spark Interactive Notebook Experience
    Bash script URI    The URI to which you uploaded install-interactive-notebook.sh.
    Node type(s)       Head and Worker
    Parameters         The .NET for Apache Spark version to install. You can check the .NET for Apache Spark releases for available versions. For example, to install version 1.0.0, enter 1.0.0.

    Move to the next step when green checkmarks appear next to the status of the script action.
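If you prefer the command line to the portal, the upload and the script action submission above can be sketched with the Azure CLI. This is a minimal sketch rather than the procedure this article documents. It assumes the Azure CLI is installed and signed in, that your account can write to the storage container, and that the resulting blob is readable from the cluster. The function names and all argument values are hypothetical placeholders.

```shell
# Sketch only: function names and argument values are illustrative placeholders.
SCRIPT_NAME="install-interactive-notebook.sh"

# Build the blob URI in the same form as the example URI in step 1.
script_uri() {
  local account="$1" container="$2" dir="$3"
  printf 'https://%s.blob.core.windows.net/%s/%s/%s\n' \
    "$account" "$container" "$dir" "$SCRIPT_NAME"
}

# Upload the local script to blob storage.
# Assumes `az login` has been run and the container already exists.
upload_script() {
  local account="$1" container="$2" dir="$3"
  az storage blob upload \
    --account-name "$account" \
    --container-name "$container" \
    --name "$dir/$SCRIPT_NAME" \
    --file "./$SCRIPT_NAME"
}

# Submit the script action to the head and worker nodes, passing the
# .NET for Apache Spark version as the only script parameter.
submit_script_action() {
  local resource_group="$1" cluster="$2" uri="$3" version="$4"
  az hdinsight script-action execute \
    --resource-group "$resource_group" \
    --cluster-name "$cluster" \
    --name "Install .NET for Apache Spark Interactive Notebook Experience" \
    --script-uri "$uri" \
    --roles headnode workernode \
    --script-parameters "$version"
}
```

For example, running `upload_script mystorage scripts hdi` and then `submit_script_action my-rg my-cluster "$(script_uri mystorage scripts hdi)" 1.0.0` would mirror the portal settings listed in step 2. All of those names are made up for illustration. As with the portal flow, wait for the script action to succeed before restarting Livy.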

Start the Livy server

Repeat the steps in the Stop the Livy server section, but select Start instead of Stop, to start the Livy for Spark2 Server on hosts hn0 and hn1.

Set up Spark default configurations

  1. From the portal, select Overview, and then select Ambari home. If prompted, enter the login credentials for the cluster.

  2. Select Spark2 and CONFIGS. Then, select Custom spark2-defaults.

    Configs tab in Ambari

  3. Select Add Property to add Spark default settings.

    Add Property

    There are three individual properties. Add them one at a time using the TEXT property type in Single property add mode. Check that you don't have any extra spaces before or after any of the keys/values.

    • Property 1

      • Key:  spark.dotnet.shell.command
      • Value: /usr/share/dotnet-tools/dotnet-try,kernel-server,--default-kernel,csharp
    • Property 2 (use the same .NET for Apache Spark version that you passed to the script action)

      • Key:  spark.dotnet.packages
      • Value: ["nuget: Microsoft.Spark, 1.0.0", "nuget: Microsoft.Spark.Extensions.Delta, 1.0.0"]
    • Property 3

      • Key:  spark.dotnet.interpreter
      • Value: try

    For example, the following image captures the setting for adding property 1:

    Add a text property

    After adding the three properties, select SAVE. If you see a warning screen of config recommendations, select PROCEED ANYWAY.

  4. Restart affected components.

    After adding the new properties, you need to restart components that were affected by the changes. At the top, select RESTART, and then Restart All Affected from the drop-down.

    Configs tab with Restart > Restart All Affected highlighted

    When prompted, select CONFIRM RESTART ALL to continue, then select OK to finish.
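Because stray spaces in a key or value are easy to miss in the Ambari form, it can help to check the three properties before you paste them in. The following is a small sketch of such a check in shell. The helper names `is_trimmed` and `check_property` are hypothetical and are not part of Ambari or .NET for Apache Spark. The version 1.0.0 is the example version used in this article.

```shell
# Hypothetical helper: succeeds when the argument has no leading or
# trailing whitespace, fails otherwise.
is_trimmed() {
  case "$1" in
    [[:space:]]*|*[[:space:]]) return 1 ;;
    *) return 0 ;;
  esac
}

# Hypothetical helper: reports whether a key/value pair is safe to paste.
check_property() {
  local key="$1" value="$2"
  if is_trimmed "$key" && is_trimmed "$value"; then
    printf 'ok: %s\n' "$key"
  else
    printf 'extra whitespace: %s\n' "$key"
    return 1
  fi
}

# The three properties from step 3, using version 1.0.0 as the example.
check_property 'spark.dotnet.shell.command' \
  '/usr/share/dotnet-tools/dotnet-try,kernel-server,--default-kernel,csharp'
check_property 'spark.dotnet.packages' \
  '["nuget: Microsoft.Spark, 1.0.0", "nuget: Microsoft.Spark.Extensions.Delta, 1.0.0"]'
check_property 'spark.dotnet.interpreter' 'try'
```

Each call prints `ok:` followed by the key when both strings are clean. Spaces inside a value, such as those in the package list, are left alone, because only leading and trailing whitespace is checked.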

Submit jobs through a Jupyter Notebook

After finishing the previous steps, you can now submit your .NET for Apache Spark jobs through Jupyter Notebooks.

  1. Create a new .NET for Apache Spark notebook. Launch a Jupyter Notebook from your HDI cluster in the Azure portal.

    Launch Jupyter Notebook

    Then, select New > .NET Spark (C#) to create a notebook.

    Jupyter Notebook

  2. Submit jobs using .NET for Apache Spark.

    Use the following code snippet to create a DataFrame:

    // Create a DataFrame with a single "id" column holding the values 0 through 4, then display it.
    var df = spark.Range(0, 5);
    df.Show();

    Create a DataFrame showing command execution

    Use the following code snippet to register a user-defined function (UDF) and use the UDF with DataFrames:

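    // Define a UDF that turns each id into a greeting string.
    var myawesomeudf = Udf<int, string>((id) => $"hello {id}");
    // Apply the UDF to the "id" column of the DataFrame created earlier and display the result.
    df.Select(myawesomeudf(df["id"])).Show();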

    Register a UDF and use it

Next steps