Remotely connect to an Azure HDInsight 5.0 Spark cluster

Dario Bertolino 0 Reputation points
2023-08-29T12:42:49.91+00:00

Hi,

I have a local Jupyter notebook and I'm trying to connect to an Azure HDInsight 5.0 Spark cluster.

I found tutorials on how to create a Jupyter notebook directly inside the Azure cloud interface of the cluster, but what if I want to connect remotely?

Something like:

from pyspark.sql import SparkSession

# Replace with your storage account information. Note: the fs.azure OAuth
# settings are keyed by the storage account name, not the cluster name.
storage_account = "STORAGE-ACCOUNT-NAME"
client_id = "CLIENT-ID"
client_secret = "CLIENT-SECRET"
tenant_id = "TENANT-ID"
container = "CONTAINER-NAME"

account_suffix = storage_account + ".dfs.core.windows.net"

spark = (
    SparkSession.builder
    .appName("Write to ADLS Gen2")
    # Authenticate against the storage account with a service principal
    .config("spark.hadoop.fs.azure.account.auth.type." + account_suffix, "OAuth")
    .config("spark.hadoop.fs.azure.account.oauth.provider.type." + account_suffix,
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
    .config("spark.hadoop.fs.azure.account.oauth2.client.id." + account_suffix, client_id)
    .config("spark.hadoop.fs.azure.account.oauth2.client.secret." + account_suffix, client_secret)
    .config("spark.hadoop.fs.azure.account.oauth2.client.endpoint." + account_suffix,
            "https://login.microsoftonline.com/" + tenant_id + "/oauth2/token")
    # Use abfss:// consistently; the OAuth settings above apply to the ABFS
    # driver and are ignored by the wasb:// (blob.core.windows.net) driver
    .config("spark.hadoop.fs.defaultFS", "abfss://" + container + "@" + account_suffix + "/")
    .getOrCreate()
)

Is it possible?


1 answer

  1. KranthiPakala-MSFT 46,422 Reputation points Microsoft Employee
    2023-08-30T20:02:18.6466667+00:00

    Hi @Dario Bertolino, welcome to the Microsoft Q&A forum, and thanks for reaching out here.

    Here is the MS documentation that walks through the steps to achieve your requirement: Install Jupyter Notebook on your computer and connect to Apache Spark on HDInsight

    There are four key steps involved in installing Jupyter Notebook locally and connecting to Apache Spark on HDInsight.

    • Configure Spark cluster.
    • Install Jupyter Notebook.
    • Install the PySpark and Spark kernels with the Spark magic.
    • Configure Spark magic to access the Spark cluster on HDInsight (see the configuration sketch below).
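
    For the last step, Spark magic reads its settings from a config.json file in the sparkmagic home directory (typically ~/.sparkmagic). Below is a minimal sketch that writes such a file, assuming basic authentication against the cluster's Livy endpoint; CLUSTERNAME, the username, and the password are placeholders to replace with your own values:

    import json
    from pathlib import Path

    # Placeholder values: the HDInsight Livy endpoint and cluster login
    credentials = {
        "username": "admin",              # cluster login username
        "password": "CLUSTER-PASSWORD",   # cluster login password
        "url": "https://CLUSTERNAME.azurehdinsight.net/livy",
        "auth": "Basic_Access",
    }

    # Spark magic picks up config.json from the sparkmagic home directory;
    # the PySpark and Scala kernels can share the same credentials.
    config = {
        "kernel_python_credentials": credentials,
        "kernel_scala_credentials": credentials,
    }

    config_path = Path.home() / ".sparkmagic" / "config.json"
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2))
    print("Wrote", config_path)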

    Another helpful resource, though a bit outdated: Remotely execute a Spark job on an HDInsight cluster
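
    If you would rather not set up the Jupyter kernels at all, the same Livy REST endpoint that Spark magic talks to can also be called directly from any Python script. This is a rough sketch using the requests library, assuming basic authentication with the cluster login; the cluster name and credentials are placeholders:

    import json
    import time

    import requests

    livy = "https://CLUSTERNAME.azurehdinsight.net/livy"   # placeholder cluster name
    auth = ("admin", "CLUSTER-PASSWORD")                   # cluster login credentials
    headers = {"Content-Type": "application/json", "X-Requested-By": "admin"}

    # 1. Open an interactive PySpark session on the cluster.
    r = requests.post(livy + "/sessions", auth=auth, headers=headers,
                      data=json.dumps({"kind": "pyspark"}))
    session_id = r.json()["id"]

    # 2. Wait for the session to become idle.
    while requests.get(f"{livy}/sessions/{session_id}", auth=auth).json()["state"] != "idle":
        time.sleep(5)

    # 3. Submit a statement and poll until its result is available.
    r = requests.post(f"{livy}/sessions/{session_id}/statements", auth=auth, headers=headers,
                      data=json.dumps({"code": "sc.parallelize(range(10)).sum()"}))
    statement_url = f"{livy}/sessions/{session_id}/statements/{r.json()['id']}"
    result = requests.get(statement_url, auth=auth).json()
    while result["state"] != "available":
        time.sleep(2)
        result = requests.get(statement_url, auth=auth).json()
    print(result["output"])

    # 4. Clean up the session when done.
    requests.delete(f"{livy}/sessions/{session_id}", auth=auth, headers=headers)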

    Hope this helps. Let us know how it goes.

    Thank you


    Please don't forget to Accept Answer and "Yes" for "was this answer helpful" wherever the information provided helps you, as this can be beneficial to other community members.
