Zeppelin notebook - sc.textFile does not work for HDI with ESP

Steven Lai 1 Reputation point
2021-02-11T15:14:14.283+00:00

We have an HDI cluster with ESP enabled.

From our Zeppelin notebook, reading data into a Dataset (spark.read.text) works, but reading it into an RDD (sc.textFile) fails with an authentication exception:

66978-screen.png

Note that while sc.textFile fails in Zeppelin, it works from spark-shell. Moreover, "spark.read.text(path).rdd" (which reads the data into a Dataset and then converts it to an RDD) also works in Zeppelin.
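
For reference, the three read paths described above can be sketched as follows. This is only a minimal sketch: it assumes a Zeppelin paragraph bound to the livy2 interpreter, where `spark` and `sc` are already defined, and the path is a hypothetical placeholder rather than the file actually used.

```scala
// Assumes a Zeppelin paragraph where `spark` (SparkSession) and
// `sc` (SparkContext) are provided by the interpreter.
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.DataFrame

// Hypothetical placeholder; substitute the real text file path.
val path = "abfss://<container>@<account>.dfs.core.windows.net/<path>/sample.txt"

// 1. Dataset/DataFrame read: reported to work in Zeppelin.
val df: DataFrame = spark.read.text(path)
df.show(5, truncate = false)

// 2. Dataset read converted to an RDD: also reported to work in Zeppelin.
//    spark.read.text yields rows with a single string column named "value".
val viaDataset: RDD[String] = spark.read.text(path).rdd.map(_.getString(0))
viaDataset.take(5).foreach(println)

// 3. Direct RDD read: reported to fail in Zeppelin with an authentication
//    exception, while the same call works from spark-shell.
val viaContext: RDD[String] = sc.textFile(path)
viaContext.take(5).foreach(println)
```

The second form yields the same RDD[String] as sc.textFile, so it can serve as a stopgap while the authentication problem is open.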

I found some related information on the Internet, such as https://community.cloudera.com/t5/Support-Questions/How-to-make-Zeppelin-s-User-Impersonation-work-with-Kerberos/td-p/212817, but I get other errors if I choose 'User impersonate'.

Could you please advise whether I need to enable 'User impersonate' in order for 'sc.textFile' to work?

My livy2 interpreter config is as follows:

livy.spark.driver.cores
livy.spark.driver.memory
livy.spark.dynamicAllocation.cachedExecutorIdleTimeout
livy.spark.dynamicAllocation.enabled
livy.spark.dynamicAllocation.initialExecutors
livy.spark.dynamicAllocation.maxExecutors
livy.spark.dynamicAllocation.minExecutors
livy.spark.executor.cores
livy.spark.executor.instances
livy.spark.executor.memory
livy.spark.jars abfss://rtgasia-negotiation@e9vpzaab1y7fz2q1xprivate.dfs.core.windows.net/apps/rtgasia/pyspark_mapping_engine_v1.0/spark-avro_2.11-4.0.0.jar,abfss://rtgasia-negotiation@e9vpzaab1y7fz2q1xprivate.dfs.core.windows.net/apps/rtgasia/demo/circle-poc-assembly-1.0.jar
livy.spark.jars.packages
zeppelin.interpreter.localRepo /usr/hdp/current/zeppelin-server/local-repo/2C8A4SZ9T_livy2
zeppelin.interpreter.output.limit 102400
zeppelin.livy.concurrentSQL false
zeppelin.livy.displayAppInfo true
zeppelin.livy.keytab /etc/security/keytabs/zeppelin.server.kerberos.keytab
zeppelin.livy.principal zeppelin-ecialyxizp-projectspark@SGAZUREPRD.ONMICROSOFT.COM
zeppelin.livy.pull_status.interval.millis 1000
zeppelin.livy.session.create_timeout 120
zeppelin.livy.spark.sql.maxResult 1000
zeppelin.livy.url http://hn0-ecialy.sgazureprd.onmicrosoft.com:8998
zeppelin.spark.keytab /etc/security/keytabs/zeppelin.server.kerberos.keytab
zeppelin.spark.principal zeppelin-ecialyxizp-projectspark@SGAZUREPRD.ONMICROSOFT.COM

Azure HDInsight
An Azure managed cluster service for open-source analytics.

1 answer

  1. PRADEEPCHEEKATLA 90,231 Reputation points
    2021-02-22T06:32:00.287+00:00

    Hello @Steven Lai ,

    We tried to reproduce this on our end and were able to see the results without any issues.

    Note: The example should read a text file, not the jar file.

    70511-image.png.

    I suggest that you retry with a text file and let us know the result.

    Hope this helps. Do let us know if you have any further queries.

    ------------

    Please don’t forget to Accept Answer and Up-Vote wherever the information provided helps you. This can be beneficial to other community members.

