Question
Wednesday, November 23, 2016 9:27 PM
Dear All,
I would like to use a Spark Kernel on Jupyter Notebook for HDInsight Spark Cluster.
I am trying to use the following magic:
%%configure -f
{ "spark.jars.packages": "org.apache.bahir:spark-streaming-twitter_2.11:2.0.1" }
or this one:
%%configure -f
{ "jars": "wasb:///example/jars/spark-streaming-twitter_2.11-2.0.1.jar" }
I continue to receive this error.
"The code failed because of a fatal error:
Session 1 unexpectedly reached final status 'dead'. See logs:
Some things to try:
a) Make sure Spark has enough available resources for Jupyter to create a Spark context. For instructions on how to assign resources see http://go.microsoft.com/fwlink/?LinkId=717038
b) Contact your cluster administrator to make sure the Spark magics library is configured correctly."
The log shows:
16/11/23 21:23:11 INFO SparkContext: Successfully stopped SparkContext
16/11/23 21:23:11 ERROR ApplicationMaster: User class threw exception: scala.reflect.internal.FatalError: object Predef does not have a member classOf
scala.reflect.internal.FatalError: object Predef does not have a member classOf
at scala.reflect.internal.Definitions$DefinitionsClass.scala$reflect$internal$Definitions$DefinitionsClass$$fatalMissingSymbol(Definitions.scala:1186)
at scala.reflect.internal.Definitions$DefinitionsClass.getMember(Definitions.scala:1203)
at scala.reflect.internal.Definitions$DefinitionsClass.getMemberMethod(Definitions.scala:1238)
at scala.reflect.internal.Definitions$DefinitionsClass$RunDefinitions.Predef_classOf$lzycompute(Definitions.scala:1469)
at scala.reflect.internal.Definitions$DefinitionsClass$RunDefinitions.Predef_classOf(Definitions.scala:1469)
at scala.reflect.internal.Definitions$DefinitionsClass$RunDefinitions.isPredefClassOf(Definitions.scala:1459)
at scala.tools.nsc.typechecker.Typers$Typer.typedIdent$2(Typers.scala:4885)
at scala.tools.nsc.typechecker.Typers$Typer.typedIdentOrWildcard$1(Typers.scala:4908)
at scala.tools.nsc.typechecker.Typers$Typer.typedInAnyMode$1(Typers.scala:5340)
at scala.tools.nsc.typechecker.Typers$Typer.typed1(Typers.scala:5360)
What's the right way to import an external jar?
Thx
Roberto
All replies (6)
Thursday, November 24, 2016 12:33 PM | 1 vote
Hi Roberto,
Greetings from Microsoft Azure!
Use the %%configure magic to configure the notebook to use an external package. In notebooks that use external packages, make sure you call the %%configure magic in the first code cell. This ensures that the kernel is configured to use the package before the session starts.
For HDInsight 3.3 and HDInsight 3.4
%%configure
{ "packages":["com.databricks:spark-csv_2.10:1.4.0"] }
For HDInsight 3.5
%%configure
{ "conf": {"spark.jars.packages": "com.databricks:spark-csv_2.10:1.4.0" }}
Note: If you forget to configure the kernel in the first cell, you can still use `%%configure` with the `-f` parameter, but that will restart the session and all progress will be lost.
For more details, see “Use external packages with Jupyter notebooks in Apache Spark clusters on HDInsight Linux”.
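Once the session starts with the package on the classpath, a later cell can use it directly. Here is a minimal sketch using the spark-csv example above; the wasb:/// path is only a placeholder, so point it at a real file in your storage account:
// Read a CSV file with the spark-csv package loaded via %%configure above.
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("wasb:///example/data/sample.csv")   // placeholder path
df.printSchema()
df.show(5)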
Regards,
Pradeep
Kindly click "Mark as Answer" and "Vote as Helpful" on the post that helps you; this can be beneficial to other community members reading the thread.
Sunday, December 4, 2016 6:13 PM
Hi,
I tried your suggestion, but I still receive this error:
The code failed because of a fatal error: Session 1 unexpectedly reached final status 'dead'. See logs:
YARN Diagnostics:
[Fri Nov 25 16:54:41 +0000 2016] Application is added to the scheduler and is not yet activated. Queue's AM resource limit exceeded. Details : AM Partition = <DEFAULT_PARTITION>; AM Resource Request = <memory:4608, vCores:1>; Queue Resource Limit for AM = <memory:3072, vCores:1>; User AM Resource Limit of the queue = <memory:3072, vCores:1>; Queue AM Resource Usage = <memory:1536, vCores:1>; .
Some things to try:
a) Make sure Spark has enough available resources for Jupyter to create a Spark context. For instructions on how to assign resources see http://go.microsoft.com/fwlink/?LinkId=717038
b) Contact your cluster administrator to make sure the Spark magics library is configured correctly.
How can I solve this?
Roberto
Tuesday, December 6, 2016 5:35 PM | 1 vote
From your second message, you are running out of resources. Go to https://YOURCLUSTER/yarnui; you will probably see Memory Used very close to Total Memory, and you might see your new application stuck in the ACCEPTED state for a long time. To free up resources, find any application currently in the RUNNING state that you would like to kill, click the link in its ID column, and then click "Kill Application" in the top-left corner.
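If freeing up resources is not enough, another option (a sketch, assuming Livy accepts the standard session properties; the values are only illustrative) is to request a smaller driver in the same %%configure cell so that the application master fits under the queue's AM limit shown in the diagnostics:
%%configure -f
{ "driverMemory": "2G", "conf": { "spark.jars.packages": "com.databricks:spark-csv_2.10:1.4.0" } }
In yarn-cluster mode the application master's memory request is driven by the driver memory plus overhead, so it has to stay below the queue's AM limit, or the AM limit has to be raised in the YARN queue configuration.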
Thanks
Wednesday, April 26, 2017 12:52 PM
I am also having the same issue on an HDInsight 3.6 Spark cluster, trying to import the Kafka packages for structured streaming:
%%configure
{ "conf": { "spark.jars.packages": "org.apache.spark:spark-sql-kafka-0-10_2.11:2.1.0" } }
No other job is running on the cluster, and it has 150 GB of available memory.
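For context, the kind of cell this package is needed for is roughly the following (a sketch; the broker address and topic name are placeholders):
// Structured-streaming read from Kafka; requires spark-sql-kafka-0-10 on the session classpath.
val stream = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker1:9092")   // placeholder broker
  .option("subscribe", "mytopic")                       // placeholder topic
  .load()
val values = stream.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")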
Messages from the log:
WARN DefaultPromise: An exception was thrown by com.cloudera.livy.rsc.Utils$2.operationComplete()
java.util.concurrent.RejectedExecutionException: event executor terminated
ERROR ApplicationMaster: User class threw exception: scala.reflect.internal.FatalError: object Predef does not have a member classOf
scala.reflect.internal.FatalError: object Predef does not have a member classOf
at scala.reflect.internal.Definitions$DefinitionsClass.scala$reflect$internal$Definitions$DefinitionsClass$$fatalMissingSymbol(Definitions.scala:1186)
INFO ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: scala.reflect.internal.FatalError: object Predef does not have a member classOf)
17/04/26 12:44:47 INFO ApplicationMaster: Unregistering ApplicationMaster with FAILED (diag message: User class threw exception: scala.reflect.internal.FatalError: object Predef does not have a member classOf)
17/04/26 12:44:47 ERROR Utils: Uncaught exception in thread pool-1-thread-1
org.apache.hadoop.yarn.exceptions.InvalidApplicationMasterRequestException: Application doesn't exist in cache appattempt_1493049485027_0013_000001
The same dependency works fine in a Zeppelin notebook, but the problem there is that I have to download the notebook before I delete the cluster.
Thanks
Krish
Tuesday, May 2, 2017 11:04 PM
I'm having this issue as well when running on HDInsight (Spark 2.1 on Linux (HDI 3.6)).
It took a long time to get the jar configuration right. The error happens after running the %%configure magic to load the jar and set executor memory.
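For reference, a %%configure cell that both loads a jar and sets executor memory looks roughly like this (a sketch; the jar path and the memory value are placeholders, not my actual settings):
%%configure -f
{ "jars": ["wasb:///example/jars/my-library.jar"], "executorMemory": "4G" }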
Tuesday, May 2, 2017 11:09 PM
What version of Scala is running on HDInsight/Spark? I can't find this in the MS docs.
I get this error with one jar, but not with an exceptionally simple (hello-world-ish) one.
My simple case probably doesn't use Predef; I'd be surprised if the other one didn't. Scala's Predef object (there is no class) does have a classOf function, so perhaps there's a bug somewhere that's referring to it incorrectly.
Knowing the Scala version that Livy and Spark are running might help narrow this down.
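A quick way to check is to run a small cell like this in the notebook (a sketch; it just prints the versions the running session reports):
// Scala version the session's classpath was built with, and the Spark version of the running context.
println(scala.util.Properties.versionString)   // e.g. "version 2.11.8"
println(sc.version)                            // Spark version reported by the SparkContext
If the jar was compiled against a different Scala major version (for example 2.10 vs 2.11) than the one printed here, that kind of mismatch could plausibly produce the Predef/classOf error above.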