Py4JJavaError: An error occurred while calling

Abhishek Gaikwad 191 Reputation points
2022-01-14T12:11:31.67+00:00

I am running a notebook that works when called separately from a Databricks cluster. However, when I use a job cluster I get the error below. Any suggestions to fix this issue?

OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=512m; support was removed in 8.0
ANTLR Tool version 4.7 used for code generation does not match the current runtime version 4.8
ANTLR Tool version 4.7 used for code generation does not match the current runtime version 4.8
ANTLR Tool version 4.7 used for code generation does not match the current runtime version 4.8
ANTLR Tool version 4.7 used for code generation does not match the current runtime version 4.8
Fri Jan 14 11:49:30 2022 py4j imported
Fri Jan 14 11:49:30 2022 Python shell started with PID 978 and guid 74d5505fa9a54f218d5142697cc8dc4c
Fri Jan 14 11:49:30 2022 Initialized gateway on port 39921
Fri Jan 14 11:49:31 2022 Python shell executor start
Fri Jan 14 11:50:26 2022 py4j imported
Fri Jan 14 11:50:26 2022 Python shell started with PID 2258 and guid 74b9c73a38b242b682412b765e7dfdbd
Fri Jan 14 11:50:26 2022 Initialized gateway on port 33301
Fri Jan 14 11:50:27 2022 Python shell executor start

Hive Session ID = 66b42549-7f0f-46a3-b314-85d3957d9745

KeyError Traceback (most recent call last)
<command-2748591378350644> in <module>
2 cu_pdf = count_unique(df).to_koalas().rename(index={0: 'unique_count'})
3 cn_pdf = count_null(df).to_koalas().rename(index={0: 'null_count'})
----> 4 dt_pdf = dtypes_desc(df)
5 cna_pdf = count_na(df).to_koalas().rename(index={0: 'NA_count'})
6 distinct_pdf = distinct_count(df).set_index("Column_Name").T

<command-1553327259583875> in dtypes_desc(spark_df)
66 #calculates data types for all columns in a spark df and returns a koalas df
67 def dtypes_desc(spark_df):
---> 68 df = ks.DataFrame(spark_df.dtypes).set_index(['0']).T.rename(index={'1': 'data_type'})
69 return df
70

/databricks/python/lib/python3.8/site-packages/databricks/koalas/usage_logging/__init__.py in wrapper(*args, **kwargs)
193 start = time.perf_counter()
194 try:
--> 195 res = func(*args, **kwargs)
196 logger.log_success(
197 class_name, function_name, time.perf_counter() - start, signature

/databricks/python/lib/python3.8/site-packages/databricks/koalas/frame.py in set_index(self, keys, drop, append, inplace)
3588 for key in keys:
3589 if key not in columns:
-> 3590 raise KeyError(name_like_string(key))
3591
3592 if drop:

KeyError: '0'
---------------------------------------------------------------------------
Py4JJavaError Traceback (most recent call last)
<command-36984414021830> in <module>
----> 1 dbutils.notebook.run("/Shared/notbook1", 0, {"Database_Name" : "Source", "Table_Name" : "t_A" ,"Job_User": Loaded_By })

/databricks/python_shell/dbruntime/dbutils.py in run(self, path, timeout_seconds, arguments, _NotebookHandler__databricks_internal_cluster_spec)
134 arguments = {},
135 __databricks_internal_cluster_spec = None):
--> 136 return self.entry_point.getDbutils().notebook()._run(
137 path,
138 timeout_seconds,

/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py in __call__(self, *args)
1302
1303 answer = self.gateway_client.send_command(command)
-> 1304 return_value = get_return_value(
1305 answer, self.gateway_client, self.target_id, self.name)
1306

/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
115 def deco(*a, **kw):
116 try:
--> 117 return f(*a, **kw)
118 except py4j.protocol.Py4JJavaError as e:
119 converted = convert_exception(e.java_exception)

/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
325 if answer[1] == REFERENCE_TYPE:
--> 326 raise Py4JJavaError(
327 "An error occurred while calling {0}{1}{2}.\n".
328 format(target_id, ".", name), value)

Py4JJavaError: An error occurred while calling o562._run.
: com.databricks.WorkflowException: com.databricks.NotebookExecutionException: FAILED
at com.databricks.workflow.WorkflowDriver.run(WorkflowDriver.scala:71)
at com.databricks.dbutils_v1.impl.NotebookUtilsImpl.run(NotebookUtilsImpl.scala:122)
at com.databricks.dbutils_v1.impl.NotebookUtilsImpl._run(NotebookUtilsImpl.scala:89)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
at py4j.Gateway.invoke(Gateway.java:295)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:251)
at java.lang.Thread.run(Thread.java:748)
Caused by: com.databricks.NotebookExecutionException: FAILED
at com.databricks.workflow.WorkflowDriver.run0(WorkflowDriver.scala:117)
at com.databricks.workflow.WorkflowDriver.run(WorkflowDriver.scala:66)
... 13 more

Azure Databricks