Py4JJavaError: An error occurred while calling

Abhishek Gaikwad 196

I am running a notebook that works when called separately from a Databricks cluster. However, when I use a job cluster I get the error below. Any suggestions to fix this issue?

OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=512m; support was removed in 8.0
ANTLR Tool version 4.7 used for code generation does not match the current runtime version 4.8
ANTLR Tool version 4.7 used for code generation does not match the current runtime version 4.8
ANTLR Tool version 4.7 used for code generation does not match the current runtime version 4.8
ANTLR Tool version 4.7 used for code generation does not match the current runtime version 4.8
Fri Jan 14 11:49:30 2022 py4j imported
Fri Jan 14 11:49:30 2022 Python shell started with PID 978 and guid 74d5505fa9a54f218d5142697cc8dc4c
Fri Jan 14 11:49:30 2022 Initialized gateway on port 39921
Fri Jan 14 11:49:31 2022 Python shell executor start
Fri Jan 14 11:50:26 2022 py4j imported
Fri Jan 14 11:50:26 2022 Python shell started with PID 2258 and guid 74b9c73a38b242b682412b765e7dfdbd
Fri Jan 14 11:50:26 2022 Initialized gateway on port 33301
Fri Jan 14 11:50:27 2022 Python shell executor start

Hive Session ID = 66b42549-7f0f-46a3-b314-85d3957d9745

KeyError Traceback (most recent call last)
<command-2748591378350644> in <module>
2 cu_pdf = count_unique(df).to_koalas().rename(index={0: 'unique_count'})
3 cn_pdf = count_null(df).to_koalas().rename(index={0: 'null_count'})
----> 4 dt_pdf = dtypes_desc(df)
5 cna_pdf = count_na(df).to_koalas().rename(index={0: 'NA_count'})
6 distinct_pdf = distinct_count(df).set_index("Column_Name").T

<command-1553327259583875> in dtypes_desc(spark_df)
66 #calculates data types for all columns in a spark df and returns a koalas df
67 def dtypes_desc(spark_df):
---> 68 df = ks.DataFrame(spark_df.dtypes).set_index(['0']).T.rename(index={'1': 'data_type'})
69 return df
70

/databricks/python/lib/python3.8/site-packages/databricks/koalas/usage_logging/__init__.py in wrapper(*args, **kwargs)
193 start = time.perf_counter()
194 try:
--> 195 res = func(*args, **kwargs)
196 logger.log_success(
197 class_name, function_name, time.perf_counter() - start, signature

/databricks/python/lib/python3.8/site-packages/databricks/koalas/frame.py in set_index(self, keys, drop, append, inplace)
3588 for key in keys:
3589 if key not in columns:
-> 3590 raise KeyError(name_like_string(key))
3591
3592 if drop:

KeyError: '0'

---------------------------------------------------------------------------
Py4JJavaError Traceback (most recent call last)
<command-36984414021830> in <module>
----> 1 dbutils.notebook.run("/Shared/notbook1", 0, {"Database_Name" : "Source", "Table_Name" : "t_A" ,"Job_User": Loaded_By })

/databricks/python_shell/dbruntime/dbutils.py in run(self, path, timeout_seconds, arguments, _NotebookHandler__databricks_internal_cluster_spec)
134 arguments = {},
135 __databricks_internal_cluster_spec = None):
--> 136 return self.entry_point.getDbutils().notebook()._run(
137 path,
138 timeout_seconds,

/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py in __call__(self, *args)
1302
1303 answer = self.gateway_client.send_command(command)
-> 1304 return_value = get_return_value(
1305 answer, self.gateway_client, self.target_id, self.name)
1306

/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
115 def deco(*a, **kw):
116 try:
--> 117 return f(*a, **kw)
118 except py4j.protocol.Py4JJavaError as e:
119 converted = convert_exception(e.java_exception)

/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
325 if answer[1] == REFERENCE_TYPE:
--> 326 raise Py4JJavaError(
327 "An error occurred while calling {0}{1}{2}.\n".
328 format(target_id, ".", name), value)

Py4JJavaError: An error occurred while calling o562._run.
: com.databricks.WorkflowException: com.databricks.NotebookExecutionException: FAILED
at com.databricks.workflow.WorkflowDriver.run(WorkflowDriver.scala:71)
at com.databricks.dbutils_v1.impl.NotebookUtilsImpl.run(NotebookUtilsImpl.scala:122)
at com.databricks.dbutils_v1.impl.NotebookUtilsImpl._run(NotebookUtilsImpl.scala:89)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
at py4j.Gateway.invoke(Gateway.java:295)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:251)
at java.lang.Thread.run(Thread.java:748)
Caused by: com.databricks.NotebookExecutionException: FAILED
at com.databricks.workflow.WorkflowDriver.run0(WorkflowDriver.scala:117)
at com.databricks.workflow.WorkflowDriver.run(WorkflowDriver.scala:66)
... 13 more

svijay-MSFT 5,256 Reputation points Microsoft Employee Moderator

2022-01-18T12:57:45.073+00:00

Hello @Abhishek Gaikwad ,

Welcome to the Microsoft Q&A platform.

Are you doing any memory-intensive operations, such as collect() or a large amount of data manipulation on DataFrames?

Abhishek Gaikwad 196 Reputation points

2022-01-18T13:22:04.78+00:00

I am trying to read multiple tables and run a data quality script in Python against those tables.

svijay-MSFT 5,256 Reputation points Microsoft Employee Moderator

2022-01-25T14:28:26.44+00:00

Hello @Abhishek Gaikwad ,

This error usually occurs when a memory-intensive operation runs on a cluster with too little memory. Reading multiple tables and running a data quality script against them is a memory-intensive operation.
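
Separately from memory, the traceback in the question shows the child notebook failing with KeyError: '0' inside dtypes_desc, at set_index(['0']). The Py4JJavaError from dbutils.notebook.run only reports that the child notebook failed. One possible explanation, which is an assumption and not confirmed in this thread, is that the job cluster labels the unnamed columns differently (for example the integer 0 rather than the string '0'). A minimal sketch of a version that does not depend on default labels is below. It uses pandas as a stand-in for Koalas so it can run without a cluster; sample_dtypes, dtypes_desc_from_pairs, and the column names "column_name" and "data_type" are hypothetical names chosen for illustration.

```python
import pandas as pd

# Stand-in for spark_df.dtypes, which PySpark returns as a list of
# (column_name, type_string) tuples. These sample values are hypothetical.
sample_dtypes = [("id", "int"), ("name", "string"), ("amount", "double")]


def dtypes_desc_from_pairs(dtype_pairs):
    """Return a one-row frame of data types, one column per source column.

    The columns are named explicitly, so set_index does not rely on the
    default labels (0 / 1 or '0' / '1') that the library would assign.
    """
    df = pd.DataFrame(dtype_pairs, columns=["column_name", "data_type"])
    return df.set_index("column_name").T


result = dtypes_desc_from_pairs(sample_dtypes)
print(result)
```

In the notebook, the same idea would presumably mean passing spark_df.dtypes and explicit column names to ks.DataFrame, then indexing by name; that has not been verified against the Koalas version on the job cluster.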