Unable to perform spark.sql in Synapse notebook

Austin Schafer 96 Reputation points
2021-07-21T17:53:22.367+00:00

Hello,

I am unable to run a simple spark.sql() query (e.g. df = spark.sql("SELECT * FROM table1")) in Synapse notebooks. I can load and view the files without using SQL, but any spark.sql() call fails for every file type I have tried, including CSV and Parquet.
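For reference, a minimal version of what I am running looks roughly like this (the ADLS path and table name are placeholders):

# spark is the session Synapse provides in the notebook.
# Reading and displaying the file directly works fine:
df = spark.read.parquet("abfss://container@account.dfs.core.windows.net/data/table1")
df.show()

# But any spark.sql() call fails with the error below:
df.createOrReplaceTempView("table1")
result = spark.sql("SELECT * FROM table1")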

I have tried different-sized clusters, restarting the cluster, different Spark versions, and switching the language from PySpark to Scala. My workspace has permission to access my data in ADLS Gen2. Apologies if this question has already been answered elsewhere. Below is the error I am receiving.

AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException;
Traceback (most recent call last):

File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 767, in sql
return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)

File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in call
answer, self.gateway_client, self.target_id, self.name)

File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 75, in deco
raise AnalysisException(s.split(': ', 1)[1], stackTrace)

pyspark.sql.utils.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException;

Thanks


Accepted answer
  1. Austin Schafer 96 Reputation points
    2021-08-04T17:52:51.997+00:00

    Posting the solution I was given after contacting support:

    This is a bug that sometimes occurs when the workspace is created. After I created a new workspace and ran the same commands, the code worked great.
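
    For anyone hitting the same thing, a quick sanity check that exercises only the metastore (no table or file access) is:

    # This touches only the Hive metastore; in an affected workspace even
    # this can fail with the same HiveException / NullPointerException.
    spark.sql("SHOW DATABASES").show()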


2 additional answers

  1. Ryan Abbey 1,181 Reputation points
    2021-07-21T21:55:11.467+00:00

    How are you loading the data in PySpark? Via forPath? Did you do a "saveAsTable" on creation, or run any subsequent table-creation command?
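
    For context, spark.sql only resolves tables that are registered in the metastore; reading by path alone does not register anything. A rough sketch of the difference (paths are illustrative):

    from delta.tables import DeltaTable

    # Path-based access: no catalog entry is created, so
    # spark.sql("SELECT * FROM table1") will not find the table.
    dt = DeltaTable.forPath(spark, "abfss://container@account.dfs.core.windows.net/delta/table1")

    # saveAsTable registers the table in the catalog, making it
    # queryable by name through spark.sql afterwards.
    df = spark.read.format("delta").load("abfss://container@account.dfs.core.windows.net/delta/table1")
    df.write.format("delta").saveAsTable("table1")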


  2. Computer Mike 86 Reputation points
    2022-02-04T17:44:54.443+00:00

    I changed the code in cell 33:

    # Write data to a new managed catalog table.

    # Old:
    # data.write.format("delta").saveAsTable("ManagedDeltaTable")

    # New:
    data.write.format("delta").mode("overwrite").saveAsTable("ManagedDeltaTable")
    
