inferSchema not working for json on Synapse

Ben Fisher 1 Reputation point
2022-08-29T19:19:16.103+00:00

I'm having an issue reading gzipped JSON files from ADLS in Azure Synapse using PySpark in a notebook in my workspace. I can read individual files with pd.read_json(), but spark.read.json(url) fails with the error AnalysisException: Unable to infer schema for JSON. It must be specified manually. I've also tried spark.read.option("inferSchema", True).json(url) and get the same error. This happens for multiple different files across my ADLS account. I also tested specifying a schema manually on dummy data I put into a JSON file, and that only returned an empty DataFrame. Any suggestions on what else I can try to resolve this? Let me know if I can provide more context!
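A few details of Spark's JSON reader may be worth checking here. inferSchema is a CSV option; the JSON source infers a schema by default and does not use it. By default spark.read.json expects JSON Lines (one complete JSON object per line), so a single pretty-printed document needs .option("multiLine", True). Spark also picks the decompression codec from the file extension, so gzipped data is only decompressed when the file name ends in .gz. pd.read_json, by contrast, reads one whole document by default, so pandas succeeding does not rule out the multi-line case. Finally, this particular "Unable to infer schema" error is typically what Spark raises when the path matches no readable files, for example an empty folder or files whose names start with _ or ., which Spark skips.

The sketch below is a local, stdlib-only diagnostic you could run on a downloaded sample file. The helper name inspect_json_file is hypothetical. It only checks the gzip magic bytes, the file suffix, and whether the content is JSON Lines or one single JSON document.

```python
import gzip
import json
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def _parses(text):
    """Return True if text is one valid JSON value."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


def inspect_json_file(path):
    """Hypothetical diagnostic: report how Spark's default JSON reader would see a file."""
    path = Path(path)
    with path.open("rb") as f:
        is_gzip = f.read(2) == GZIP_MAGIC

    # Decompress based on the actual bytes, regardless of the file name.
    raw = gzip.decompress(path.read_bytes()) if is_gzip else path.read_bytes()
    lines = [ln for ln in raw.decode("utf-8").splitlines() if ln.strip()]

    return {
        "is_gzip": is_gzip,
        # Spark picks the codec from the extension, so gzip data without ".gz" is not decompressed.
        "has_gz_suffix": path.suffix == ".gz",
        # True when every non-blank line is a standalone JSON value (Spark's default mode).
        "json_lines": bool(lines) and all(_parses(ln) for ln in lines),
        # True when the whole payload is one JSON document (what pd.read_json reads by default).
        "single_document": _parses("\n".join(lines)) if lines else False,
    }
```

For a gzipped, pretty-printed array saved as events.json, this would report is_gzip=True, has_gz_suffix=False, json_lines=False and single_document=True. That combination suggests renaming the files to *.json.gz and reading them with multiLine enabled.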

Azure Data Lake Storage
Azure Synapse Analytics

1 answer

  1. ShaikMaheer-MSFT 38,546 Reputation points Microsoft Employee Moderator
    2022-09-01T08:21:52.777+00:00

    Hi @Ben Fisher ,

    Thank you for posting your query on the Microsoft Q&A platform.

    Could you please share more details on what you mean by gzipped JSON files? Have you considered unzipping the folder first and then reading the JSON from the unzipped path? If not, please try that.
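    If you try the unzip-first route on a local copy or a mounted path, a minimal stdlib sketch could look like the following. The function name gunzip_folder and the folder layout are hypothetical. It assumes each *.gz file wraps a single JSON file and writes the output under the same name minus the .gz suffix.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_folder(src_dir, dst_dir):
    """Hypothetical helper: decompress every *.gz file in src_dir into dst_dir."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for gz_path in sorted(src.glob("*.gz")):
        out_path = dst / gz_path.stem  # "part-0001.json.gz" -> "part-0001.json"
        # Stream the data so large files are not loaded into memory all at once.
        with gzip.open(gz_path, "rb") as f_in, out_path.open("wb") as f_out:
            shutil.copyfileobj(f_in, f_out)
        written.append(out_path)
    return written
```

    The resulting folder of plain .json files is then what you would point spark.read.json at. Files that are not named *.gz are left alone.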

    I tried reading data from a JSON file in a Synapse notebook, and it works fine for me. Please see below.
    [Screenshot: 236881-image.png]

    Code used in the image above:

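    %%pyspark
    # Read a JSON file from ADLS Gen2; the JSON source infers the schema by default
    df = spark.read.load('abfss://******@storageName.dfs.core.windows.net/data/employees.json', format='json')
    # Print the inferred schema to confirm the read succeeded
    df.printSchema()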
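    The read above uses the JSON source's default settings, which expect JSON Lines. If a source file instead holds one pretty-printed JSON array, one alternative to passing .option("multiLine", True) is to rewrite it as JSON Lines first. Below is a stdlib sketch that assumes the top-level value is an array of objects. The name array_to_json_lines is hypothetical. The sketch writes gzip output, so the destination name should end in .gz for Spark to pick the codec from the extension.

```python
import gzip
import json
from pathlib import Path


def array_to_json_lines(src_path, dst_path):
    """Hypothetical helper: rewrite a (possibly gzipped) JSON array as gzipped JSON Lines."""
    src_path, dst_path = Path(src_path), Path(dst_path)
    raw = src_path.read_bytes()
    if raw[:2] == b"\x1f\x8b":  # gzip magic bytes: decompress before parsing
        raw = gzip.decompress(raw)
    records = json.loads(raw)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    # One compact JSON object per line, the layout spark.read.json expects by default.
    with gzip.open(dst_path, "wt", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
    return len(records)
```

    The function returns the number of records written, which gives a quick sanity check against the row count Spark reports after reading the converted file.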

    Hope this helps. Please let us know how it goes.

