Transform table results to JSON in Azure Databricks

Raj D 586 Reputation points
2020-08-07T18:46:26.077+00:00

Hi, I am working on transforming SQL table results into a JSON string and saving them as JSON documents. I'm stuck on how to proceed from here: I can query the sale table, but I am not able to create a JSON string of the table data and eventually save it as a JSON document.

    %scala
    val sale= spark.read.jdbc(jdbcUrl, "sales.Transaction", connectionProperties)
    sale.createOrReplaceTempView("sale")
    saleDF = spark.sql("SELECT * FROM sale LIMIT 10").coalesce(1).write.format("org.apache.spark.sql.json")
    saleDF.show() // display json string; throws error: value show is not a member of org.apache.spark.sql.DataFrameWriter[org.apache.spark.sql.Row]
    spark.sql("SELECT * FROM sale LIMIT 10").coalesce(1).write.format("org.apache.spark.sql.json").mode("overwrite").save(<Blob Path1/ ADLS Path1/sale.json>) // to save json document to ADLS; error: unclosed block

Thank you

Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.

Accepted answer
  1. PRADEEPCHEEKATLA-MSFT 86,131 Reputation points Microsoft Employee
    2020-08-12T04:04:32.573+00:00

    Hello @Raj D ,

    You will experience this error message when your storage account's "Firewalls and virtual networks" setting is set to "Selected networks".

    [Image: storage account "Firewalls and virtual networks" set to "Selected networks"]

    Error: the one you are experiencing

    [Image: error message]

    To resolve this issue, there are a couple of options.

    Try changing the storage account's "Firewalls and virtual networks" setting to "All networks".

    [Image: firewall setting changed to "All networks"]

    Successful: after changing the firewall rules, it started working.

    [Image: successful notebook run]

    OR

    Since Azure Databricks does not count as a trusted Microsoft service, it cannot bypass the firewall; you can check the list of trusted Microsoft services supported by the storage account firewall.

    Instead, you need to deploy Azure Databricks in your own Azure virtual network (VNet injection) and allow that virtual network through the storage account firewall.

    Hope this helps. Do let us know if you have any further queries.

    ----------------------------------------------------------------------------------------

    Do click on "Accept Answer" and Upvote on the post that helps you, this can be beneficial to other community members.

    1 person found this answer helpful.

1 additional answer

  1. PRADEEPCHEEKATLA-MSFT 86,131 Reputation points Microsoft Employee
    2020-08-10T11:27:06.07+00:00

    Hello @Raj D ,

    Welcome to the Microsoft Q&A platform.

    You are experiencing the error message "value show is not a member of org.apache.spark.sql.DataFrameWriter[org.apache.spark.sql.Row]" on saleDF.show() because show() is being called on the result of write.format(...), which is a DataFrameWriter still waiting for a path to save the data to, not a DataFrame.

    You don't need to create a JSON string before you save; Spark takes care of serializing the data to JSON at the point of saving. Until you actually write it to storage, the DataFrame is a logical plan with no materialized physical representation.

    The write property returns a DataFrameWriter, not a DataFrame, and a writer doesn't have the show method; only an actual DataFrame does.

    Also, the terminal methods of a DataFrameWriter chain (e.g. json, save) don't return anything. Their purpose is the side effect of performing the actual write to storage, not returning a new DataFrame.
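    For example, here is a sketch (assuming the "sale" temp view registered in the question, and the SparkSession `spark` available in a Databricks notebook): call show and toJSON on the DataFrame itself, before chaining write.

```scala
// Sketch: assumes a Databricks notebook where `spark` (SparkSession) exists
// and the "sale" temp view from the question has been registered.
val saleDF = spark.sql("SELECT * FROM sale LIMIT 10") // DataFrame: show() works here
saleDF.show()

// toJSON converts each row to a JSON string, so you can inspect the
// serialized form on the driver before writing anything to storage
saleDF.toJSON.collect().foreach(println)

// write returns a DataFrameWriter; json(path) performs the actual save
// (shorthand for format("json").save(path))
saleDF.coalesce(1).write.mode("overwrite").json("dbfs:/myfolder/sale_json")
```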


    Below is a code snippet for writing a SQL table result as JSON to the Databricks File System (DBFS) in an Azure Databricks notebook.

    spark.sql("SELECT * FROM customers").coalesce(1).write.format("org.apache.spark.sql.json").mode("overwrite").save("dbfs:/myfolder/myfile.json")  
    

    For Blob Storage, you need to configure the storage account key and the output path:

    spark.conf.set("fs.azure.account.key.chepra.blob.core.windows.net", "gv7nVxxxxxxxxxxxxxxxxxxxxxOiA==")  
    val output_container_path = "wasbs://sampledata@chepra.blob.core.windows.net/Json_data_folder"  
    

    Below is a code snippet for writing a SQL table result as JSON to an Azure Blob Storage container in an Azure Databricks notebook.

    spark.sql("SELECT * FROM customers").coalesce(1).write.format("org.apache.spark.sql.json").mode("overwrite").save(output_container_path)
    
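    As a quick check (a sketch, assuming the output_container_path configured above), you can read the written folder back:

```scala
// Sketch: read back the JSON written by the snippet above and display it.
// Assumes output_container_path was set as in the configuration step.
val written = spark.read.json(output_container_path)
written.show()
```

    Note that save() produces a folder containing part-*.json files (plus a _SUCCESS marker), not a single file; coalesce(1) only ensures there is one part file inside that folder.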

    [Image: notebook run]

    Successfully transformed table results to JSON in Azure Databricks.

    [Image: JSON output file in the container]

    Hope this helps. Do let us know if you have any further queries.


    1 person found this answer helpful.