Unable to write to XML file in Synapse Notebook

Prafulla Tej 25 Reputation points
2024-03-06T06:52:20.3766667+00:00

Hi,

I am able to read an XML file stored in my ADLS Gen2 account. However, I am unable to write to the same location. Please find the code snippet below.

from pyspark.sql import SparkSession
from pyspark.sql.types import *

spark = SparkSession.builder.getOrCreate()

customSchema = StructType([
    StructField("_id", StringType(), True),
    StructField("author", StringType(), True),
    StructField("description", StringType(), True),
    StructField("genre", StringType(), True),
    StructField("price", DoubleType(), True),
    StructField("publish_date", StringType(), True),
    StructField("title", StringType(), True)])

df = spark.read \
    .format('xml') \
    .options(rowTag='book') \
    .load('<PathofADLSGen2>/books.xml', schema=customSchema)

This part executes correctly. However, the write below fails.

df.select("author", "_id").write \
    .format('xml') \
    .options(rowTag='book', rootTag='books') \
    .save('<PathofADLSGen2>/newbooks.xml')

Error Message:

Py4JJavaError: An error occurred while calling o4276.save.

Azure Synapse Analytics

1 answer

  1. Smaran Thoomu 15,110 Reputation points Microsoft Vendor
    2024-03-06T11:15:43.63+00:00

    Hi @Prafulla Tej

    Thank you for reaching out to the Azure community forum.

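    To write XML from a Synapse notebook, you can use the spark-xml package (com.databricks.spark.xml). Since reading with format('xml') already works in your session, the package appears to be available on your pool, and the step below is only needed if it is not. Note that spark-xml is a JVM library published to Maven, so it cannot be installed with %pip. Instead, upload the JAR as a workspace package and attach it to the Spark pool, or, if the pool can reach Maven Central, request it at session start. The version shown is only an example; pick the artifact that matches your pool's Scala version (2.12 for Spark 3.x pools):

    %%configure -f
    {"conf": {"spark.jars.packages": "com.databricks:spark-xml_2.12:0.16.0"}}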
    

    Once the package is available, you can write a DataFrame to XML as follows:

    df.write \
      .format('com.databricks.spark.xml') \
      .options(rowTag='book', rootTag='books') \
      .save('<PathofADLSGen2>/newbooks.xml')
    

    Make sure to replace <PathofADLSGen2> with the correct path to the ADLS Gen2 location.
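
One detail that may be relevant here: Spark's DataFrameWriter.save() treats the target as an output directory and writes part files inside it, so newbooks.xml becomes a folder rather than a single file, and with the default save mode the call fails if that path already exists. A minimal sketch of a write that sets the mode explicitly is below. The helper name write_xml and its default arguments are hypothetical, and it assumes the spark-xml package is already attached to the session.

```python
def write_xml(df, path, row_tag="book", root_tag="books",
              mode="overwrite", fmt="com.databricks.spark.xml"):
    """Write a DataFrame as XML under `path`.

    `path` ends up as a directory containing part files, not a single file.
    `mode` is passed to DataFrameWriter.mode(); "overwrite" replaces any
    existing output at `path`, so use it only where that is acceptable.
    """
    (
        df.write
        .format(fmt)                                # spark-xml data source
        .options(rowTag=row_tag, rootTag=root_tag)  # element names for rows and root
        .mode(mode)                                 # avoid the default error-if-exists
        .save(path)
    )
```

With the DataFrame from the question, this could be called as write_xml(df.select("author", "_id"), '<PathofADLSGen2>/newbooks'), keeping your own ADLS Gen2 path in place of the placeholder.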

    If you're still facing issues while writing the XML output, please check that you have write permission on the ADLS Gen2 location, that the path used in the write operation is correct and accessible, and that the columns of the DataFrame you are writing are of types the XML writer can serialize.
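
The "Py4JJavaError: An error occurred while calling o4276.save" line is only the Python-side wrapper; the underlying Java exception usually names the actual cause (for example a permission or path problem). A small sketch for surfacing it is below. The helper name describe_save_error is hypothetical; it relies on the java_exception attribute that py4j attaches to Py4JJavaError and falls back to the plain message for any other exception.

```python
def describe_save_error(exc):
    """Return the root-cause message for a failed Spark call, if available."""
    java_exc = getattr(exc, "java_exception", None)
    if java_exc is None:
        # Not a Py4JJavaError (or no Java side attached): use the Python message.
        return str(exc)
    # Follow the Java cause chain down to the innermost exception.
    cause = java_exc
    while cause.getCause() is not None:
        cause = cause.getCause()
    return cause.toString()

# Example use around the failing write:
# try:
#     df.write.format('xml').options(rowTag='book', rootTag='books').save(path)
# except Exception as exc:
#     print(describe_save_error(exc))
```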

    Hope this helps. Do let us know if you have any further queries.


    If this answers your query, please click Accept Answer and select Yes for "Was this answer helpful".

