Unable to write to XML file in Synapse Notebook

Question

Hi,

I am able to read an XML file stored in my ADLS Gen2 account. However, I am unable to write XML back to the same location. Please find the code snippet below.

from pyspark.sql import SparkSession
from pyspark.sql.types import *

spark = SparkSession.builder.getOrCreate()

customSchema = StructType([
    StructField("_id", StringType(), True),
    StructField("author", StringType(), True),
    StructField("description", StringType(), True),
    StructField("genre", StringType(), True),
    StructField("price", DoubleType(), True),
    StructField("publish_date", StringType(), True),
    StructField("title", StringType(), True)])

df = spark.read \
    .format('xml') \
    .options(rowTag='book') \
    .load('/books.xml', schema = customSchema)

This part executes correctly. However, the write below fails.

df.select("author", "_id").write \
    .format('xml') \
    .options(rowTag='book', rootTag='books') \
    .save('/newbooks.xml')

Error Message:

Py4JJavaError: An error occurred while calling o4276.save.

Answer

Hi @Prafulla Tej

Thank you for reaching out to the Azure community forum.

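To write XML from a Synapse notebook, you can use the spark-xml library, whose data source name is com.databricks.spark.xml. It is not built into Synapse notebooks. Note that spark-xml is a JVM library published as a Maven artifact, not a Python package, so %pip install cannot install it. Its Maven coordinate has this form:

com.databricks:spark-xml_2.11:0.4.1

The _2.11 suffix is the Scala version the JAR was built for, and it must match the Scala version of your Spark pool (Spark 3 pools are typically built on Scala 2.12, so they need a _2.12 build). One way to make the library available is to upload the matching JAR as a workspace package and attach it to the Spark pool.

Once the library is available in the session, you can write a DataFrame to XML by naming the data source explicitly: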

df.write \
  .format('com.databricks.spark.xml') \
  .options(rowTag='book', rootTag='books') \
  .save('/newbooks.xml')

Make sure to replace '/newbooks.xml' with the correct path to your ADLS Gen2 location.
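For an ADLS Gen2 target, the path is usually given as a full abfss:// URI rather than a bare /newbooks.xml. The sketch below shows one way to build that URI and pass it to the writer. The helper names abfss_uri and write_books_xml, and the container and storage account names in the usage comment, are placeholders for illustration and do not come from the question. The overwrite mode is also an assumption; drop it if existing output must not be replaced.

```python
def abfss_uri(container, account, path):
    """Build an ADLS Gen2 URI: abfss://<container>@<account>.dfs.core.windows.net/<path>."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


def write_books_xml(df, target):
    """Write a DataFrame as XML to `target` using the spark-xml data source."""
    (df.write
       .format("com.databricks.spark.xml")
       .options(rowTag="book", rootTag="books")
       .mode("overwrite")  # assumption: replace output left by an earlier run
       .save(target))


# Usage in the notebook (placeholder container and account names):
# target = abfss_uri("mycontainer", "mystorageaccount", "/newbooks.xml")
# write_books_xml(df.select("author", "_id"), target)
```

Spark treats the save target as a directory, so expect a newbooks.xml folder containing part files rather than a single XML file.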

If the write still fails, please check that you have the necessary permissions to write to the ADLS Gen2 location, that the path in the write operation is correct and accessible, and that the schema of the DataFrame matches the XML structure you expect to produce.
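To tell these causes apart, the Java exception behind the Py4JJavaError is more informative than its first line, which only names the failing call. A minimal sketch, assuming the caught exception exposes the wrapped Java exception as java_exception (as py4j's Py4JJavaError does); the helper name root_cause is hypothetical:

```python
def root_cause(exc):
    """Return the innermost Java cause of a Py4JJavaError as text.

    Falls back to str(exc) for exceptions that carry no Java exception.
    """
    java_exc = getattr(exc, "java_exception", None)
    if java_exc is None:
        return str(exc)
    # Follow the Java getCause() chain down to the original failure.
    while java_exc.getCause() is not None:
        java_exc = java_exc.getCause()
    return java_exc.toString()


# Usage in the notebook:
# try:
#     df.select("author", "_id").write.format("xml") \
#         .options(rowTag="book", rootTag="books").save("/newbooks.xml")
# except Exception as e:
#     print(root_cause(e))
```

The printed message usually shows whether the failure is a missing data source class, a permission problem, or an output path that already exists.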

Hope this helps. Do let us know if you have any further queries.

If this answers your query, please click Accept Answer and Yes for "Was this answer helpful".
