Parses an XML string and infers its schema in DDL format.
Syntax
```python
from pyspark.sql import functions as sf
sf.schema_of_xml(xml, options=None)
```
Parameters
| Parameter | Type | Description |
|---|---|---|
| `xml` | `pyspark.sql.Column` or `str` | An XML string, or a foldable string column containing an XML string. |
| `options` | `dict`, optional | Options to control parsing. Accepts the same options as the XML data source. |
Returns
pyspark.sql.Column: a string representation of a StructType inferred from the given XML, in DDL format (for example, `STRUCT<a: BIGINT>`).
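To make the inference rules in the examples below more concrete, here is a minimal pure-Python sketch that reproduces their outputs. It is a toy model, not Spark's implementation: the names `infer_leaf`, `merge`, `infer`, and `schema_of_xml_sketch` are hypothetical, it only distinguishes BIGINT, DOUBLE, and STRING, and it ignores attributes and mixed content. It models three behaviours visible in the examples: a leaf element gets a scalar type, an element with children becomes a STRUCT whose fields are listed alphabetically, and a tag that repeats among siblings becomes an ARRAY.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified model of XML schema inference. This is NOT Spark's
# implementation: it ignores attributes, mixed content, and most data types.


def infer_leaf(text):
    """Map an element's text to a scalar type name (toy rules only)."""
    text = (text or "").strip()
    for cast, type_name in ((int, "BIGINT"), (float, "DOUBLE")):
        try:
            cast(text)
            return type_name
        except ValueError:
            pass
    return "STRING"


def merge(t1, t2):
    """Combine the types of two same-named siblings (toy widening rule)."""
    if t1 == t2:
        return t1
    if {t1, t2} == {"BIGINT", "DOUBLE"}:
        return "DOUBLE"
    return "STRING"


def infer(elem):
    """Return a DDL type string for an element and its children."""
    children = list(elem)
    if not children:
        return infer_leaf(elem.text)
    types, counts = {}, {}
    for child in children:
        child_type = infer(child)
        if child.tag in types:
            child_type = merge(types[child.tag], child_type)
        types[child.tag] = child_type
        counts[child.tag] = counts.get(child.tag, 0) + 1
    # Fields are emitted in alphabetical order; repeated tags become arrays.
    fields = []
    for tag in sorted(types):
        field_type = types[tag]
        if counts[tag] > 1:
            field_type = f"ARRAY<{field_type}>"
        fields.append(f"{tag}: {field_type}")
    return "STRUCT<" + ", ".join(fields) + ">"


def schema_of_xml_sketch(xml):
    """Infer a DDL schema string; assumes the root has child elements."""
    return infer(ET.fromstring(xml))


print(schema_of_xml_sketch("<root><person><name>Alice</name><age>30</age></person></root>"))
# STRUCT<person: STRUCT<age: BIGINT, name: STRING>>
```

The widening rule in `merge` (conflicting sibling types fall back to STRING) is an assumption of the sketch; Spark's real inference covers more types and cases, so use `sf.schema_of_xml` itself for authoritative results.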
Examples
Example 1: Parsing a simple XML document with a single element
```python
from pyspark.sql import functions as sf
# `spark` is the active SparkSession (predefined in Databricks notebooks)
df = spark.range(1)
df.select(sf.schema_of_xml(sf.lit('<p><a>1</a></p>')).alias("xml")).collect()
# [Row(xml='STRUCT<a: BIGINT>')]
```
Example 2: Parsing XML in which a repeated element is inferred as an array
```python
from pyspark.sql import functions as sf
df = spark.range(1)
df.select(sf.schema_of_xml(sf.lit('<p><a>1</a><a>2</a></p>')).alias("xml")).collect()
# [Row(xml='STRUCT<a: ARRAY<BIGINT>>')]
```
Example 3: Parsing XML with options to exclude attributes
```python
from pyspark.sql import functions as sf
df = spark.range(1)
# A plain Python string is treated as the XML literal itself, not a column name.
schema = sf.schema_of_xml('<p><a attr="2">1</a></p>', {'excludeAttribute': 'true'})
df.select(schema.alias("xml")).collect()
# [Row(xml='STRUCT<a: BIGINT>')]
```
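The effect of `excludeAttribute` can be illustrated with a small pure-Python sketch for flat XML (a root whose children are all leaf elements). This is a hypothetical model, not Spark's code: the names `leaf_type`, `leaf_with_attributes`, and `schema_of_flat_xml` are invented for illustration, and the `_` prefix and `_VALUE` tag are assumptions modelled on the XML data source's `attributePrefix` and `valueTag` options. With attributes excluded, the sketch reproduces the output above; the output without exclusion is what the sketch produces and has not been confirmed against Spark.

```python
import xml.etree.ElementTree as ET

# Hypothetical sketch of attribute handling for flat XML (one level of leaf
# children). ATTRIBUTE_PREFIX and VALUE_TAG are assumptions modelled on the
# XML data source's attributePrefix and valueTag options.
ATTRIBUTE_PREFIX = "_"
VALUE_TAG = "_VALUE"


def leaf_type(text):
    """Toy scalar rule: integers are BIGINT, everything else is STRING."""
    try:
        int((text or "").strip())
        return "BIGINT"
    except ValueError:
        return "STRING"


def leaf_with_attributes(elem, exclude_attribute):
    """Type a leaf element, folding its attributes into a struct unless excluded."""
    if exclude_attribute or not elem.attrib:
        return leaf_type(elem.text)
    fields = {ATTRIBUTE_PREFIX + name: leaf_type(value) for name, value in elem.attrib.items()}
    if (elem.text or "").strip():
        # The element's own text is stored under the value tag.
        fields[VALUE_TAG] = leaf_type(elem.text)
    body = ", ".join(f"{name}: {fields[name]}" for name in sorted(fields))
    return f"STRUCT<{body}>"


def schema_of_flat_xml(xml, exclude_attribute=False):
    """Infer a DDL string for a root whose children are all leaf elements."""
    root = ET.fromstring(xml)
    body = ", ".join(
        f"{child.tag}: {leaf_with_attributes(child, exclude_attribute)}"
        for child in sorted(root, key=lambda c: c.tag)
    )
    return f"STRUCT<{body}>"


print(schema_of_flat_xml('<p><a attr="2">1</a></p>', exclude_attribute=True))
# STRUCT<a: BIGINT>
print(schema_of_flat_xml('<p><a attr="2">1</a></p>'))
# STRUCT<a: STRUCT<_VALUE: BIGINT, _attr: BIGINT>>
```

The point of the sketch is the design trade-off: keeping attributes forces a leaf element to become a struct that holds both its text and its attributes, while excluding them keeps the simpler scalar type.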
Example 4: Parsing XML with a nested structure
```python
from pyspark.sql import functions as sf
df = spark.range(1)
df.select(
    sf.schema_of_xml(
        sf.lit('<root><person><name>Alice</name><age>30</age></person></root>')
    ).alias("xml")
).collect()
# [Row(xml='STRUCT<person: STRUCT<age: BIGINT, name: STRING>>')]
# Note that `age` precedes `name` in the inferred struct.
```
Example 5: Parsing XML with an array nested inside a struct
```python
from pyspark.sql import functions as sf
df = spark.range(1)
df.select(
    sf.schema_of_xml(
        sf.lit('<data><values><value>1</value><value>2</value></values></data>')
    ).alias("xml")
).collect()
# [Row(xml='STRUCT<values: STRUCT<value: ARRAY<BIGINT>>>')]
```