from_xml

Parses a column containing an XML string into a row with the specified schema. Returns null if the string cannot be parsed.

Syntax

from pyspark.sql import functions as sf

sf.from_xml(col, schema, options=None)

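Parameters

Parameter	Type	Description
`col`	`pyspark.sql.Column` or str	A column, or the name of a column, containing XML strings.
`schema`	`StructType`, `pyspark.sql.Column`, or str	The schema to use when parsing the XML column: a StructType, a Column, or a Python string literal holding a DDL-formatted string.
`options`	dict, optional	Options to control parsing. Accepts the same options as the XML data source.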

Returns

pyspark.sql.Column: a new column of complex type parsed from the given XML object.

Examples

Example 1: Parsing XML with a DDL-formatted string schema

import pyspark.sql.functions as sf
data = [(1, '''<p><a>1</a></p>''')]
df = spark.createDataFrame(data, ("key", "value"))
# Define the schema using a DDL-formatted string
schema = "STRUCT<a: BIGINT>"
# Parse the XML column using the DDL-formatted schema
df.select(sf.from_xml(df.value, schema).alias("xml")).collect()

[Row(xml=Row(a=1))]

Example 2: Parsing XML with a StructType schema

import pyspark.sql.functions as sf
from pyspark.sql.types import StructType, LongType
data = [(1, '''<p><a>1</a></p>''')]
df = spark.createDataFrame(data, ("key", "value"))
schema = StructType().add("a", LongType())
df.select(sf.from_xml(df.value, schema)).show()

+---------------+
|from_xml(value)|
+---------------+
|            {1}|
+---------------+

Example 3: Parsing XML with an ArrayType in the schema

import pyspark.sql.functions as sf
data = [(1, '<p><a>1</a><a>2</a></p>')]
df = spark.createDataFrame(data, ("key", "value"))
# Define the schema with an Array type
schema = "STRUCT<a: ARRAY<BIGINT>>"
# Parse the XML column using the schema with an Array
df.select(sf.from_xml(df.value, schema).alias("xml")).collect()

[Row(xml=Row(a=[1, 2]))]

Example 4: Parsing XML using schema_of_xml

import pyspark.sql.functions as sf
# Sample data with an XML column
data = [(1, '<p><a>1</a><a>2</a></p>')]
df = spark.createDataFrame(data, ("key", "value"))
# Generate the schema from an example XML value
schema = sf.schema_of_xml(sf.lit(data[0][1]))
# Parse the XML column using the generated schema
df.select(sf.from_xml(df.value, schema).alias("xml")).collect()

[Row(xml=Row(a=[1, 2]))]
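
As a rough mental model of the behavior described on this page (a schema decides which child elements are read and how they are typed, repeated elements become an array, and an unparsable string yields null), the following plain-Python sketch may help. It is only an analogy built on the standard library's `xml.etree.ElementTree`; it is not how Spark implements `from_xml`. The helper name `parse_xml_row` and the dict-based schema format are hypothetical and exist only for this illustration.

```python
import xml.etree.ElementTree as ET


def parse_xml_row(xml_string, schema):
    """Hypothetical helper: parse one XML string into a dict shaped by `schema`.

    `schema` is an illustrative format (not Spark's): it maps a field name to
    a converter such as `int`, or to a one-element list such as `[int]` to
    mark an array field. Returns None when the string cannot be parsed.
    """
    if xml_string is None:
        return None
    try:
        root = ET.fromstring(xml_string)
    except ET.ParseError:
        # Unparsable input maps to None, mirroring the documented null result.
        return None

    row = {}
    for name, spec in schema.items():
        children = root.findall(name)
        if isinstance(spec, list):
            # Array field: collect every repeated child element.
            row[name] = [spec[0](child.text) for child in children]
        else:
            # Scalar field: take the first matching child, or None if absent.
            # Conversion errors are not handled in this sketch.
            row[name] = spec(children[0].text) if children else None
    return row


print(parse_xml_row("<p><a>1</a></p>", {"a": int}))            # {'a': 1}
print(parse_xml_row("<p><a>1</a><a>2</a></p>", {"a": [int]}))  # {'a': [1, 2]}
print(parse_xml_row("<p><a>1</a>", {"a": int}))                # None
```

The three calls correspond loosely to Examples 1 and 3 and to the null result for an unparsable string. In Spark itself, the handling of malformed input is governed by the XML data source options passed through `options`, which this sketch does not attempt to model.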

Last updated on 2026-02-01
