from_csv

Parses a column containing a CSV string into a row with the specified schema. Returns null if the string cannot be parsed.

Syntax

from pyspark.sql import functions as sf

sf.from_csv(col, schema, options=None)

Parameters

Parameter | Type | Description
col | pyspark.sql.Column or str | A column or column name in CSV format.
schema | pyspark.sql.Column or str | A column, or Python string literal with schema in DDL format, to use when parsing the CSV column.
options | dict, optional | Options to control parsing. Accepts the same options as the CSV datasource.
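
The schema and options arguments can also be assembled in code instead of being written as literals. The following is a minimal sketch rather than part of the PySpark API: ddl_schema and parse_csv_column are hypothetical helper names, and the field list and delimiter are illustrative assumptions.

```python
def ddl_schema(fields):
    """Join (name, type) pairs into a DDL schema string such as "a INT, b INT"."""
    return ", ".join(f"{name} {dtype}" for name, dtype in fields)


def parse_csv_column(df, column, fields, options=None):
    """Parse `column` of `df` with from_csv, using a schema built from `fields`.

    Hypothetical helper: `df` is assumed to be a DataFrame with a string column.
    """
    # Imported here so ddl_schema stays usable without a Spark installation.
    from pyspark.sql import functions as sf

    return df.select(sf.from_csv(df[column], ddl_schema(fields), options))


# Illustrative inputs: three integer fields and a semicolon delimiter.
fields = [("a", "INT"), ("b", "INT"), ("c", "INT")]
options = {"delimiter": ";"}

schema = ddl_schema(fields)
print(schema)  # a INT, b INT, c INT
```

With an active session, parse_csv_column(df, "value", fields, options) should be equivalent to calling sf.from_csv(df.value, schema, options) inside df.select.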

Returns

pyspark.sql.Column: A column of parsed CSV values.
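
The returned column holds a struct whose fields follow the supplied schema, so a common follow-up step is to promote those fields to top-level columns. The sketch below shows one way to do that; expand_csv_struct and the "parsed" alias are hypothetical names chosen for illustration, not part of PySpark.

```python
def expand_csv_struct(df, column, schema, options=None, alias="parsed"):
    """Parse a CSV string column and promote the struct fields to top-level columns."""
    # Imported lazily so this helper can be defined without a Spark installation.
    from pyspark.sql import functions as sf

    # Give the struct column a stable name so its fields can be addressed.
    parsed = df.select(sf.from_csv(df[column], schema, options).alias(alias))

    # "<alias>.*" selects every field of the struct as its own column.
    return parsed.select(f"{alias}.*")
```

Assuming a DataFrame df with a string column named value, expand_csv_struct(df, "value", "a INT, b INT, c INT") should return a DataFrame with columns a, b, and c in place of a single struct column.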

Examples

Example 1: Parsing a simple CSV string

from pyspark.sql import functions as sf
data = [("1,2,3",)]
df = spark.createDataFrame(data, ("value",))
df.select(sf.from_csv(df.value, "a INT, b INT, c INT")).show()
+---------------+
|from_csv(value)|
+---------------+
|      {1, 2, 3}|
+---------------+

Example 2: Using schema_of_csv to infer the schema

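from pyspark.sql import functions as sf
data = [("1,2,3",)]
df = spark.createDataFrame(data, ("value",))
# schema_of_csv infers the schema from a sample CSV string literal
value = data[0][0]
df.select(sf.from_csv(df.value, sf.schema_of_csv(value))).show()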
+---------------+
|from_csv(value)|
+---------------+
|      {1, 2, 3}|
+---------------+

Example 3: Ignoring leading white space in the CSV string

from pyspark.sql import functions as sf
data = [("   abc",)]
df = spark.createDataFrame(data, ("value",))
options = {'ignoreLeadingWhiteSpace': True}
df.select(sf.from_csv(df.value, "s string", options)).show()
+---------------+
|from_csv(value)|
+---------------+
|          {abc}|
+---------------+

Example 4: Parsing a CSV string with a missing value

from pyspark.sql import functions as sf
data = [("1,2,",)]
df = spark.createDataFrame(data, ("value",))
df.select(sf.from_csv(df.value, "a INT, b INT, c INT")).show()
+---------------+
|from_csv(value)|
+---------------+
|   {1, 2, NULL}|
+---------------+

Example 5: Parsing a CSV string with a different delimiter

from pyspark.sql import functions as sf
data = [("1;2;3",)]
df = spark.createDataFrame(data, ("value",))
options = {'delimiter': ';'}
df.select(sf.from_csv(df.value, "a INT, b INT, c INT", options)).show()
+---------------+
|from_csv(value)|
+---------------+
|      {1, 2, 3}|
+---------------+