Parses a column containing a CSV string into a row with the specified schema. Returns null if the string cannot be parsed.
Syntax
from pyspark.sql import functions as sf
sf.from_csv(col, schema, options=None)
Parameters
| Parameter | Type | Description |
|---|---|---|
| col | pyspark.sql.Column or str | A column or column name in CSV format. |
| schema | pyspark.sql.Column or str | A column, or Python string literal with schema in DDL format, to use when parsing the CSV column. |
| options | dict, optional | Options to control parsing. Accepts the same options as the CSV datasource. |
Returns
pyspark.sql.Column: A column of parsed CSV values.
Examples
Example 1: Parsing a simple CSV string
from pyspark.sql import functions as sf
data = [("1,2,3",)]
df = spark.createDataFrame(data, ("value",))
df.select(sf.from_csv(df.value, "a INT, b INT, c INT")).show()
+---------------+
|from_csv(value)|
+---------------+
|      {1, 2, 3}|
+---------------+
Example 2: Using schema_of_csv to infer the schema
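from pyspark.sql import functions as sf
data = [("1,2,3",)]
df = spark.createDataFrame(data, ("value",))
# schema_of_csv infers the schema from a sample CSV string literal.
value = data[0][0]
df.select(sf.from_csv(df.value, sf.schema_of_csv(value))).show()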
+---------------+
|from_csv(value)|
+---------------+
|      {1, 2, 3}|
+---------------+
Example 3: Ignoring leading white space in the CSV string
from pyspark.sql import functions as sf
data = [(" abc",)]
df = spark.createDataFrame(data, ("value",))
options = {'ignoreLeadingWhiteSpace': True}
df.select(sf.from_csv(df.value, "s string", options)).show()
+---------------+
|from_csv(value)|
+---------------+
|          {abc}|
+---------------+
Example 4: Parsing a CSV string with a missing value
from pyspark.sql import functions as sf
data = [("1,2,",)]
df = spark.createDataFrame(data, ("value",))
df.select(sf.from_csv(df.value, "a INT, b INT, c INT")).show()
+---------------+
|from_csv(value)|
+---------------+
|   {1, 2, NULL}|
+---------------+
Example 5: Parsing a CSV string with a different delimiter
from pyspark.sql import functions as sf
data = [("1;2;3",)]
df = spark.createDataFrame(data, ("value",))
options = {'delimiter': ';'}
df.select(sf.from_csv(df.value, "a INT, b INT, c INT", options)).show()
+---------------+
|from_csv(value)|
+---------------+
|      {1, 2, 3}|
+---------------+