Note
Access to this page requires authorization. You can try signing in or changing directories.
Access to this page requires authorization. You can try changing directories.
Parses a CSV string and infers its schema in DDL format.
Syntax
from pyspark.sql import functions as sf
sf.schema_of_csv(csv, options=None)
Parameters
| Parameter | Type | Description |
|---|---|---|
csv |
pyspark.sql.Column or str |
A CSV string or a foldable string column containing a CSV string. |
options |
dict, optional | Options to control parsing. Accepts the same options as the CSV datasource. |
Returns
pyspark.sql.Column: A string representation of a StructType parsed from the given CSV.
Examples
Example 1: Inferring the schema of a CSV string with different data types
from pyspark.sql import functions as sf
df = spark.range(1)
df.select(sf.schema_of_csv(sf.lit('1|a|true'), {'sep':'|'})).show(truncate=False)
+-------------------------------------------+
|schema_of_csv(1|a|true) |
+-------------------------------------------+
|STRUCT<_c0: INT, _c1: STRING, _c2: BOOLEAN>|
+-------------------------------------------+
Example 2: Inferring the schema of a CSV string with missing values
from pyspark.sql import functions as sf
df = spark.range(1)
df.select(sf.schema_of_csv(sf.lit('1||true'), {'sep':'|'})).show(truncate=False)
+-------------------------------------------+
|schema_of_csv(1||true) |
+-------------------------------------------+
|STRUCT<_c0: INT, _c1: STRING, _c2: BOOLEAN>|
+-------------------------------------------+
Example 3: Inferring the schema of a CSV string with a different delimiter
from pyspark.sql import functions as sf
df = spark.range(1)
df.select(sf.schema_of_csv(sf.lit('1;a;true'), {'sep':';'})).show(truncate=False)
+-------------------------------------------+
|schema_of_csv(1;a;true) |
+-------------------------------------------+
|STRUCT<_c0: INT, _c1: STRING, _c2: BOOLEAN>|
+-------------------------------------------+
Example 4: Inferring the schema of a CSV string with quoted fields
from pyspark.sql import functions as sf
df = spark.range(1)
df.select(sf.schema_of_csv(sf.lit('"1","a","true"'), {'sep':','})).show(truncate=False)
+-------------------------------------------+
|schema_of_csv("1","a","true") |
+-------------------------------------------+
|STRUCT<_c0: INT, _c1: STRING, _c2: BOOLEAN>|
+-------------------------------------------+