to_json

Converts a column containing a StructType, ArrayType, MapType, or VariantType into a JSON string. Throws an exception if the column has an unsupported type.

Syntax

from pyspark.sql import functions as sf

sf.to_json(col, options=None)

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| col | pyspark.sql.Column or str | Column, or name of a column, containing a struct, an array, a map, or a variant object. |
| options | dict, optional | Options that control the conversion. Accepts the same options as the JSON datasource. Additionally, the function supports the pretty option, which enables pretty JSON generation. |

Returns

pyspark.sql.Column: A string column containing the JSON representation of the input.

Examples

Example 1: Converting a StructType column to JSON

import pyspark.sql.functions as sf
from pyspark.sql import Row
data = [(1, Row(age=2, name='Alice'))]
df = spark.createDataFrame(data, ("key", "value"))
df.select(sf.to_json(df.value).alias("json")).show(truncate=False)
+------------------------+
|json                    |
+------------------------+
|{"age":2,"name":"Alice"}|
+------------------------+

Example 2: Converting an ArrayType column to JSON

import pyspark.sql.functions as sf
from pyspark.sql import Row
data = [(1, [Row(age=2, name='Alice'), Row(age=3, name='Bob')])]
df = spark.createDataFrame(data, ("key", "value"))
df.select(sf.to_json(df.value).alias("json")).show(truncate=False)
+-------------------------------------------------+
|json                                             |
+-------------------------------------------------+
|[{"age":2,"name":"Alice"},{"age":3,"name":"Bob"}]|
+-------------------------------------------------+

Example 3: Converting a MapType column to JSON

import pyspark.sql.functions as sf
df = spark.createDataFrame([(1, {"name": "Alice"})], ("key", "value"))
df.select(sf.to_json(df.value).alias("json")).show(truncate=False)
+----------------+
|json            |
+----------------+
|{"name":"Alice"}|
+----------------+

Example 4: Converting a VariantType column to JSON

import pyspark.sql.functions as sf
df = spark.createDataFrame([(1, '{"name": "Alice"}')], ("key", "value"))
df.select(sf.to_json(sf.parse_json(df.value)).alias("json")).show(truncate=False)
+----------------+
|json            |
+----------------+
|{"name":"Alice"}|
+----------------+

Example 5: Converting an ArrayType column of MapType elements to JSON

import pyspark.sql.functions as sf
df = spark.createDataFrame([(1, [{"name": "Alice"}, {"name": "Bob"}])], ("key", "value"))
df.select(sf.to_json(df.value).alias("json")).show(truncate=False)
+---------------------------------+
|json                             |
+---------------------------------+
|[{"name":"Alice"},{"name":"Bob"}]|
+---------------------------------+

Example 6: Converting a simple ArrayType column to JSON

import pyspark.sql.functions as sf
df = spark.createDataFrame([(1, ["Alice", "Bob"])], ("key", "value"))
df.select(sf.to_json(df.value).alias("json")).show(truncate=False)
+---------------+
|json           |
+---------------+
|["Alice","Bob"]|
+---------------+

Example 7: Converting to JSON with specified options

import pyspark.sql.functions as sf
df = spark.sql("SELECT (DATE('2022-02-22'), 1) AS date")
json1 = sf.to_json(df.date)
json2 = sf.to_json(df.date, {"dateFormat": "yyyy/MM/dd"})
df.select("date", json1, json2).show(truncate=False)
+---------------+------------------------------+------------------------------+
|date           |to_json(date)                 |to_json(date)                 |
+---------------+------------------------------+------------------------------+
|{2022-02-22, 1}|{"col1":"2022-02-22","col2":1}|{"col1":"2022/02/22","col2":1}|
+---------------+------------------------------+------------------------------+