st_collect

Applies to: Databricks Runtime 18 LTS and above

Important

This feature is in Public Preview.

Collects an array of Geography or Geometry values into a single multipoint, multilinestring, multipolygon, or geometry collection.

For the corresponding Databricks SQL function, see st_collect function.

Syntax

from pyspark.databricks.sql import functions as dbf

dbf.st_collect(col=<col>)

Parameters

col (pyspark.sql.Column or str): An array of Geography values or an array of Geometry values.

Returns

pyspark.sql.Column: A Geography or Geometry value, representing a multipoint, multilinestring, multipolygon, or geometry collection.

Any None values in the input array are ignored. The type of the output depends on the types of the non-None input geometries:

  • If all non-None elements are points, returns a multipoint.
  • If all non-None elements are linestrings, returns a multilinestring.
  • If all non-None elements are polygons, returns a multipolygon.
  • Otherwise, returns a geometry collection.

The output contains one element for each non-None element of the input array.

Multi-typed inputs (multipoint, multilinestring, multipolygon) and geometry collection inputs are preserved as elements of the resulting geometry collection; they are not flattened.
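The type rules above can be illustrated with a small pure-Python model that works on WKT strings instead of Spark columns. This is a sketch under stated assumptions, not the Databricks implementation: the helper `collect_wkt` is hypothetical, every non-None input is assumed to be a non-empty 2D WKT string such as 'POINT(1 2)', and SRID and dimension handling are left out. With those assumptions it produces the same output strings as the examples on this page.

```python
# Hypothetical pure-Python model of st_collect's output-type rules on WKT strings.
# NOT the Databricks implementation. Assumes every non-None input is a non-empty
# 2D WKT string such as 'POINT(1 2)'; SRID and dimension are ignored.
from typing import List, Optional

# Single-type kinds and the multi-type kind a homogeneous array collects into.
_MULTI_OF = {
    "POINT": "MULTIPOINT",
    "LINESTRING": "MULTILINESTRING",
    "POLYGON": "MULTIPOLYGON",
}


def _kind(wkt: str) -> str:
    """Return the upper-cased type keyword that precedes the first '('."""
    return wkt[: wkt.index("(")].strip().upper()


def collect_wkt(wkts: Optional[List[Optional[str]]]) -> Optional[str]:
    if wkts is None:
        return None  # A None input array yields None.
    items = [w for w in wkts if w is not None]  # None elements are ignored.
    if not items:
        return "GEOMETRYCOLLECTION EMPTY"  # Empty or all-None input.
    kinds = {_kind(w) for w in items}
    if len(kinds) == 1:
        (kind,) = kinds
        if kind in _MULTI_OF:
            # All points, all linestrings, or all polygons: each element keeps
            # its coordinate text (from the first '(' onward) as one member.
            members = ",".join(w[w.index("("):] for w in items)
            return f"{_MULTI_OF[kind]}({members})"
    # Mixed kinds, or multi-typed / collection inputs: wrap each element
    # unchanged (no flattening) in a geometry collection.
    return "GEOMETRYCOLLECTION(" + ",".join(items) + ")"
```

Because the model only looks at each element's type keyword, an array of multipoints falls into the "otherwise" branch and each multipoint is kept whole inside the geometry collection, which mirrors the no-flattening rule.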

The SRID value of the output is the common SRID value of the non-None input geometries.

The dimension of the output is the maximum common dimension of the non-None input geometries.
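One way to read "maximum common dimension" is to treat each dimension as the set of ordinates it carries (XY, XYZ, XYM, or XYZM) and keep only the ordinates that every non-None input shares. The sketch below encodes that reading; it is an interpretation of the sentence above rather than documented behavior, and the `common_dimension` helper and its string labels are hypothetical.

```python
# Hypothetical sketch of one reading of "maximum common dimension": each
# dimension is the set of ordinates it carries, and the output keeps only the
# ordinates shared by every non-None input. Labels are illustrative strings.
from functools import reduce
from typing import Optional, Sequence

_ORDINATES = {
    "XY": frozenset("XY"),
    "XYZ": frozenset("XYZ"),
    "XYM": frozenset("XYM"),
    "XYZM": frozenset("XYZM"),
}


def common_dimension(dims: Sequence[Optional[str]]) -> str:
    present = [_ORDINATES[d] for d in dims if d is not None]  # Skip None elements.
    if not present:
        return "XY"  # Empty or all-None input: the 2D empty collection.
    shared = reduce(frozenset.intersection, present)
    # Rebuild the label in canonical X, Y, Z, M order.
    return "".join(c for c in "XYZM" if c in shared)
```

Under this reading, combining an XYZ input with an XYM input yields a 2D (XY) result, while combining XYZM with XYZ yields XYZ.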

If the input array is empty or contains only None values, the 2D empty geometry collection is returned. In this case, the SRID of the output is determined as follows:

  • If the input array's element type is GEOGRAPHY(ANY), the SRID of the output is 4326.
  • If the input array's element type is GEOMETRY(ANY), the SRID of the output is 0.
  • Otherwise, the SRID of the output is that of the input array's element type.

If any two non-None input geometries have different SRID values, the function raises an ST_DIFFERENT_SRID_VALUES error.
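The SRID rules in the preceding paragraphs can be summarized in a short pure-Python sketch. The `output_srid` helper, the element-type strings such as 'GEOMETRY(3857)', and the `DifferentSridValuesError` class are hypothetical stand-ins chosen for illustration; the real function raises a Databricks ST_DIFFERENT_SRID_VALUES error, not a Python exception of this name.

```python
# Hypothetical model of st_collect's SRID rules; not the Databricks implementation.
# Element types are represented as strings like 'GEOMETRY(3857)' or 'GEOGRAPHY(ANY)'.
from typing import Optional, Sequence


class DifferentSridValuesError(ValueError):
    """Hypothetical stand-in for the ST_DIFFERENT_SRID_VALUES error."""


def output_srid(element_type: str, srids: Sequence[Optional[int]]) -> int:
    """Return the output SRID; None in `srids` stands for a None array element."""
    present = {s for s in srids if s is not None}
    if len(present) > 1:
        # Two non-None inputs disagree on SRID.
        raise DifferentSridValuesError(f"ST_DIFFERENT_SRID_VALUES: {sorted(present)}")
    if present:
        return present.pop()  # The common SRID of the non-None inputs.
    # Empty or all-None input: derive the SRID from the array's element type.
    kind, _, arg = element_type.partition("(")
    arg = arg.rstrip(")")
    if arg == "ANY":
        return 4326 if kind == "GEOGRAPHY" else 0
    return int(arg)
```

For example, an empty array of GEOGRAPHY(ANY) maps to 4326, while an empty array of GEOMETRY(3857) keeps 3857 from its element type.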

If the input array itself is None, the function returns None.

Examples

Collects an array of points into a multipoint.

from pyspark.databricks.sql import functions as dbf
from pyspark.sql import functions as sf
df = spark.createDataFrame([('POINT(1 2)', 'POINT(3 4)')], ['wkt1', 'wkt2'])
df.select(dbf.st_astext(dbf.st_collect(sf.array(dbf.st_geomfromtext('wkt1'), dbf.st_geomfromtext('wkt2')))).alias('result')).collect()
[Row(result='MULTIPOINT((1 2),(3 4))')]

Collects an array of polygons into a multipolygon.

from pyspark.databricks.sql import functions as dbf
from pyspark.sql import functions as sf
df = spark.createDataFrame([('POLYGON((0 0,10 0,10 10,0 10,0 0))',)], ['wkt'])
df.select(dbf.st_astext(dbf.st_collect(sf.array(dbf.st_geomfromtext('wkt')))).alias('result')).collect()
[Row(result='MULTIPOLYGON(((0 0,10 0,10 10,0 10,0 0)))')]

Collects an array of mixed geometry kinds into a geometry collection.

from pyspark.databricks.sql import functions as dbf
from pyspark.sql import functions as sf
df = spark.createDataFrame([('POLYGON((0 0,10 0,10 10,0 10,0 0))', 'LINESTRING(1 2,3 4)')], ['wkt1', 'wkt2'])
df.select(dbf.st_astext(dbf.st_collect(sf.array(dbf.st_geomfromtext('wkt1'), dbf.st_geomfromtext('wkt2')))).alias('result')).collect()
[Row(result='GEOMETRYCOLLECTION(POLYGON((0 0,10 0,10 10,0 10,0 0)),LINESTRING(1 2,3 4))')]

Returns the 2D empty geometry collection for an empty input array.

from pyspark.databricks.sql import functions as dbf
from pyspark.sql import functions as sf
df = spark.range(1)
df.select(dbf.st_astext(dbf.st_collect(sf.array())).alias('result')).collect()
[Row(result='GEOMETRYCOLLECTION EMPTY')]

Returns None for a None input.

from pyspark.databricks.sql import functions as dbf
from pyspark.sql import functions as sf
df = spark.range(1)
df.select(dbf.st_collect(sf.lit(None)).alias('result')).collect()
[Row(result=None)]