split

Splits str around matches of the given pattern.

For the corresponding Databricks SQL function, see split function.

Syntax

from pyspark.databricks.sql import functions as dbf

dbf.split(str=<str>, pattern=<pattern>, limit=<limit>)

Parameters

str
Type: pyspark.sql.Column or str
A string expression to split.

pattern
Type: pyspark.sql.Column or literal string
A string representing a regular expression, written in Java regular expression syntax. A Column is also accepted, but a column name is not, because a plain string is still interpreted as a regular expression for backwards compatibility.

limit
Type: pyspark.sql.Column, str, or int (optional, default -1)
An integer that controls how many times pattern is applied. In addition to an int, limit accepts a Column or a column name.
- limit > 0: The resulting array has at most limit elements, and its last entry contains all input beyond the last matched pattern.
- limit <= 0: pattern is applied as many times as possible, and the resulting array can be of any size.
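To make the limit rules concrete, here is a minimal pure-Python sketch that approximates them. It uses Python's re module as a stand-in for Java regular expressions, so it is an illustration rather than the actual implementation. The helper split_like_spark is hypothetical and not part of pyspark.databricks.sql; it assumes pattern contains no capturing groups and cannot match an empty string, because re.split treats those cases differently from Java.

```python
import re
from typing import List


def split_like_spark(s: str, pattern: str, limit: int = -1) -> List[str]:
    """Hypothetical helper approximating the documented `limit` rules.

    Uses Python's `re` instead of Java regex. Assumes `pattern` has no
    capturing groups and never matches an empty string.
    """
    if limit == 1:
        # At most one element allowed, so the pattern is never applied.
        # (Handled separately because maxsplit=0 means "unlimited" in re.split.)
        return [s]
    if limit > 1:
        # At most limit - 1 splits, so at most `limit` elements;
        # the last element keeps everything after the last split.
        return re.split(pattern, s, maxsplit=limit - 1)
    # limit <= 0: split at every match and keep trailing empty strings.
    return re.split(pattern, s)


print(split_like_spark("oneAtwoBthreeC", "[ABC]"))
print(split_like_spark("oneAtwoBthreeC", "[ABC]", 2))
print(split_like_spark("1A2B3C", "[1-9]+", 1))
```

With the inputs from the Examples section, the default limit yields ['one', 'two', 'three', ''] (the final C produces a trailing empty string, which is kept), limit 2 yields ['one', 'twoBthreeC'], and limit 1 returns the input unsplit as ['1A2B3C'].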

Returns

pyspark.sql.Column: array of separated strings.

Examples

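from pyspark.databricks.sql import functions as dbf

# Assumes an active SparkSession named `spark` (predefined in Databricks notebooks).
df = spark.createDataFrame([('oneAtwoBthreeC',)], ['s'])
# Default limit (-1): split at every A, B, or C -> [one, two, three, ]
df.select('*', dbf.split(df.s, '[ABC]')).show()
# limit=2: at most two elements, the last keeps the remainder -> [one, twoBthreeC]
df.select('*', dbf.split(df.s, '[ABC]', 2)).show()
# Any limit <= 0 splits as often as possible -> [one, two, three, ]
df.select('*', dbf.split('s', '[ABC]', -2)).show()

# pattern and limit can also come from columns, evaluated row by row.
df = spark.createDataFrame([
    ('oneAtwoBthreeC', '[ABC]', 2),
    ('1A2B3C', '[1-9]+', 1),
    ('aa2bb3cc4', '[1-9]+', -1)], ['s', 'p', 'l'])
# Per-row pattern from column p, default limit.
df.select('*', dbf.split(df.s, df.p)).show()
# Per-row pattern and limit; limit is given as the column name 'l'.
df.select(dbf.split('s', df.p, 'l')).show()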
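Because pattern is always interpreted as a regular expression, a separator that is a regex metacharacter, such as '.' or '|', must be escaped to be matched literally. The sketch below shows the pitfall in plain Python, again using re as a stand-in for Java regex (for simple patterns like these the two behave the same). split_on_literal is a hypothetical helper used only for illustration.

```python
import re
from typing import List


def split_on_literal(s: str, sep: str) -> List[str]:
    """Hypothetical helper: split on `sep` as literal text, not as a regex."""
    # re.escape(".") yields a backslash followed by a dot,
    # so only an actual dot character matches.
    return re.split(re.escape(sep), s)


ip = "10.0.0.1"

# Unescaped, "." matches every character, so every piece is empty.
unescaped = re.split(".", ip)

# Escaped, the string splits only on the literal dots.
escaped = split_on_literal(ip, ".")

print(unescaped)  # nine empty strings
print(escaped)
```

In the PySpark call itself, write the escape into the pattern string, for example dbf.split(df.s, r'\.'), which passes the Java regex \. so that only literal dots are treated as separators.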