Splits str around matches of the given pattern.
For the corresponding Databricks SQL function, see the `split` function.
Syntax
```python
from pyspark.databricks.sql import functions as dbf
dbf.split(str=<str>, pattern=<pattern>, limit=<limit>)
```
Parameters
| Parameter | Type | Description |
|---|---|---|
| `str` | `pyspark.sql.Column` or str (column name) | A string expression to split. |
| `pattern` | `pyspark.sql.Column` or literal string | A string representing a regular expression. The regex string should be a Java regular expression. `pattern` also accepts a Column, but not a column name: a plain string is still interpreted as a regular expression, for backwards compatibility. |
| `limit` | `pyspark.sql.Column` or str or int | An integer that controls the number of times `pattern` is applied. In addition to an int, `limit` accepts a Column or a column name. If `limit > 0`, the resulting array's length is at most `limit`, and its last entry contains all input beyond the last matched pattern. If `limit <= 0`, `pattern` is applied as many times as possible, and the resulting array can be of any size. |
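The `limit` rules in the Parameters table can be modelled in plain Python. This is an illustrative sketch, not the Spark implementation: the helper name `split_like_spark` is hypothetical, its default of `-1` mirrors the default in PySpark's `split` signature, and it assumes Python's `re` module matches the pattern the same way Java's regex engine would, which holds for simple character classes such as `[ABC]` but not for every regex feature.

```python
import re


def split_like_spark(s: str, pattern: str, limit: int = -1) -> list[str]:
    """Hypothetical pure-Python model of the documented `limit` semantics.

    Assumption: Python's `re` treats `pattern` like Java's regex engine would.
    """
    if limit == 1:
        # At most one element: the whole input, unsplit.
        return [s]
    if limit > 1:
        # At most `limit` elements means at most `limit - 1` splits; the last
        # element keeps everything after the last match.
        return re.split(pattern, s, maxsplit=limit - 1)
    # limit <= 0: split at every match; trailing empty strings are kept.
    return re.split(pattern, s)


for lim in (-1, 2, 1):
    print(lim, split_like_spark("oneAtwoBthreeC", "[ABC]", lim))
```

For `limit <= 0` the result ends with an empty string because the input ends with a match (`C`); a positive `limit` larger than the number of matches behaves the same as no limit.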
Returns
pyspark.sql.Column: array of separated strings.
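Because a plain string passed as `pattern` is always read as a regular expression, a delimiter that is a regex metacharacter (such as `.` or `|`) has to be escaped. The sketch below shows the effect with Python's `re` module, assuming it treats these simple patterns as Java regex would; in a PySpark call the escaped dot would be written as `r'\.'` (or `'\\.'`), for example `dbf.split(df.s, r'\.')`. `re.escape` is a Python helper only and has no counterpart in this API.

```python
import re

# Assumption: Python's re handles these patterns like Java regex.
s = "a.b.c"

# Unescaped "." is the regex "any character": every character is a
# delimiter, so only empty strings remain.
unescaped = re.split(".", s)

# Escaped "\." matches a literal dot.
escaped = re.split(r"\.", s)

# re.escape turns an arbitrary literal delimiter into a safe pattern.
piped = re.split(re.escape("|"), "x|y|z")

print(unescaped, escaped, piped)
```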
Examples
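```python
from pyspark.databricks.sql import functions as dbf

# `spark` is an existing SparkSession (for example, the one a Databricks notebook provides).
df = spark.createDataFrame([('oneAtwoBthreeC',)], ['s'])

# No limit: split at every match of the regex [ABC].
df.select('*', dbf.split(df.s, '[ABC]')).show()

# limit=2: at most two elements; the last one holds the rest of the input.
df.select('*', dbf.split(df.s, '[ABC]', 2)).show()

# Negative limit: same as no limit. `str` can also be given as a column name.
df.select('*', dbf.split('s', '[ABC]', -2)).show()

# `pattern` and `limit` can also come from columns, evaluated per row.
df = spark.createDataFrame([
    ('oneAtwoBthreeC', '[ABC]', 2),
    ('1A2B3C', '[1-9]+', 1),
    ('aa2bb3cc4', '[1-9]+', -1)], ['s', 'p', 'l'])
df.select('*', dbf.split(df.s, df.p)).show()
df.select(dbf.split('s', df.p, 'l')).show()
```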
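In the last two calls, `pattern` and `limit` come from columns, so each row is split with its own values. The sketch below models that row by row in plain Python. The helper `split_row` is hypothetical, follows the `limit` rules from the Parameters table, and assumes Python's `re` matches these patterns as Java regex would; the row data is copied from the example DataFrame.

```python
import re

# Rows copied from the example DataFrame: (s, p, l).
rows = [
    ("oneAtwoBthreeC", "[ABC]", 2),
    ("1A2B3C", "[1-9]+", 1),
    ("aa2bb3cc4", "[1-9]+", -1),
]


def split_row(s: str, pattern: str, limit: int = -1) -> list[str]:
    """Hypothetical model of split(s, pattern, limit) for a single row.

    Assumption: Python's re behaves like Java regex for these patterns.
    """
    if limit == 1:
        return [s]  # one element: the input is not split
    if limit > 1:
        return re.split(pattern, s, maxsplit=limit - 1)
    return re.split(pattern, s)  # limit <= 0: split at every match


# Like split(df.s, df.p): per-row pattern, limit left at its default.
pattern_only = [split_row(s, p) for s, p, _ in rows]

# Like split('s', df.p, 'l'): per-row pattern and per-row limit.
pattern_and_limit = [split_row(s, p, lim) for s, p, lim in rows]

for a, b in zip(pattern_only, pattern_and_limit):
    print(a, b)
```

The leading empty string for `1A2B3C` appears because the input starts with a match, and the row with `limit` 1 is returned unsplit.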