Returns the position of the first substring in `str` that matches the Java regex `regexp` and corresponds to the regex group index `idx`.
For the corresponding Databricks SQL function, see regexp_instr function.
Syntax
from pyspark.databricks.sql import functions as dbf
dbf.regexp_instr(str=<str>, regexp=<regexp>, idx=<idx>)
Parameters
| Parameter | Type | Description |
|---|---|---|
| `str` | `pyspark.sql.Column` or `str` | Target column to work on. |
| `regexp` | `pyspark.sql.Column` or `str` | Regex pattern to apply. |
| `idx` | `pyspark.sql.Column` or `int`, optional | Matched group ID. |
Examples
from pyspark.databricks.sql import functions as dbf
df = spark.createDataFrame([("1a 2b 14m", r"\d+(a|b|m)")], ["str", "regexp"])
df.select('*', dbf.regexp_instr('str', dbf.lit(r'\d+(a|b|m)'))).show()
df.select('*', dbf.regexp_instr('str', dbf.lit(r'\d+(a|b|m)'), dbf.lit(1))).show()
df.select('*', dbf.regexp_instr('str', dbf.col("regexp"))).show()
df.select('*', dbf.regexp_instr(dbf.col("str"), "regexp")).show()
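
To make the meaning of "position" concrete, the following is a minimal pure-Python sketch of the whole-match case. It assumes positions are 1-based and that 0 is returned when nothing matches, as described in the Spark SQL `regexp_instr` documentation. The helper `first_match_position` is hypothetical and is not part of PySpark. It uses Python's `re` module rather than Java regex, so behavior can differ for more advanced patterns, and it does not model the `idx` argument.

```python
import re


def first_match_position(text: str, pattern: str) -> int:
    """Hypothetical helper: 1-based start of the first match, or 0 if none.

    Assumption: mirrors the documented whole-match behavior of regexp_instr.
    Uses Python's `re` engine, not Java regex, and ignores group indexes.
    """
    match = re.search(pattern, text)
    # match.start() is 0-based, so shift by one to get a 1-based position.
    return match.start() + 1 if match else 0


pattern = r"\d+(a|b|m)"
print(first_match_position("1a 2b 14m", pattern))  # match begins at the first character
print(first_match_position("xx 2b 14m", pattern))  # match begins at the fourth character
print(first_match_position("no digits", pattern))  # no match
```

Under these assumptions the three calls print 1, 4, and 0. The sketch is only an aid for reading the results of the examples above; to confirm actual behavior, run `dbf.regexp_instr` against a DataFrame.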