regexp_instr

Returns the position of the first substring in str that matches the Java regex regexp and corresponds to the regex group index.

For the corresponding Databricks SQL function, see regexp_instr function.

Syntax

from pyspark.databricks.sql import functions as dbf

dbf.regexp_instr(str=<str>, regexp=<regexp>, idx=<idx>)

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| str | pyspark.sql.Column or str | Target column to work on. |
| regexp | pyspark.sql.Column or str | Regex pattern to apply. |
| idx | pyspark.sql.Column or int, optional | Matched group ID. |

Examples

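from pyspark.databricks.sql import functions as dbf

# Assumes an active SparkSession named `spark`, as in a Databricks notebook.
df = spark.createDataFrame([("1a 2b 14m", r"\d+(a|b|m)")], ["str", "regexp"])
# Pattern passed as a literal column.
df.select('*', dbf.regexp_instr('str', dbf.lit(r'\d+(a|b|m)'))).show()
# Same pattern with an explicit group index.
df.select('*', dbf.regexp_instr('str', dbf.lit(r'\d+(a|b|m)'), dbf.lit(1))).show()
# Pattern read from the "regexp" column.
df.select('*', dbf.regexp_instr('str', dbf.col("regexp"))).show()
# A plain string argument is interpreted as a column name.
df.select('*', dbf.regexp_instr(dbf.col("str"), "regexp")).show()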
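
The return-value convention can be illustrated without a Spark session. The following is a minimal pure-Python sketch, not the Databricks implementation. It assumes the Spark SQL convention that positions are 1-based and that 0 means no match. It uses Python's re module in place of Java regex (the two dialects differ in places) and does not model idx. The helper name regexp_instr_sketch is hypothetical.

```python
import re


def regexp_instr_sketch(s: str, pattern: str) -> int:
    """Hypothetical helper: 1-based start of the first match, or 0 if none.

    Assumption: mirrors the Spark SQL convention (1-based positions,
    0 for no match). Uses Python regex syntax, not Java regex.
    """
    match = re.search(pattern, s)
    if match is None:
        return 0
    # match.start() is 0-based, so shift by one.
    return match.start() + 1


print(regexp_instr_sketch("1a 2b 14m", r"\d+(a|b|m)"))  # first match "1a" starts the string
print(regexp_instr_sketch("xx 2b", r"\d+(a|b|m)"))      # first match "2b" follows "xx "
print(regexp_instr_sketch("no digits", r"\d+(a|b|m)"))  # no match
```

Under these assumptions the three calls print 1, 4, and 0.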