Computes the Levenshtein distance of the two given strings.
For the corresponding Databricks SQL function, see levenshtein function.
Syntax
from pyspark.databricks.sql import functions as dbf
dbf.levenshtein(left=<left>, right=<right>, threshold=<threshold>)
Parameters
| Parameter | Type | Description |
|---|---|---|
| left | pyspark.sql.Column or str | First column value. |
| right | pyspark.sql.Column or str | Second column value. |
| threshold | int, optional | If set, the function returns the Levenshtein distance when it is less than or equal to the threshold, and -1 otherwise. |
Returns
pyspark.sql.Column: the Levenshtein distance as an integer value.
Examples
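from pyspark.databricks.sql import functions as dbf
# One row holding the two strings to compare.
df = spark.createDataFrame([('kitten', 'sitting',)], ['l', 'r'])
# No threshold: the full distance is returned.
df.select('*', dbf.levenshtein('l', 'r')).show()
# Threshold of 2: -1 is returned when the distance exceeds 2.
df.select('*', dbf.levenshtein(df.l, df.r, 2)).show()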
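To see what values the calls above should produce, the threshold rule from the Parameters table can be sketched in plain Python. This is only an illustration of the documented semantics, not how Spark computes the result. The helper name `levenshtein_ref` is hypothetical and is not part of PySpark or Databricks.

```python
def levenshtein_ref(left, right, threshold=None):
    """Hypothetical pure-Python reference for the documented semantics."""
    # prev[j] is the distance between the prefix of `left` processed
    # so far and the first j characters of `right`.
    prev = list(range(len(right) + 1))
    for i, lc in enumerate(left, start=1):
        curr = [i]  # distance from left[:i] to the empty string
        for j, rc in enumerate(right, start=1):
            cost = 0 if lc == rc else 1
            curr.append(min(
                prev[j] + 1,         # delete a character from left
                curr[j - 1] + 1,     # insert a character into left
                prev[j - 1] + cost,  # substitute, or keep a match
            ))
        prev = curr
    distance = prev[-1]
    # Documented threshold rule: return the distance only when it is
    # less than or equal to the threshold, otherwise -1.
    if threshold is not None and distance > threshold:
        return -1
    return distance


print(levenshtein_ref("kitten", "sitting"))     # 3
print(levenshtein_ref("kitten", "sitting", 2))  # -1
```

The distance between 'kitten' and 'sitting' is 3 (two substitutions and one insertion), so the call with a threshold of 2 yields -1, while a threshold of 3 or more would yield 3.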