Splits a string into arrays of sentences, where each sentence is an array of words.
The language and country arguments are optional. When they are omitted:
- If they are both omitted, the `Locale.ROOT` - `locale(language='', country='')` is used. The `Locale.ROOT` is regarded as the base locale of all locales, and is used as the language/country neutral locale for locale-sensitive operations.
- If the `country` is omitted, the `locale(language, country='')` is used.
When they are null:
- If they are both `null`, the `Locale.US` - `locale(language='en', country='US')` is used.
- If the `language` is null and the `country` is not null, the `Locale.US` - `locale(language='en', country='US')` is used.
- If the `language` is not null and the `country` is null, the `locale(language)` is used.
- If neither is `null`, the `locale(language, country)` is used.
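The rules above can be sketched as a small pure-Python function. This is only an illustration of the selection logic, not how `sentences` is implemented: `resolve_locale` and the `_OMITTED` sentinel are hypothetical names, and representing a locale as a `(language, country)` tuple (with `locale(language)` shown as an empty country) is an assumption of the sketch. Passing `country` without `language` is not covered by the rules above, so the sketch simply treats the missing language as empty.

```python
from typing import Tuple

# Hypothetical sentinel: distinguishes "argument omitted" from an explicit null (None).
_OMITTED = object()


def resolve_locale(language=_OMITTED, country=_OMITTED) -> Tuple[str, str]:
    """Hypothetical helper: return the (language, country) pair the documented rules select."""
    # Omitted arguments default to the empty string; ("", "") stands for Locale.ROOT.
    if language is _OMITTED:
        language = ""
    if country is _OMITTED:
        country = ""
    # A null language selects Locale.US, whether or not the country is null.
    if language is None:
        return ("en", "US")
    # A null country with a non-null language selects locale(language).
    if country is None:
        return (language, "")
    # Neither is null: locale(language, country).
    return (language, country)
```

For instance, `resolve_locale(None, "GB")` yields `("en", "US")`, because a null language always falls back to `Locale.US`, while `resolve_locale("fr", None)` yields `("fr", "")`.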
For the corresponding Databricks SQL function, see sentences function.
Syntax
from pyspark.databricks.sql import functions as dbf
dbf.sentences(string=<string>, language=<language>, country=<country>)
Parameters
| Parameter | Type | Description |
|---|---|---|
| `string` | `pyspark.sql.Column` or str | A string to be split. |
| `language` | `pyspark.sql.Column` or str, optional | A language of the locale. |
| `country` | `pyspark.sql.Column` or str, optional | A country of the locale. |
Returns
pyspark.sql.Column: arrays of split sentences.
Examples
from pyspark.databricks.sql import functions as dbf
df = spark.createDataFrame([("This is an example sentence.", )], ["s"])
# Language and country both given: locale(language='en', country='US')
df.select("*", dbf.sentences(df.s, dbf.lit("en"), dbf.lit("US"))).show(truncate=False)
# Country omitted: locale(language='en', country='')
df.select("*", dbf.sentences(df.s, dbf.lit("en"))).show(truncate=False)
# Both omitted: Locale.ROOT
df.select("*", dbf.sentences(df.s)).show(truncate=False)