Semantic functions
This article covers semantic functions and how they can help data scientists and data engineers discover functions that are relevant to the FabricDataFrame or FabricSeries on which they're working.
For Spark 3.4 and above, Semantic Link is available in the default runtime when using Fabric, so you don't need to install it. If you're using Spark 3.3 or below, or if you want to update to the most recent version of Semantic Link, run the following command:
%pip install -U semantic-link
FabricDataFrames dynamically expose semantic functions based on logic defined by each function.
For example, the is_holiday function shows up in the autocomplete suggestions when you're working on a FabricDataFrame containing both a datetime column and a country column.
Each semantic function uses information about the data types, metadata (such as Power BI data categories), and the data in a FabricDataFrame or FabricSeries to determine its relevance to the particular data on which you're working.
Semantic functions are automatically discovered when annotated with the @semantic_function decorator.
You can think of semantic functions as being similar to C# extension methods applied to the popular DataFrame concept.
Semantic function usage: autocomplete suggestions
Semantic functions are available in the autocomplete suggestions when you work with a FabricDataFrame or FabricSeries. Press Ctrl+Space to trigger autocomplete.
In the following example, the metadata for the FabricDataFrame is manually specified.
from sempy.fabric import FabricDataFrame
df = FabricDataFrame(
    {
        "country": ["US", "AT"],
        "lat": [40.7128, 47.8095],
        "long": [-74.0060, 13.0550],
    },
    column_metadata={
        "lat": {"data_category": "Latitude"},
        "long": {"data_category": "Longitude"},
    },
)
# Convert to GeoPandas dataframe
df_geo = df.to_geopandas(lat_col="lat", long_col="long")
# Use the explore function to visualize the data
df_geo.explore()
Alternatively, you can read from a semantic model into a FabricDataFrame. In that case, the metadata is populated automatically.
# Alternative: read a table from a semantic model
import sempy.fabric as fabric
df = fabric.read_table("my_dataset_name", "my_countries")
# Convert to GeoPandas dataframe
df_geo = df.to_geopandas(lat_col="lat", long_col="long")
# Use the explore function to visualize the data
df_geo.explore()
Built-in semantic functions
The SemPy Python library provides a set of built-in semantic functions that are available out of the box. A few examples are:
- is_holiday(...), which returns true if the date is a holiday in the given country, using the holidays Python package.
- to_geopandas(...), which converts a FabricDataFrame to a GeoPandas GeoDataFrame.
- parse_phonenumber(...), which parses a phone number into its components, using the phonenumbers Python package.
- validators, which performs data validation for common data types, such as email and credit card numbers, using the validators Python package.
Custom semantic functions
Semantic functions are built for extensibility.
You can define your own semantic functions within your notebook or as a separate Python module.
To use a semantic function outside of a notebook, the semantic function needs to be declared within the sempy.functions module.
This code example shows the definition of a semantic function _is_capital that returns true if the city is a capital of the country.
from sempy.fabric import FabricDataFrame, FabricSeries
from sempy.fabric.matcher import CountryMatcher, CityMatcher
from sempy.functions import semantic_function, semantic_parameters

@semantic_function("is_capital")
@semantic_parameters(col_country=CountryMatcher, col_city=CityMatcher)
def _is_capital(df: FabricDataFrame, col_country: str, col_city: str) -> FabricSeries:
    """Returns true if the city is a capital of the country"""
    capitals = {
        "US": ["Washington"],
        "AT": ["Vienna"],
        # ...
    }

    # Look up each row by column name; countries missing from the table yield False
    return df[[col_country, col_city]] \
        .apply(lambda row: row[col_city] in capitals.get(row[col_country], []), axis=1)
The following points provide a breakdown of the code snippet:
- The col_country and col_city parameters are annotated with CountryMatcher and CityMatcher, respectively. This annotation allows the semantic function to be automatically discovered when working with a FabricDataFrame that has the corresponding metadata.
- Instead of matchers, you can also supply standard data types such as str, int, float, and datetime to define required input columns.
- The type annotation of the first parameter (df) shows that the function is applicable to a FabricDataFrame, rather than a FabricSeries.
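The idea of defining required input columns through standard data types can be sketched in plain Python as follows. This is only an illustration under stated assumptions: find_columns_by_type is a hypothetical helper, not a SemPy API, and this article doesn't describe how SemPy actually binds parameters to columns.

```python
from datetime import datetime
from typing import Dict, List, Optional


def find_columns_by_type(
    rows: List[dict], required: Dict[str, type]
) -> Optional[Dict[str, str]]:
    """Bind each parameter to the first unused column whose values all have
    the required type. Return None if any parameter can't be bound."""
    if not rows:
        return None
    binding: Dict[str, str] = {}
    used = set()
    for param, expected_type in required.items():
        match = next(
            (
                col
                for col in rows[0]
                if col not in used
                and all(isinstance(row[col], expected_type) for row in rows)
            ),
            None,
        )
        if match is None:
            return None  # the function isn't applicable to this table
        binding[param] = match
        used.add(match)
    return binding


rows = [
    {"country": "US", "date": datetime(2024, 7, 4), "sales": 120.5},
    {"country": "AT", "date": datetime(2024, 10, 26), "sales": 80.0},
]

# A function that needs one string column and one datetime column is applicable.
binding = find_columns_by_type(rows, {"col_country": str, "col_date": datetime})
print(binding)

# A function that needs an integer column is not applicable to this table.
no_match = find_columns_by_type(rows, {"col_count": int})
print(no_match)
```

Type-based requirements like these are coarser than matchers such as CountryMatcher, since any string column satisfies str. That is why the custom function above relies on matchers and metadata to find the country and city columns.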