What is semantic link?

Semantic link is a feature that connects semantic models with Synapse Data Science in Microsoft Fabric. It is supported only in Microsoft Fabric.

For Spark 3.4 and above, semantic link is available in the default Fabric runtime, so you don't need to install it. If you use Spark 3.3 or below, or if you want to update to the most recent version of semantic link, run the following command:

%pip install -U semantic-link

The primary goals of semantic link are to facilitate data connectivity, enable the propagation of semantic information, and integrate with established tools that data scientists use, such as notebooks. Semantic link helps you preserve domain knowledge about data semantics in a standardized way, which can speed up data analysis and reduce errors.

The data flow starts with semantic models that contain data and semantic information. Semantic link bridges the gap between Power BI and the Data Science experience.

A diagram that shows data flow from Power BI to notebooks in Synapse Data Science and back to Power BI.

With semantic link, you can use semantic models from Power BI in the Data Science experience to perform tasks such as in-depth statistical analysis and predictive modeling with machine learning techniques. The output of your data science work can be stored in OneLake using Apache Spark and ingested into Power BI using Direct Lake.

Power BI connectivity

Semantic models serve as the single tabular object model, providing a reliable source for semantic definitions, such as Power BI measures. Semantic link offers two ways to connect to semantic models:

  • The SemPy Python library provides data connectivity to the Python pandas ecosystem, making it easy for data scientists to work with the data.
  • The Spark native connector provides access to semantic models for data scientists who are more familiar with the Apache Spark ecosystem. This implementation supports various languages, including PySpark, Spark SQL, R, and Scala.
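
As a rough sketch of the pandas-side path, the snippet below wraps two calls from the `sempy.fabric` module. It assumes a Fabric notebook with access to a semantic model. The helper functions are defined here for illustration only, and the dataset, table, measure, and column names are hypothetical placeholders.

```python
def read_model_table(dataset: str, table: str):
    """Read one table of a semantic model into a FabricDataFrame."""
    # Imported lazily so this code can be loaded outside Fabric without errors.
    import sempy.fabric as fabric

    return fabric.read_table(dataset, table)


def measure_by_group(dataset: str, measure: str, groupby_columns: list):
    """Evaluate a Power BI measure, grouped by fully qualified column names."""
    import sempy.fabric as fabric

    return fabric.evaluate_measure(
        dataset, measure=measure, groupby_columns=groupby_columns
    )


def main():
    # Placeholder names: replace them with names from your own workspace.
    customers = read_model_table("Sales Model", "Customer")
    revenue = measure_by_group("Sales Model", "Total Revenue", ["Customer[Country]"])
    print(customers.head())
    print(revenue.head())
```

Call `main()` from a notebook cell in Fabric. Outside Fabric the functions can be defined, but calling them fails because SemPy cannot reach a semantic model.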

Applications of semantic information

Semantic information in data includes Power BI data categories such as address and postal code, relationships between tables, and hierarchical information. Semantic link propagates this metadata into the Data Science environment to enable new experiences and maintain data lineage. Some example applications of semantic link are:

  • Intelligent suggestions of built-in semantic functions.
  • Augmentation of data with Power BI measures through the add-measure method.
  • Tools for data quality validation based on the relationships between tables and functional dependencies within tables.
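
To make the idea of a functional dependency concrete, here is a minimal sketch in plain Python. It is not how semantic link implements its validation; the `dependency_violations` helper and the sample rows are hypothetical. A column B functionally depends on a column A when every value of A maps to exactly one value of B, so any value of A that maps to several values of B points to a data quality problem.

```python
from collections import defaultdict


def dependency_violations(rows, determinant, dependent):
    """Return determinant values that map to more than one dependent value."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[determinant]].add(row[dependent])
    # Keep only the determinant values with conflicting dependent values.
    return {key: sorted(values) for key, values in seen.items() if len(values) > 1}


# Hypothetical sample data: one postal code is paired with two cities.
rows = [
    {"PostalCode": "98052", "City": "Redmond"},
    {"PostalCode": "98052", "City": "Redmond"},
    {"PostalCode": "10001", "City": "New York"},
    {"PostalCode": "10001", "City": "Newark"},
]

violations = dependency_violations(rows, "PostalCode", "City")
print(violations)  # {'10001': ['New York', 'Newark']}
```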

Semantic link enables business analysts to use data effectively in a comprehensive data science environment. It also makes collaboration between data scientists and business analysts easier: data scientists can use Power BI measures directly, so they don't need to reimplement the business logic embedded in those measures.

FabricDataFrame data structure

FabricDataFrame is the core data structure of semantic link. It subclasses the pandas DataFrame and adds metadata, such as semantic information and lineage. Semantic link uses it to propagate semantic information from semantic models into the Data Science environment.
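
The general mechanism of a DataFrame subclass that carries metadata can be sketched with pandas alone. The `AnnotatedFrame` class and its `column_metadata` attribute below are hypothetical and are not the FabricDataFrame implementation; they only illustrate how pandas lets a subclass keep extra attributes across operations through `_metadata` and `_constructor`.

```python
import pandas as pd


class AnnotatedFrame(pd.DataFrame):
    """A DataFrame subclass that carries per-column metadata (illustration only)."""

    # Attributes named here are copied to the results of pandas operations.
    _metadata = ["column_metadata"]

    @property
    def _constructor(self):
        # Make pandas operations return AnnotatedFrame instead of DataFrame.
        return AnnotatedFrame


df = AnnotatedFrame({"PostalCode": ["98052", "10001"], "Sales": [120.0, 80.0]})
df.column_metadata = {"PostalCode": {"data_category": "PostalCode"}}

# Filtering is an ordinary pandas operation; the subclass and metadata survive it.
filtered = df[df["Sales"] > 100]
print(type(filtered).__name__)
print(filtered.column_metadata)
```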

A diagram that shows data flow from connectors to semantic models to FabricDataFrame to Semantic Functions.

FabricDataFrame supports all pandas operations and more. It exposes semantic functions and the add-measure method that enable you to use Power BI measures in your data science work.
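
A hedged sketch of the add-measure method follows. It assumes that the method is exposed as `add_measure` on `sempy.fabric.FabricDataFrame` and that it runs in a Fabric notebook. The `with_measure` helper, the dataset name, the measure name, and the column names are hypothetical placeholders.

```python
def with_measure(dataset: str, measure: str):
    """Build a small FabricDataFrame and add a Power BI measure to it."""
    # Imported lazily so this code can be loaded outside Fabric without errors.
    from sempy.fabric import FabricDataFrame

    # Column names in the "Table[Column]" form refer to columns of the
    # semantic model (placeholder names here).
    df = FabricDataFrame(
        {
            "Sales Agent": ["Agent 1", "Agent 1", "Agent 2"],
            "Customer[Country]": ["US", "GB", "US"],
        }
    )
    # Assumption: add_measure returns a new frame with one extra column
    # that holds the measure value for each row.
    return df.add_measure(measure, dataset=dataset)
```

In a Fabric notebook, a call such as `with_measure("Sales Model", "Total Revenue")` would be expected to return the original rows with the measure attached, without the measure's logic being rewritten in Python.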