Semantic data propagation from semantic models

When you read a semantic model into a FabricDataFrame, semantic information such as metadata and annotations from the semantic model is automatically attached to the FabricDataFrame. In this article, you'll learn how the SemPy Python library preserves annotations that are attached to your semantic model's tables and columns.

Semantic propagation for pandas users

The SemPy Python library is part of the semantic link feature and serves pandas users. SemPy supports the operations that pandas allows you to perform on your data. Furthermore, SemPy allows you to propagate semantic data from semantic models on which you're operating. By propagating semantic data, you can preserve annotations that are attached to tables and columns in the semantic model when you perform operations such as slicing, merges, and concatenation.

You can create a FabricDataFrame data structure in one of two ways. You can read a table or the output of a measure from a semantic model into a FabricDataFrame. Alternatively, you can use in-memory data to create the FabricDataFrame, just like you do for pandas DataFrames.

  • When you read from a semantic model into a FabricDataFrame, the metadata from Power BI automatically hydrates the FabricDataFrame. In other words, the semantic information from the semantic model's tables or measures is preserved in the FabricDataFrame.
  • When you create a FabricDataFrame from in-memory data, you need to supply the name of a semantic model from which the FabricDataFrame can pull metadata information.

How semantic data is preserved varies depending on factors such as the operations that you're performing and the order of the FabricDataFrames on which you're operating.

Semantic propagation with merge operation

When you merge two FabricDataFrames, the order of the DataFrames determines how semantic information is propagated.

  • If both FabricDataFrames are annotated, then the table-level metadata of the left FabricDataFrame takes precedence. The same rule applies to individual columns; the column annotations present in the left FabricDataFrame take precedence over the column annotations in the right one.
  • If only one FabricDataFrame is annotated, its metadata is used. The same rule applies to individual columns; the column annotations present in the annotated FabricDataFrame are used.
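
The precedence rules above can be modeled with plain Python data structures. The following is a minimal sketch, not SemPy's implementation: the `Annotated` class and the `propagate_on_merge` function are hypothetical names introduced for illustration, and the sketch assumes that a column's annotation is replaced as a whole rather than combined key by key.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Annotated:
    """Hypothetical stand-in for the metadata carried by a FabricDataFrame."""
    table: Optional[dict] = None                 # table-level metadata; None if unannotated
    columns: dict = field(default_factory=dict)  # column name -> annotation


def propagate_on_merge(left: Annotated, right: Annotated) -> Annotated:
    """Illustrative model of the merge rules; not the SemPy API."""
    # Table level: the left side wins when it is annotated,
    # otherwise fall back to the right side.
    table = left.table if left.table is not None else right.table
    # Column level: start from the right side, then let the left side
    # overwrite any column that both sides annotate.
    columns = {**right.columns, **left.columns}
    return Annotated(table=table, columns=columns)


# Made-up sample metadata for two annotated frames.
customers = Annotated(
    table={"name": "Customer"},
    columns={"CustomerId": {"description": "Customer key"}},
)
orders = Annotated(
    table={"name": "Orders"},
    columns={
        "CustomerId": {"description": "Foreign key to Customer"},
        "Amount": {"description": "Order total"},
    },
)

merged = propagate_on_merge(customers, orders)
print(merged.table)
print(merged.columns["CustomerId"])
print(merged.columns["Amount"])
```

In this sketch, the merged result keeps the table-level metadata and the `CustomerId` annotation from `customers` (the left side), while `Amount`, which only `orders` annotates, keeps its annotation from the right side.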

Semantic propagation with concatenation

When you concatenate multiple FabricDataFrames, SemPy copies the metadata for each column from the first FabricDataFrame that matches the column name. If there are multiple matches and their metadata differs, SemPy issues a warning.

You can also propagate semantic data when you concatenate FabricDataFrames with regular pandas DataFrames by placing the FabricDataFrame first.
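
The first-match rule for concatenation can be sketched in the same way. This is an illustrative model only, not SemPy's implementation: `propagate_on_concat` is a hypothetical helper, each frame is represented only by a dictionary that maps column names to annotations, and the warning text is made up.

```python
import warnings


def propagate_on_concat(frames):
    """Illustrative model of the concatenation rule; not the SemPy API.

    `frames` is a list of column-metadata dictionaries in concatenation order.
    """
    result = {}
    for columns in frames:
        for name, annotation in columns.items():
            if name not in result:
                # The first frame that has this column supplies its metadata.
                result[name] = annotation
            elif result[name] != annotation:
                # A later frame has the same column with different metadata.
                warnings.warn(
                    f"Conflicting metadata for column '{name}'; keeping the first match."
                )
    return result


# Made-up sample metadata for two frames that share the "Amount" column.
sales_2023 = {"Amount": {"description": "Order total"}}
sales_2024 = {
    "Amount": {"description": "Order total in USD"},
    "Region": {"description": "Sales region"},
}

# Capture warnings so the conflict can be inspected instead of only printed.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    combined = propagate_on_concat([sales_2023, sales_2024])

print(combined["Amount"])
print(combined["Region"])
print(len(caught))
```

Here `Amount` keeps the annotation from `sales_2023` because that frame comes first, `Region` takes the only annotation available, and the mismatch on `Amount` produces one warning.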

Semantic propagation for Spark users

The semantic link Spark native connector hydrates (or populates) the metadata dictionary of a Spark column. Currently, support for semantic propagation is limited and subject to Spark's internal implementation of how schema information is propagated. For example, column aggregation strips the metadata.