Power BI connectivity with semantic link and Microsoft Fabric

Power BI connectivity is at the core of semantic link. In this article, you'll learn about the ways that semantic link provides connectivity to semantic models for users of the Python pandas ecosystem and the Apache Spark ecosystem.

A semantic model usually represents the gold standard of data and is the result of upstream data processing and refinement. Business analysts can create Power BI reports from semantic models and use these reports to drive business decisions. Furthermore, they can encode their domain knowledge and business logic into Power BI measures. On the other hand, data scientists can work with the same semantic models, but typically in a different code environment or language. In such cases, it may become necessary for the data scientists to duplicate the business logic, which can lead to critical errors.

Semantic link bridges this gap between the semantic models and the Synapse Data Science in Microsoft Fabric experience. Thereby, providing a way for business analysts and data scientists to collaborate seamlessly and reduce data mismatch. Semantic link offers connectivity to the:

  • Python pandas ecosystem via the SemPy Python library, and
  • Semantic models through the Spark native connector that supports PySpark, Spark SQL, R, and Scala.

Data connectivity through SemPy Python library for pandas users

The SemPy Python library is part of the semantic link feature and serves pandas users. SemPy provides functionalities that include data retrieval from tables, computation of measures, and execution of DAX queries and metadata.

For Spark 3.4 and above, Semantic link is available in the default runtime when using Fabric, and there is no need to install it. If you are using Spark 3.3 or below, or if you want to update to the most recent version of Semantic Link, you can run the command:

python %pip install -U semantic-link  

SemPy also extends pandas DataFrames with additional metadata propagated from the Power BI data source. This metadata includes:

  • Power BI data categories:
    • Geographic: address, place, city, etc.
    • URL: web url, image url
    • Barcode
  • Relationships between tables
  • Hierarchies

Support for Spark (PySpark, Spark SQL, R and Scala)

The semantic link Spark native connector enables Spark users to access Power BI tables and measures. The connector is language-agnostic and supports PySpark, Spark SQL, R, and Scala.

To use the Spark native connector, semantic models are represented as Spark namespaces and transparently expose Power BI tables as Spark tables.

Configure Spark to use the Power BI Spark native connector:

spark.conf.set("spark.sql.catalog.pbi", "com.microsoft.azure.synapse.ml.powerbi.PowerBICatalog")

# Optionally, configure the workspace using its ID
# Resolve workspace name to ID using fabric.resolve_workspace_id("My workspace")
# Replace 00000000-0000-0000-0000-000000000000 with your own workspace ID
# spark.conf.set("spark.sql.catalog.pbi.workspace, "00000000-0000-0000-0000-000000000000")

List all tables in the semantic model Sales Dataset:

%%sql
SHOW TABLES FROM pbi.`Sales Dataset`

Display data from the table Customer in the semantic model Sales Dataset:

%%sql
SELECT * FROM pbi.`Sales Dataset`.Customer

Power BI measures are accessible through the virtual _Metrics table to bridge relational Spark SQL with multidimensional Power BI. In the following example, Total Revenue and Revenue Budget are measures defined in the Sales Dataset semantic model, while the remaining columns are dimensions. The aggregation function (for example, AVG) is ignored for measures and only serves for consistency with SQL.

The connector supports predicate push down of computation from Spark expressions into the Power BI engine; for example, Customer[State] in ('CA', 'WA'), thereby enabling utilization of Power BI optimized engine.

SELECT
    `Customer[Country/Region]`,
    `Industry[Industry]`,
    AVG(`Total Revenue`),
    AVG(`Revenue Budget`)
FROM
    pbi.`Sales Dataset`.`_Metrics`
WHERE
    `Customer[State]` in ('CA', 'WA')
GROUP BY
    `Customer[Country/Region]`,
    `Industry[Industry]`

Data augmentation with Power BI measures

The add_measure operation is a powerful feature of semantic link that enables you to augment data with measures from semantic models. The add_measure operation is only available in the SemPy Python library and not supported by the Spark native connector. For more information on the add_measure method, see add_measure in the FabricDataFrame Class

To use the SemPy Python library, you first need to install it in your notebook kernel by executing this code in a notebook cell:

Tip

The code example assumes that you've manually created a FabricDataFrame with data that you want to augment with measures from a semantic model.

# %pip and import only needs to be done once per notebook
%pip install semantic-link
from sempy.fabric import FabricDataFrame

df = FabricDataFrame({
        "Sales Agent": ["Agent 1", "Agent 1", "Agent 2"],
        "Customer[Country/Region]": ["US", "GB", "US"],
        "Industry[Industry]": ["Services", "CPG", "Manufacturing"],
    }
)

joined_df = df.add_measure(["Total Revenue", "Total Budget"], dataset="Sales Dataset")

The add_measure operation performs these steps:

  • Resolves column names: The column names in the FabricDataFrame are resolved to Power BI dimensions. Any column names that can't be resolved within the given semantic model are ignored (see the supported DAX syntax).
  • Defines group by columns, by using the resolved column names.
  • Computes one or more measures at the group by level.
  • Filters the result by the existing rows in the FabricDataFrame.