Notikumi
Power BI DataViz pasaules čempionāts
14. febr. 16 - 31. marts 16
Ar 4 iespējām piedalīties, jūs varētu laimēt konferences paketi un iekļūt LIVE Grand Finale Lasvegasā
Uzzināt vairākŠī pārlūkprogramma vairs netiek atbalstīta.
Jauniniet uz Microsoft Edge, lai izmantotu jaunāko līdzekļu, drošības atjauninājumu un tehniskā atbalsta sniegtās priekšrocības.
This article targets you as a data modeler who works with Power BI Desktop. It describes three different many-to-many modeling scenarios. It also provides you with guidance on how to successfully design for them in your models.
Piezīme
An introduction to model relationships is not covered in this article. If you're not completely familiar with relationships, their properties or how to configure them, we recommend that you first read the Model relationships in Power BI Desktop article.
It's also important that you have an understanding of star schema design. For more information, see Understand star schema and the importance for Power BI.
There are three different many-to-many scenarios. They can occur when you're required to:
The classic many-to-many scenario relates two entities, for example bank customers and bank accounts. Consider that customers can have multiple accounts, and accounts can have multiple customers. When an account has multiple customers, they're commonly called joint account holders.
Modeling these entities is straight forward. One dimension table stores accounts, and another dimension table stores customers. As is characteristic of dimension tables, there's a unique identifier (ID) column in each table. To model the relationship between the two tables, a third table is required. This table is commonly referred to as a bridging table. In this example, it's purpose is to store one row for each customer-account association. Interestingly, when this table only contains identifier columns, it's called a factless fact table.
Here's a simplistic diagram of the three model tables.
The first table is named Account
, and it contains two columns: AccountID
and Account
. The second table is named AccountCustomer
, and it contains two columns: AccountID
and CustomerID
. The third table is named Customer
, and it contains two columns: CustomerID
and Customer
. Relationships don't exist between any of the tables.
Two one-to-many relationships are added to relate the tables. Here's an updated model diagram of the related tables. A fact table named Transaction
has been added. It records account transactions. The bridging table and all identifier columns have been hidden.
To help describe how the relationship filter propagation works, the model diagram has been modified to reveal the table rows.
The row details for the four tables are presented in the following bulleted list:
Account
table has two rows:
AccountID
1 is for Account-01AccountID
2 is for Account-02Customer
table has two rows:
CustomerID
91 is for Customer-91CustomerID
92 is for Customer-92AccountCustomer
table has three rows:
AccountID
1 is associated with CustomerID
91AccountID
1 is associated with CustomerID
92AccountID
2 is associated with CustomerID
92Transaction
table has three rows:
Date
January 1 2019, AccountID
1, Amount
100Date
February 2 2019, AccountID
2, Amount
200Date
March 3 2019, AccountID
1, Amount
-25Let's see what happens when the model is queried.
In the following image, there are two table visuals that summarize the Amount
column of the Transaction
table. The first visual groups by account, and so the sum of the Amount
columns represents the account balance. The second visual groups by customer, and so the sum of the Amount
columns represents the customer balance.
The first table visual (Account Balance) has two columns: Account
and Amount
. It displays the following result:
The second table visual (Customer Balance) has two columns: Customer
and Amount
. It displays the following result:
A quick glance at the table rows and the Account Balance visual reveals that the result is correct, for each account and the total amount. That's because each account grouping results in a filter propagation to the Transaction
table for that account.
However, something doesn't appear correct with the Customer Balance visual. Each customer in this visual has the same balance as the total balance. This result could only be correct if every customer was a joint account holder of every account. That's not the case in this example. There's an issue, and it's related to filter propagation. Filters aren't flowing all the way to the Transaction
table.
If you follow the relationship filter directions from the Customer
table to the Transaction
table, you can determine that the relationship between the Account
and AccountCustomer
tables is propagating in the wrong direction. The filter direction for this relationship must be set to Both
.
As expected, there has been no change to the Account Balance visual.
The Customer Balance visual, however, now displays the following result:
The Customer Balance visual now displays a correct result. Follow the filter directions for yourself, and see how the customer balances were calculated. Also, understand that the visual total means all customers.
Someone unfamiliar with the model relationships could conclude that the result is incorrect. They might ask: Why isn't the total balance for Customer-91
and Customer-92
equal to 350 (75 + 275)?
The answer to their question lies in understanding the many-to-many relationship. Each customer balance can represent the addition of multiple account balances, and so the customer balances are non-additive.
When you have a many-to-many relationship between dimension tables, follow this guidance:
Is Nullable
property—data refresh will fail when missing values are sourced.We don't recommend that you relate many-to-many dimension tables directly. This design approach requires setting up a relationship with a many-to-many cardinality. Conceptually it can be achieved, yet it implies that the related columns might contain duplicate values. It's a well-accepted design practice, however, that dimension tables have an ID column. Dimension tables should always use the ID column as the "one" side of a relationship.
A different many-to-many scenario type involves relating two fact tables. Two fact tables can be related directly. This design technique can be useful for quick and simple data exploration. However, and to be clear, we generally don't recommend this design approach. We'll explain why later in this section.
Let's consider an example that involves two fact tables: Order
and Fulfillment
. The Order
table contains one row per order line, and the Fulfillment
table can contains zero or more rows per order line. Rows in the Order
table represent sales orders. Rows in the Fulfillment
table represent order items that have been shipped. A many-to-many relationship relates the OrderID
columns in each table, with filter propagation only from the Order
table (meaning the Order
table filters the Fulfillment
table).
The relationship cardinality is set to Many-to-many
to support storing duplicate OrderID
column values in both tables. In the Order
table, duplicate ID values can exist because an order can have multiple lines. In the Fulfillment
table, duplicate ID values can exist because orders can have multiple lines, and order lines can be fulfilled by many shipments.
Let's now take a look at the table rows. In the Fulfillment
table, notice that order lines can be fulfilled by multiple shipments. (The absence of an order line means the order is yet to be fulfilled.)
The row details for the two tables are described in the following bulleted list:
Order
table has five rows:
OrderDate
January 1 2019, OrderID
1, OrderLine
1, ProductID
Prod-A, OrderQuantity
5, Sales
50OrderDate
January 1 2019, OrderID
1, OrderLine
2, ProductID
Prod-B, OrderQuantity
10, Sales
80OrderDate
February 2 2019, OrderID
2, OrderLine
1, ProductID
Prod-B, OrderQuantity
5, Sales
40OrderDate
February 2 2019, OrderID
2, OrderLine
2, ProductID
Prod-C, OrderQuantity
1, Sales
20OrderDate
March 3 2019, OrderID
3, OrderLine
1, ProductID
Prod-C, OrderQuantity
5, Sales
100Fulfillment
table has four rows:
FulfillmentDate
January 1 2019, FulfillmentID
50, OrderID
1, OrderLine
1, FulfillmentQuantity
2FulfillmentDate
February 2 2019, FulfillmentID
51, OrderID
2, OrderLine
1, FulfillmentQuantity
5FulfillmentDate
February 2 2019, FulfillmentID
52, OrderID
1, OrderLine
1, FulfillmentQuantity
3FulfillmentDate
January 1 2019, FulfillmentID
53, OrderID
1, OrderLine
2, FulfillmentQuantity
10Let's see what happens when the model is queried. Here's a table visual comparing order and fulfillment quantities by the Order
table OrderID
column.
The visual presents an accurate result. However, the usefulness of the model is limited because you can only filter or group by the Order
table OrderID
column.
Generally, we don't recommend that you relate two fact tables directly by using many-to-many cardinality. The main reason is because the model won't provide flexibility in the ways your report visuals filter or group. In the example, it's only possible for visuals to filter or group by the Order
table OrderID
column. Another reason relates to the quality of your data. If your data has integrity issues, it's possible some rows might be omitted during querying due to the nature many-to-man cardinality and limited relationships.
Instead of relating fact tables directly, we recommend that you implement a star schema design. That means you add dimension tables. These dimension tables then relate to the fact tables by using one-to-many relationships. This design approach is robust as it efficiently delivers flexible reporting options. It lets you filter or group by using any of the dimension table columns, and summarize columns of any related fact table.
Let's consider a better solution.
Notice the following design changes:
OrderLine
, OrderDate
, Product
, and FulfillmentDate
.OrderLine
table contains the OrderLineID
column, which stores the OrderID
value multiplied by 100, plus the OrderLine
column value—an ID for each order line.Order
and Fulfillment
tables now each contain an OrderLineID
column, and they no longer contain the OrderID
and OrderLine
columns.Fulfillment
table now contains OrderDate
and ProductID
columns.FulfillmentDate
table has a relationship only to the Fulfillment
table.Taking the time to adopt a star schema design delivers the following benefits:
OrderLine
, OrderDate
, or Product
tables propagate to both fact tables.This many-to-many scenario is very different from the other two already described in this article.
Let's consider an example involving four tables: Date
, Sales
, Product
, and Target
. The Date
and Product
tables are dimension tables, and one-to-many relationships relate each to the Sales
fact table. So far, it represents a good star schema design. The Target
table, however, is yet to be related to the other tables.
The Target
table contains three columns: Category
, TargetQuantity
, and TargetYear
. The table rows reveal a granularity of year and product category. In other words, targets—used to measure sales performance—are set each year for each product category.
Because the Target
table stores data at a higher level than the dimension tables, a one-to-many relationship can't be created. Well, it's true for just one of the relationships. Let's explore how the Target
table can be related to the dimension tables.
A relationship between the Date
and Target
tables should be a one-to-many relationship. That's because the TargetYear
column values are dates. In this example, each TargetYear
column stores the first date of the target year.
Padoms
When storing facts at a higher time granularity than day, set the column data type to Date (or Whole number if you're using date keys). In the column, store a value representing the first day of the time period. For example, a year period is recorded as January 1 of the year, and a month period is recorded as the first day of that month.
Care must be taken, however, to ensure that month or date level filters produce a meaningful result. Without any special calculation logic, report visuals might report that target dates are literally the first day of each year. All other days—and all months except January—will summarize the target quantity as BLANK.
The following matrix visual shows what happens when the report user drills down from a year into its months. The visual summarizes the TargetQuantity
column. (The Show items with no data option has been enabled for the matrix rows.)
To avoid this behavior, we recommend that you control the summarization of your fact data by using measures. One way to control the summarization is to return BLANK when lower-level time periods are queried. Another way—defined with some sophisticated DAX—is to apportion values across lower-level time periods.
Consider the following measure definition that uses the ISFILTERED DAX function. It only returns a value when the Date
and Month
columns aren't filtered.
Target Quantity =
IF(
NOT ISFILTERED('Date'[Date])
&& NOT ISFILTERED('Date'[Month]),
SUM(Target[TargetQuantity])
)
The following matrix visual uses the Target Quantity
measure. It shows that all monthly target quantities are BLANK.
A different design approach is required when relating a non-date column from a dimension table to a fact table (and it's at a higher grain than the dimension table).
The Category
columns (from both the Product
and Target
tables) contains duplicate values. So, there's no "one" side for a one-to-many relationship. In this case, you'll need to create a many-to-many relationship. The relationship should propagate filters in a single direction, from the dimension table to the fact table.
Let's now take a look at the table rows.
In the Target
table, there are four rows: two rows for each target year (2019 and 2020), and two categories (Clothing and Accessories). In the Product
table, there are three products. Two belong to the clothing category, and one belongs to the accessories category. One of the clothing colors is green, and the remaining two are blue.
A table visual grouping by the Category
column from the Product
table produces the following result. However, this visual produces the correct result. Let's now consider what happens when the Color
column from the Product
table is used to group target quantity.
The visual produces a misrepresentation of the data. What is happening here?
A filter on the Color
column from the Product
table results in two rows. One of the rows is for the Clothing category, and the other is for the Accessories category. These two category values are propagated as filters to the Target
table. In other words, because the color blue is used by products from two categories, those categories are used to filter the targets.
To avoid this behavior, as described earlier, we recommend that you control the summarization of your fact data by using measures.
Consider the following measure definition. Notice that all Product
table columns that are beneath the category level are tested for filters.
Target Quantity =
IF(
NOT ISFILTERED('Product'[ProductID])
&& NOT ISFILTERED('Product'[Product])
&& NOT ISFILTERED('Product'[Color]),
SUM(Target[TargetQuantity])
)
The following table visual uses the Target Quantity
measure. It shows that all color target quantities are BLANK.
The final model design looks like the following.
When you need to relate a dimension table to a fact table, and the fact table stores rows at a higher grain than the dimension table rows, follow this guidance:
For more information related to this article, check out the following resources:
Notikumi
Power BI DataViz pasaules čempionāts
14. febr. 16 - 31. marts 16
Ar 4 iespējām piedalīties, jūs varētu laimēt konferences paketi un iekļūt LIVE Grand Finale Lasvegasā
Uzzināt vairākApmācība
Modulis
Design a semantic model in Power BI - Training
The process of creating a complicated semantic model in Power BI is straightforward. If your data is coming in from more than one transactional system, before you know it, you can have dozens of tables that you have to work with. Building a great semantic model is about simplifying the disarray. A star schema is one way to simplify a semantic model, and you learn about the terminology and implementation of them in this module. You will also learn about why choosing the correct data granularity is important
Sertifikācija
Microsoft Certified: Power BI Data Analyst Associate - Certifications
Demonstrate methods and best practices that align with business and technical requirements for modeling, visualizing, and analyzing data with Microsoft Power BI.