Understanding Relationships
This topic introduces you to the relationships that you can define between tables in PowerPivot for Excel. The topic includes the following sections:
What is a Relationship?
Requirements for Relationships
Automatic Detection and Inference of Relationships
After you have read this topic, you should understand what a relationship is, what the requirements are for defining a relationship, and how PowerPivot for Excel can automatically detect relationships for you. Along the way, you will learn some of the terminology that database professionals use to describe relationships.
What is a Relationship?
A relationship is a connection between two tables of data, based on one or more columns in each table (exactly one column in each table for PowerPivot). To see why relationships are useful, imagine that you track data for customer orders in your business. You could track all the data in a single table that has a structure like the following:
CustomerID |
Name |
DiscountRate |
OrderID |
OrderDate |
Product |
Quantity |
|
---|---|---|---|---|---|---|---|
1 |
Ashton |
chris.ashton@contoso.com |
.05 |
256 |
2010-01-07 |
Compact Digital |
11 |
1 |
Ashton |
chris.ashton@contoso.com |
.05 |
255 |
2010-01-03 |
SLR Camera |
15 |
2 |
Jaworski |
michal.jaworski@contoso.com |
.10 |
254 |
2010-01-03 |
Budget Movie-Maker |
27 |
This approach can work, but it involves storing a lot of redundant data, such as the customer e-mail address for every order. Storage is cheap, but you have to make sure you update every row for that customer if the e-mail address changes. One solution to this problem is to split the data into multiple tables and define relationships between those tables. This is the approach used in relational databases like SQL Server. For example, a database that you import into PowerPivot for Excel might represent order data by using three related tables:
Customers
[CustomerID] |
Name |
|
---|---|---|
1 |
Ashton |
chris.ashton@contoso.com |
2 |
Jaworski |
michal.jaworski@contoso.com |
CustomerDiscounts
[CustomerID] |
DiscountRate |
---|---|
1 |
.05 |
2 |
.10 |
Orders
[CustomerID] |
OrderID |
OrderDate |
Product |
Quantity |
---|---|---|---|---|
1 |
256 |
2010-01-07 |
Compact Digital |
11 |
1 |
255 |
2010-01-03 |
SLR Camera |
15 |
2 |
254 |
2010-01-03 |
Budget Movie-Maker |
27 |
If you import these tables from the same database, PowerPivot can detect the relationships between the tables based on the columns that are in [brackets], and can reproduce these relationships in the PowerPivot window. For more information, see Automatic Detection and Inference of Relationships in this topic. If you import tables from multiple sources, you can manually create relationships as described in Create a Relationship Between Two Tables.
Columns and Keys
Relationships are based on columns in each table that contain the same data. For example, the Customers and Orders tables can be related to each other because they both contain a column that stores a customer ID. In the example, the column names are the same, but this is not a requirement. One could be CustomerID and another CustomerNumber, as long as all of the rows in the Orders table contain an ID that is also stored in the Customers table.
In a relational database, there are several types of keys, which are typically just columns with special properties. The following four types of keys are the most interesting for our purposes:
Primary key: uniquely identifies a row in a table, such as CustomerID in the Customers table.
Alternate key (or candidate key): a column other than the primary key that is unique. For example, an Employees table might store an employee ID and a social security number, both of which are unique.
Foreign key: a column that refers to a unique column in another table, such as CustomerID in the Orders table, which refers to CustomerID in the Customers table.
Composite key: a key that is composed of more than one column. Composite keys are not supported in PowerPivot for Excel. For more information, see "Composite Keys and Lookup Columns" in this topic.
In PowerPivot for Excel, the primary key or alternate key is referred to as the related lookup column, or just lookup column. If a table has both a primary and alternate key, you can use either as the lookup column. The foreign key is referred to as the source column or just column. In our example, a relationship would be defined between CustomerID in the Orders table (the column) and CustomerID (the lookup column) in the Customers table. If you import data from a relational database, by default PowerPivot for Excel chooses the foreign key from one table and the corresponding primary key from the other table. However, you can use any column that has unique values for the lookup column.
Types of Relationships
The relationship between Customers and Orders is a one-to-many relationship. Every customer can have multiple orders, but an order cannot have multiple customers. The other types of relationships are one-to-one and many-to-many. The CustomerDiscounts table, which defines a single discount rate for each customer, is in a one-to-one relationship with the Customers table. An example of a many-to-many relationship is a direct relationship between Products and Customers, in which a customer can buy many products and the same product can be bought by many customers. PowerPivot for Excel does not support many-to-many relationships in the user interface. For more information, see "Many-to-Many Relationships" in this topic.
The following table shows the relationships between the three tables:
Relationship |
Type |
Lookup Column |
Column |
---|---|---|---|
Customers-CustomerDiscounts |
one-to-one |
Customers.CustomerID |
CustomerDiscounts.CustomerID |
Customers-Orders |
one-to-many |
Customers.CustomerID |
Orders.CustomerID |
Relationships and Performance
After any relationship has been created, PowerPivot for Excel typically must recalculate any formulas that use columns from tables in the newly created relationship. Processing can take some time, depending on the amount of data and the complexity of the relationships. For more information, see Recalculating Formulas.
Requirements for Relationships
PowerPivot for Excel has several requirements that must be followed when creating relationships:
Single Relationship between Tables
Multiple relationships could result in ambiguous dependencies between tables. To create accurate calculations, you need a single path from one table to the next. Therefore, there can be only one relationship between each pair of tables. For example, in AdventureWorks DW 2008, the table, DimDate, contains a column, DateKey, that is related to three different columns in the table FactInternetSales: OrderDate, DueDate, and ShipDate. If you attempt to import these tables, the first relationship is created successfully, but you will receive the following error on successive relationships that involve the same column:
* Relationship: table[column 1]-> table[column 2] - Status: error - Reason: A relationship cannot be created between tables <table 1> and <table 2>. Only one direct or indirect relationship can exist between two tables.
If you have two tables and multiple relationships between them, then you will need to import multiple copies of the table that contains the lookup column, and create one relationship between each pair of tables.
One Relationship for Each Source Column
A source column cannot participate in multiple relationships. If you have used a column as a source column in one relationship already, but want to use that column to connect to another related lookup column in a different table, you can create a copy of the column, and use that column for the new relationship.
It is easy to create a copy of a column that has the exact same values, by using a DAX formula in a calculated column. For more information, see Create a Calculated Column.
Unique Identifier for Each Table
Each table must have a single column that uniquely identifies each row in that table. This column is often referred to as the primary key.
Unique Lookup Columns
The data values in the lookup column must be unique. In other words, the column cannot contain duplicates. In PowerPivot for Excel, nulls and empty strings are equivalent to a blank, which is a distinct data value. This means that you cannot have multiple nulls in the lookup column.
Compatible Data Types
The data types in the source column and lookup column must be compatible. For more information about data types, see Data Types Supported in PowerPivot Workbooks.
Composite Keys and Lookup Columns
You cannot use composite keys in a PowerPivot workbook; you must always have exactly one column that uniquely identifies each row in the table. If you try to import tables that have an existing relationship based on a composite key, the Table Import Wizard will ignore that relationship because it cannot be created in PowerPivot.
If you want to create a relationship between two tables in PowerPivot, and there are multiple columns defining the primary and foreign keys, you must combine the values to create a single key column before creating the relationship. You can do this before you import the data, or you can do this in PowerPivot by creating a calculated column.
Many-to-Many Relationships
PowerPivot for Excel does not support many-to-many relationships, and you cannot simply add junction tables in PowerPivot. However, you can use DAX functions to model many-to-many relationships.
Self-Joins and Loops
Self-joins are not permitted in PowerPivot tables. A self-join is a recursive relationship between a table and itself. Self-joins are often used to define parent-child hierarchies. For example, you could join an Employees table to itself to produce a hierarchy that shows the management chain at a business.
PowerPivot for Excel does not allow loops to be created among relationships in a workbook. In other words, the following set of relationships is prohibited.
Table 1, column a to Table 2, column f
Table 2, column f to Table 3, column n
Table 3, column n to Table 1, column a
If you try to create a relationship that would result in a loop being created, an error is generated.
Automatic Detection and Inference of Relationships
When you import data into the PowerPivot window, the Table Import Wizard automatically detects any existing relationships among tables. Also, when you create a PivotTable, PowerPivot for Excel analyzes the data in the tables. It detects possible relationships that have not been defined, and suggests appropriate columns to include in those relationships.
The detection algorithm uses statistical data about the values and metadata of columns to make inferences about the probability of relationships.
Names of the columns should be similar to each other, but it is not necessary that they be exactly the same. For example, in a business setting you often have variations on the names of columns that contain essentially the same data: Emp ID, EmployeeID, Employee ID, EMP_ID, and so forth. The algorithm detects similar names, and will assign a higher probability to those columns that have similar or exactly matching names. Therefore, to improve the accuracy of detection, you can try renaming the columns in the data that you import to something similar to columns in your existing tables.
Data types in all related columns should be compatible. For automatic detection, only whole number and text data types are supported. For more information about data types, see Data Types Supported in PowerPivot Workbooks.
This information might help you understand why not all relationships are detected, or how changes in metadata--such as field name and the data types--could improve the results of automatic relationship detection. For more information, see Troubleshoot Relationships.
Automatic Detection for Named Sets
Relationships are not automatically detected between Named Sets and related fields in a PivotTable. You can create these relationships manually. If you want to use automatic relationship detection, remove each Named Set and add the individual fields from the Named Set directly to the PivotTable.
Inference of Relationships
In some cases, relationships between tables are automatically chained. For example, if you create a relationship between the first two sets of tables below, a relationship is inferred to exist between the other two tables, and a relationship is automatically established.
Products and Category -- created manually
Category and SubCategory -- created manually
Products and SubCategory -- relationship is inferred
In order for relationships to be automatically chained, the relationships must go in one direction, as shown above. If the initial relationships were between, for example, Sales and Products, and Sales and Customers, a relationship is not inferred. This is because the relationship between Products and Customers is a many-to-many relationship.