Implement dimensional tables

Completed

With queries, views, and stored procedures in place, you're ready to implement the tables that hold your transformed data. Dimensional modeling organizes data into fact tables (measures and metrics) and dimension tables (descriptive attributes), creating a structure that's optimized for analytical queries and semantic models.

Create dimension tables

Dimension tables hold the descriptive attributes that give context to your measures, such as customer names, product categories, and date hierarchies. Each dimension table has a surrogate key (an autogenerated integer) that fact tables reference instead of the natural business key.

Using surrogate keys provides several benefits. They're compact integers that join efficiently. They insulate the warehouse from source system key changes. They also support slowly changing dimension (SCD) patterns where the same business entity can have multiple historical records.

CREATE TABLE dim.customer (
    customer_key INT IDENTITY(1,1),
    customer_id NVARCHAR(20) NOT NULL,
    customer_name NVARCHAR(100),
    segment NVARCHAR(50),
    region NVARCHAR(50),
    effective_date DATE,
    end_date DATE,
    is_current BIT DEFAULT 1
);

The IDENTITY column generates a unique surrogate key for each row automatically. The effective_date, end_date, and is_current columns support SCD Type 2 tracking, where you preserve historical versions of a record instead of overwriting changes. When a customer's attributes change, you set end_date and is_current = 0 on the existing row, then insert a new row with the updated values.

Note

Organize your dimensional tables using schemas that reflect their role. Common conventions include dim for dimension tables, fact for fact tables, and staging for raw source data.

Create fact tables

Fact tables store the quantitative data: the numbers you aggregate in reports. Each row represents a business event (like a sale or a shipment) and contains foreign keys that reference dimension tables plus one or more numeric measures.

CREATE TABLE fact.sales (
    sales_key INT IDENTITY(1,1),
    date_key INT NOT NULL,
    customer_key INT NOT NULL,
    product_key INT NOT NULL,
    quantity INT,
    unit_price DECIMAL(10,2),
    sales_amount DECIMAL(12,2)
);

The date_key, customer_key, and product_key columns are foreign keys that point to the corresponding dimension tables. The quantity, unit_price, and sales_amount columns are the measures that get aggregated in queries and reports.

Important

Fabric warehouses don't enforce foreign key constraints at the engine level. Define your key relationships through naming conventions and loading logic to maintain referential integrity between fact and dimension tables. This pattern is common in cloud data warehouses where the query engine optimizes joins without needing constraint enforcement.

Load dimension tables

For the initial load, use a simple INSERT ... SELECT statement to populate a dimension from staging data:

INSERT INTO dim.customer
    (customer_id, customer_name, segment, region, effective_date, is_current)
SELECT
    customer_id,
    customer_name,
    segment,
    region,
    CAST(GETDATE() AS DATE),
    1
FROM staging.customers;

For ongoing updates, the approach depends on how you handle changes:

  • SCD Type 1 (overwrite): Use a MERGE statement to update existing rows in place. This approach is simpler but doesn't preserve history. It's appropriate when historical values aren't important for analysis, such as correcting a misspelled customer name.
  • SCD Type 2 (history): Mark the current row as expired (set is_current = 0 and end_date to today), then insert a new row with the updated values. This approach preserves a full change history. It's important for trend analysis where you need to know what a customer's segment was at the time of a past transaction.

Load fact tables

Fact table loading joins staging data with dimension tables to look up the correct surrogate keys. This step translates natural business keys (like customer IDs from the source system) into the surrogate keys used in the dimensional model.

INSERT INTO fact.sales
    (date_key, customer_key, product_key, quantity, unit_price, sales_amount)
SELECT
    d.date_key,
    c.customer_key,
    p.product_key,
    s.quantity,
    s.unit_price,
    s.quantity * s.unit_price
FROM staging.orders AS s
INNER JOIN dim.date AS d
    ON s.order_date = d.calendar_date
INNER JOIN dim.customer AS c
    ON s.customer_id = c.customer_id
    AND c.is_current = 1
INNER JOIN dim.product AS p
    ON s.product_id = p.product_id;

The join to dim.customer filters on is_current = 1 to ensure each fact row links to the current version of the customer record. If you need historical accuracy (linking to the customer record that was active at the time of the order), match on the effective date range instead.

Use table clones for development and testing

When you're developing transformation logic, you don't want to risk corrupting production tables. Fabric supports table clones, which create a zero-copy reference to an existing table's data at a point in time. Clones let you test changes in isolation without duplicating storage.

CREATE TABLE fact.sales_dev AS CLONE OF fact.sales;

The clone initially shares the same underlying data files as the source table. When you modify the clone (inserting, updating, or deleting rows), Fabric writes only the changed data, keeping storage efficient. After you validate your changes against the clone, you can apply the same logic to the production table with confidence.

Table clones are also useful for creating snapshots before running a risky transformation, giving you a rollback point if something goes wrong.

You now have the full set of tools for T-SQL data transformation in a Fabric warehouse: queries that shape raw data, views that encapsulate reusable logic, stored procedures that automate processing, and dimensional tables that serve as the optimized foundation for analytics and semantic models.