Lakehouse tutorial: Create a lakehouse, ingest sample data, and build a report

In this tutorial, you build a lakehouse, ingest sample data into a Delta table, apply transformations where required, and then create reports. You complete the following steps: create a lakehouse, ingest sample data, transform and load the data, add tables to a semantic model, and build a report.

If you don't have Microsoft Fabric, sign up for a free trial capacity.

Prerequisites

Why do I need OneDrive for this tutorial?

You need OneDrive for this tutorial because the data ingestion process relies on OneDrive as the underlying storage mechanism for file uploads. When you upload a CSV file to Fabric, it's temporarily stored in your OneDrive account before being ingested into the lakehouse. This integration ensures secure and seamless file transfer within the Microsoft 365 ecosystem.

The ingestion step doesn't work if you don't have OneDrive configured, because Fabric can't access the uploaded file. If you already have the data available in your lakehouse or another supported location, OneDrive isn't required.

Note

If you already have data in your lakehouse, you can use that data instead of the sample CSV file. To check if data is already associated with your lakehouse, use the Lakehouse Explorer or the SQL analytics endpoint to browse tables, files, and folders. For more information about how to check, see Lakehouse overview and Query lakehouse tables with SQL analytics endpoint.

Create a lakehouse

In this section, you create a lakehouse in Fabric.

  1. In Fabric, select Workspaces from the navigation bar.

  2. To open your workspace, enter its name in the search box at the top and select the workspace from the search results.

  3. From the workspace, select New item, enter Lakehouse in the search box, then select Lakehouse.

  4. In the New lakehouse dialog box, enter wwilakehouse in the Name field.

    Screenshot of the New lakehouse dialog box.

  5. Select Create to create and open the new lakehouse.

Ingest sample data

In this section, you ingest sample customer data into the lakehouse.

Note

If you don't have OneDrive configured, sign up for the Microsoft 365 free trial: Free Trial - Try Microsoft 365 for a month.

  1. Download the dimension_customer.csv file from the Fabric samples repo.

  2. In the Home tab, under Get data in your lakehouse, you see options to load data into the lakehouse. Select New Dataflow Gen2.

    Screenshot showing where to select New Dataflow Gen2 option to load data into your lakehouse.

  3. In the Create a dataflow pane, enter Customer Dimension Data in the Name field and select Next.

    Screenshot of the Create a dataflow pane, showing where to enter the dataflow name.

  4. On the new dataflow screen, select Import from a Text/CSV file.

  5. On the Connect to data source screen, select the Upload file radio button. Drag and drop the dimension_customer.csv file that you downloaded in step 1. After the file is uploaded, select Next.

    Screenshot showing where to select Upload file and where to drag the previously downloaded file.

  6. From the Preview file data page, preview the data and select Create to proceed and return to the dataflow canvas.
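Before you upload the file, it can help to sanity-check it locally. The following Python sketch previews a CSV with the standard library; the column names are assumptions based on the WideWorldImporters sample, so verify them against your actual dimension_customer.csv.

```python
# Illustrative local preview of the downloaded CSV (outside the Fabric UI).
# The column names below are assumptions based on the WideWorldImporters
# sample data; check them against your actual dimension_customer.csv.
import csv
import io

# Stand-in for the downloaded file; with the real file you would use
# open("dimension_customer.csv", newline="") instead.
sample_csv = io.StringIO(
    "CustomerKey,Customer,BuyingGroup\r\n"
    "1,Tailspin Toys,Tailspin Toys\r\n"
    "2,Wingtip Toys,Wingtip Toys\r\n"
)

reader = csv.DictReader(sample_csv)
rows = list(reader)
print(reader.fieldnames)  # header row, as shown on the Preview file data page
print(len(rows))          # number of data rows
```

This kind of quick check catches a wrong delimiter or a missing header before the upload step, rather than after the dataflow fails.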

Transform and load data into the lakehouse

In this section, you transform the data based on your business requirements and load it into the lakehouse.

  1. In the Query settings pane, update the Name field to dimension_customer.

    Note

    Fabric adds a space and a number to the end of the table name by default. Table names must be lowercase and must not contain spaces. Rename the table appropriately and remove any spaces from its name.

    Screenshot of the query settings pane, showing where to enter the name and select the data destination.

  2. Because you created the dataflow from the lakehouse, the uploaded customer data is automatically associated with the default lakehouse. If you create the dataflow separately, you can associate it with a lakehouse by following these steps:

    1. From the menu items, select Add data destination and select Lakehouse. From the Connect to data destination screen, sign in to your account if necessary and select Next.

    2. Navigate to the wwilakehouse in your workspace.

    3. If the dimension_customer table doesn't exist, select the New table setting and enter the table name dimension_customer. If the table already exists, select the Existing table setting and choose dimension_customer from the list of tables in the object explorer. Select Next.

      Screenshot showing how to choose the destination table.

    4. On the Choose destination settings pane, select Replace as Update method. Select Save settings to return to the dataflow canvas.

  3. From the dataflow canvas, you can transform the data based on your business requirements. For simplicity, this tutorial doesn't make any changes. To proceed, select Save and Run in the toolbar.

    Screenshot of the Query setting pane that contains the Publish button.

  4. Return to your workspace, hover over the Customer Dimension Data dataflow, select the ... menu, and then select Refresh now. This option runs the dataflow and moves data from the source file to the lakehouse table. While the refresh is in progress, a spinning circle appears next to the dataflow's name.

    Screenshot showing where to find the Refresh now icon.

  5. After the dataflow refresh completes, select your lakehouse in the top menu bar to view the dimension_customer Delta table.

    Screenshot of navigation panel from which the lakehouse is opened.

  6. Select the table to preview its data. You can also use the SQL analytics endpoint of the lakehouse to query the data with SQL statements. Select SQL analytics endpoint from the Lakehouse dropdown menu at the top right of the screen.

    Screenshot of the Delta table, showing where to select SQL analytics endpoint.

  7. Select the dimension_customer table to preview its data or select New SQL query to write your SQL statements.

    Screenshot of the SQL analytics endpoint screen, showing where to select New SQL query.

  8. The following sample query aggregates the row count based on the BuyingGroup column of the dimension_customer table. SQL query files are saved automatically for future reference, and you can rename or delete them as needed.

    To run the script, select the Run icon at the top of the script file.

    SELECT BuyingGroup, Count(*) AS Total
    FROM dimension_customer
    GROUP BY BuyingGroup
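
    For intuition, here's what that GROUP BY does, expressed as a small Python sketch over made-up sample rows (the values aren't from the real dataset):

```python
# Illustrative Python equivalent of the SQL query above:
#   SELECT BuyingGroup, Count(*) AS Total
#   FROM dimension_customer
#   GROUP BY BuyingGroup
# The rows below are made-up sample values, not the real dataset.
from collections import Counter

dimension_customer = [
    {"CustomerKey": 1, "BuyingGroup": "Tailspin Toys"},
    {"CustomerKey": 2, "BuyingGroup": "Wingtip Toys"},
    {"CustomerKey": 3, "BuyingGroup": "Tailspin Toys"},
]

# Count rows per BuyingGroup, just like COUNT(*) ... GROUP BY.
totals = Counter(row["BuyingGroup"] for row in dimension_customer)
for buying_group, total in sorted(totals.items()):
    print(buying_group, total)
# Tailspin Toys 2
# Wingtip Toys 1
```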
    

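The earlier note says table names must be lowercase with no spaces, and that Fabric may append a space and a number by default. If you script table creation, a small normalization helper like the following can enforce those rules; the function is an illustrative sketch of my own, not a Fabric API.

```python
import re


def normalize_table_name(name: str) -> str:
    """Lowercase the name and replace runs of whitespace with underscores,
    matching the naming rules mentioned in the note above.
    Illustrative helper only; not part of any Fabric API."""
    name = name.strip().lower()
    return re.sub(r"\s+", "_", name)


# For example, a default name like "Dimension Customer 2" becomes valid:
print(normalize_table_name("Dimension Customer 2"))  # dimension_customer_2
```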
Add tables to the semantic model

In this section, you add the tables to the semantic model so that you can use them to create reports.

  1. Open your lakehouse, switch to the SQL analytics endpoint view, and select New semantic model. Name the semantic model, assign it to a workspace, and select the tables to add to it. In this case, select the dimension_customer table.

    Screenshot where you can select the tables to add to the semantic model.

Build a report

In this section, you build a report from the ingested data.

  1. Select the semantic model in your workspace, select the dropdown Explore this data, and then select Auto-create a report. In the next tutorial, we create a report from scratch.

    Screenshot of the semantic model details page, showing where to select Create a report.

  2. Because the table is a dimension table without measures, Power BI creates a measure for the row count, aggregates it across different columns, and creates different charts, as shown in the following image.

    Screenshot of a Quick summary page displaying four different bar charts.

  3. To save the report for later, select Save in the top ribbon. You can modify the report to meet your requirements by including or excluding other tables or columns.

Next step