Tutorial: Analyze data with a notebook

Applies to: SQL analytics endpoint and Warehouse in Microsoft Fabric

In this tutorial, you learn how to save your data once and then use it from many other services. You can also create shortcuts to data stored in Azure Data Lake Storage and S3, so that you can directly access delta tables held in those external systems.

Create a lakehouse

First, create a new lakehouse in your Microsoft Fabric workspace:

  1. Select the Data Warehouse Tutorial workspace in the navigation menu.

  2. Select + New > Lakehouse.

    Screenshot from the Fabric portal showing the + New menu. Lakehouse is boxed in red.

  3. In the Name field, enter ShortcutExercise, and select Create.

    Screenshot from the Fabric portal showing name field for the new lakehouse. The name provided is ShortcutExercise.

  4. The new lakehouse loads and the Explorer view opens, showing the Get data in your lakehouse menu. Under Load data in your lakehouse, select the New shortcut button.

    Screenshot from the Fabric portal showing the Load data in your lakehouse menu on the landing page. The New shortcut button is boxed in red.

  5. In the New shortcut window, select the button for Microsoft OneLake.

    Screenshot from the Fabric portal showing the New shortcut window. The button for Microsoft OneLake is boxed in red.

  6. In the Select a data source type window, scroll through the list until you find the Warehouse named WideWorldImporters you created previously. Select it, then select Next.

  7. In the OneLake object browser, expand Tables, expand the dbo schema, and then select the radio button beside dimension_customer. Select the Create button.

    Screenshot from the Fabric portal showing the OneLake object browser. Under WideWorldImporters, Tables, dbo, the dimension_customer is boxed in red.

  8. If you see a folder called Unidentified under Tables, select the Refresh icon in the horizontal menu bar.

    Screenshot from the Fabric portal showing the refresh button on the horizontal menu bar, and the Unidentified tables under ShortcutExercise in the Lakehouse explorer.

  9. Select dimension_customer in the Table list to preview the data. The lakehouse now shows the data from the dimension_customer table in the Warehouse!

    Screenshot from the Fabric portal showing the data preview of the dimension_customer table.

  10. Next, create a new notebook to query the dimension_customer table. In the Home ribbon, select the dropdown for Open notebook and choose New notebook.

    Screenshot from the Fabric portal showing the Open notebook button pressed, and the New notebook option selected.

  11. Select dimension_customer in the Tables list, then drag it into the open notebook cell. A PySpark query is written for you that queries all the data from ShortcutExercise.dimension_customer. This notebook experience is similar to the Visual Studio Code Jupyter notebook experience. You can also open the notebook in VS Code.

    Screenshot from the Fabric portal notebook view. An arrow indicates the path to select dimension_customer, then drag and drop it into the open notebook cell.

  12. In the Home ribbon, select the Run all button. Once the query completes, the results show that you can use PySpark to query the Warehouse tables!

    Screenshot from the Fabric portal showing the results of running the notebook to display data from dimension_customer.
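
For reference, the query that the drag-and-drop in step 11 generates can be approximated as follows. This is a minimal sketch, not the exact cell that Fabric writes: the helper build_preview_query and the 1000-row limit are assumptions introduced for illustration. The code also assumes the spark session that a Fabric notebook provides; outside a notebook it only builds and prints the query string.

```python
def build_preview_query(lakehouse: str, table: str, limit: int = 1000) -> str:
    """Build a Spark SQL statement that previews rows from a lakehouse table.

    Hypothetical helper for illustration; it is not part of any Fabric API.
    """
    if limit <= 0:
        raise ValueError("limit must be a positive integer")
    return f"SELECT * FROM {lakehouse}.{table} LIMIT {limit}"


# Names taken from the tutorial: the lakehouse and the shortcut table.
query = build_preview_query("ShortcutExercise", "dimension_customer")
print(query)

# Assumption: inside a Fabric notebook, a SparkSession named `spark` already exists.
# Outside a notebook this branch is skipped, so the script still runs.
if "spark" in globals():
    df = spark.sql(query)  # run the query against the shortcut table
    df.show(10)            # print the first 10 rows
```

Because dimension_customer is a OneLake shortcut, the query reads the Warehouse table through the lakehouse rather than from a separate copy of the data.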

Next step