Tutorial: Use a notebook with Apache Spark to query a KQL database

Notebooks are both readable documents containing data analysis descriptions and results as well as executable documents that can be run to perform data analysis. In this article, you learn how to use a Microsoft Fabric notebook to read and write data to a KQL database using Apache Spark. This tutorial uses precreated datasets and notebooks in both the Real-time Analytics and the Data Engineering environments in Microsoft Fabric. For more information on notebooks, see How to use Microsoft Fabric notebooks.

Specifically, you learn how to:

  • Create a KQL database
  • Import a notebook
  • Write data to a KQL database using Apache Spark
  • Query data from a KQL database

Prerequisites

1- Create a KQL database

  1. Open the experience switcher on the bottom of the navigation pane and select Real-Time Analytics.

  2. Select the KQL Database tile.

    Screenshot of new KQL database tile in Real-Time Analytics.

  3. In the KQL Database name field, enter nycGreenTaxi, then select Create.

    The KQL database has now been created within the context of the selected workspace.

  4. Copy the Query URI from the database details card in the database dashboard and paste it somewhere, like a notepad, to use in a later step.

     Screenshot of the database details card that shows the database details. The Query URI option titled Copy URI is highlighted.

2- Download the NYC GreenTaxi notebook

We've created a sample notebook that takes you through all the necessary steps for loading data into your database using the Spark connector.

  1. Open the Fabric samples repository on GitHub to download the NYC GreenTaxi KQL notebook..

    Screenshot of GitHub repository showing the NYC GreenTaxi notebook. The Raw option is highlighted.

  2. Save the notebook locally to your device.

    Note

    The notebook must be saved in the .ipynb file format.

3- Import the notebook

The rest of this workflow occurs in the Data Engineering section of the product, and uses a Spark notebook to load and query data in your KQL database.

  1. Open the experience switcher on the bottom of the navigation pane and select Data Engineering.

  2. Select Import notebook.

    Screenshot of item options in Data Engineering. The item titled Import notebook is highlighted.

  3. In the Import status window, select Upload.

    Screenshot of Import status window. The button titled Upload is highlighted.

  4. Select the NYC GreenTaxi notebook you downloaded in a previous step.

  5. Once the import is complete, return to your workspace to open this notebook.

4- Get data

To query your database using the Spark connector, you need to give read and write access to the NYC GreenTaxi blob container.

Select the play button to run the following cells, or select the cell and press Shift+ Enter. Repeat this step for each code cell.

Note

Wait for the completion check mark to appear before running the next cell.

  1. Run the following cell to enable access to the NYC GreenTaxi blob container.

    Screenshot of first code cell showing storage access information.

  2. In KustoURI, paste the Query URI that you copied earlier instead of the placeholder text.

  3. Change the placeholder database name to nycGreenTaxi.

  4. Change the placeholder table name to GreenTaxiData.

    Screenshot of second code cell showing the target database information. The Query URI, the database name, and the table name are highlighted.

  5. Run the cell.

  6. Run the next cell to write data to your database. It may take a few minutes for this step to complete.

    Screenshot of third code cell showing table mapping and ingestion command.

Your database now has data loaded in a table named GreenTaxiData.

5- Run the notebook

Run the remaining two cells sequentially to query data from your table. The results show the top 20 highest and lowest taxi fares and distances recorded by year.

Screenshot of fourth and fifth code cell showing the query results.

6- Clean up resources

Clean up the items created by navigating to the workspace in which they were created.

  1. In your workspace, hover over the notebook you want to delete, select the More menu [...] > Delete.

    Screenshot of workspace showing the drop-down menu of the NYC GreenTaxi notebook. The option titled Delete is highlighted.

  2. Select Delete. You can't recover your notebook once you delete it.