Tutorial: Use a notebook with Apache Spark to query a KQL database

17/01/2025

Notebooks are both readable documents that contain data analysis descriptions and results, and executable documents that can be run to perform data analysis. In this article, you learn how to use a Microsoft Fabric notebook to read data from and write data to a KQL database using Apache Spark. This tutorial uses precreated datasets and notebooks in both the Real-Time Intelligence and the Data Engineering environments in Microsoft Fabric. For more information on notebooks, see How to use Microsoft Fabric notebooks.

Specifically, you learn how to:

Create a KQL database
Import a notebook
Write data to a KQL database using Apache Spark
Query data from a KQL database

Prerequisites

A workspace with a Microsoft Fabric-enabled capacity

1- Create a KQL database

Select your workspace from the left navigation bar.
Follow one of these steps to start creating a KQL database:
- Select New item and then Eventhouse. In the Eventhouse name field, enter nycGreenTaxi, then select Create. A KQL database is generated with the same name.
- In an existing eventhouse, select Databases. Under KQL databases select +, in the KQL Database name field, enter nycGreenTaxi, then select Create.
Copy the Query URI from the database details card in the database dashboard and paste it somewhere, like a notepad, to use in a later step.
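
Before reusing the copied Query URI in the notebook, it can help to confirm that it was pasted cleanly. The following is a minimal sketch, assuming the Query URI is an HTTPS endpoint; the function name `check_query_uri` and the example host are hypothetical and are not part of the tutorial's notebook.

```python
from urllib.parse import urlparse


def check_query_uri(uri: str) -> str:
    """Return the trimmed URI if it looks like an HTTPS endpoint, else raise ValueError."""
    cleaned = uri.strip()
    parsed = urlparse(cleaned)
    # Assumption: a Query URI uses the https scheme and includes a host name.
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError(f"Not an HTTPS Query URI: {uri!r}")
    # Drop any trailing slash so the value is stable when reused later.
    return cleaned.rstrip("/")


# Hypothetical placeholder value; replace it with the Query URI you copied.
example_uri = " https://example-cluster.kusto.fabric.microsoft.com/ "
print(check_query_uri(example_uri))
```

A check like this catches stray whitespace or a database name pasted by mistake before the value reaches the Spark connector.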

2- Download the NYC GreenTaxi notebook

We've created a sample notebook that takes you through all the necessary steps for loading data into your database using the Spark connector.

Open the Fabric samples repository on GitHub to download the NYC GreenTaxi KQL notebook.
Save the notebook locally to your device.

Note

The notebook must be saved in the .ipynb file format.
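
If you're unsure whether the download was saved correctly, a quick local check is possible. The sketch below assumes the notebook follows the common Jupyter layout (a JSON document with top-level `cells` and `nbformat` keys); the function name and the file name are hypothetical.

```python
import json
from pathlib import Path


def looks_like_notebook(path: str) -> bool:
    """Return True if the file has an .ipynb extension and a notebook-like JSON body."""
    p = Path(path)
    if p.suffix.lower() != ".ipynb":
        return False
    try:
        doc = json.loads(p.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        # Missing file, unreadable file, or content that is not valid JSON.
        return False
    # Assumption: the notebook uses the layout with "cells" and "nbformat" keys.
    return isinstance(doc, dict) and "cells" in doc and "nbformat" in doc


# Hypothetical file name; use the path where you saved the notebook.
print(looks_like_notebook("NYC_GreenTaxi_KQL_notebook.ipynb"))
```

A result of False usually means the browser saved an HTML page or a text file instead of the raw notebook.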

3- Import the notebook

The rest of this workflow occurs in the Data Engineering section of the product, and uses a Spark notebook to load and query data in your KQL database.

From your workspace, select Import > Notebook > From this computer > Upload, then choose the NYC GreenTaxi notebook you downloaded in a previous step.
Once the import is complete, open the notebook from your workspace.

4- Get data

To query your database using the Spark connector, you need to give read and write access to the NYC GreenTaxi blob container.

Select the play button to run the following cells, or select the cell and press Shift+Enter. Repeat this step for each code cell.

Note

Wait for the completion check mark to appear before running the next cell.

Run the following cell to enable access to the NYC GreenTaxi blob container.
In KustoURI, paste the Query URI that you copied earlier instead of the placeholder text.
Change the placeholder database name to nycGreenTaxi.
Change the placeholder table name to GreenTaxiData.
Run the cell.
Run the next cell to write data to your database. It can take a few minutes for this step to complete.

Your database now has data loaded in a table named GreenTaxiData.
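
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the notebook's exact code: the storage account, container, and path values are assumed to be those of the public NYC Taxi open dataset, the option names are the Kusto Spark connector's sink options as I understand them, and the helper function names are hypothetical. The sample notebook remains the authoritative version.

```python
# Assumption: data source name of the Kusto Spark connector in Fabric notebooks.
KUSTO_FORMAT = "com.microsoft.kusto.spark.synapse.datasource"


def blob_settings(account: str, container: str, relative_path: str) -> dict:
    """Build the wasbs path and the Spark conf key used to grant SAS access."""
    host = f"{account}.blob.core.windows.net"
    return {
        "path": f"wasbs://{container}@{host}/{relative_path}",
        "sas_conf_key": f"fs.azure.sas.{container}.{host}",
    }


def kusto_write_options(kusto_uri: str, database: str, table: str, access_token: str) -> dict:
    """Collect the connector options for writing a DataFrame to a KQL database."""
    return {
        "kustoCluster": kusto_uri,       # the Query URI copied in step 1
        "kustoDatabase": database,       # nycGreenTaxi in this tutorial
        "kustoTable": table,             # GreenTaxiData in this tutorial
        "accessToken": access_token,     # assumed to come from the notebook's credential utilities
        "tableCreateOptions": "CreateIfNotExist",
    }


def write_to_kusto(df, options: dict, fmt: str = KUSTO_FORMAT) -> None:
    """Append a Spark DataFrame to the target table; only works inside a Spark session."""
    df.write.format(fmt).options(**options).mode("Append").save()


# Assumed public dataset location; not stated in this article.
settings = blob_settings("azureopendatastorage", "nyctlc", "green")
print(settings["path"])

# Placeholder URI and token, shown only to illustrate the shape of the options.
opts = kusto_write_options("https://<your-query-uri>", "nycGreenTaxi", "GreenTaxiData", "<token>")
print(sorted(opts))
```

In a notebook, the path would be passed to `spark.read.parquet(...)` after setting the SAS conf key with `spark.conf.set(...)`, and the resulting DataFrame handed to `write_to_kusto`.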

5- Run the notebook

Run the remaining two cells sequentially to query data from your table. The results show the top 20 highest and lowest taxi fares and distances recorded by year.
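
As a rough illustration of what such a query cell might look like, the sketch below builds a KQL `top` query and the connector options for reading it back into Spark. It is a simplification: it doesn't group by year as the notebook's results do, the column names `fareAmount` and `tripDistance` are assumed from the public dataset's schema, and the helper names are hypothetical.

```python
# Assumption: data source name of the Kusto Spark connector in Fabric notebooks.
KUSTO_FORMAT = "com.microsoft.kusto.spark.synapse.datasource"


def top_n_query(table: str, column: str, n: int = 20, descending: bool = True) -> str:
    """Build a KQL query that returns the n highest (or lowest) rows by one column."""
    if n <= 0:
        raise ValueError("n must be positive")
    direction = "desc" if descending else "asc"
    return f"{table} | top {n} by {column} {direction}"


def kusto_read_options(kusto_uri: str, database: str, query: str, access_token: str) -> dict:
    """Collect the connector options for reading a query result into a DataFrame."""
    return {
        "kustoCluster": kusto_uri,
        "kustoDatabase": database,
        "kustoQuery": query,
        "accessToken": access_token,
    }


def read_from_kusto(spark, options: dict, fmt: str = KUSTO_FORMAT):
    """Run the query through the connector; only works inside a Spark session."""
    return spark.read.format(fmt).options(**options).load()


# Assumed column names; check them against the GreenTaxiData table schema.
highest_fares = top_n_query("GreenTaxiData", "fareAmount")
shortest_trips = top_n_query("GreenTaxiData", "tripDistance", descending=False)
print(highest_fares)
print(shortest_trips)
```

Building the query as a plain string keeps the KQL visible and easy to test in the KQL queryset before running it from Spark.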

6- Clean up resources

Clean up the items created by navigating to the workspace in which they were created.

In your workspace, hover over the notebook you want to delete, select the More menu [...] > Delete.
Select Delete. You can't recover your notebook once you delete it.

