Tutorial: Create a Jupyter Notebook in Azure Cosmos DB for NoSQL to analyze and visualize data (preview)
APPLIES TO:
NoSQL
Warning
The Jupyter Notebooks feature of Azure Cosmos DB will be retired on March 30, 2024. After that date, you will not be able to use built-in Jupyter notebooks from the Azure Cosmos DB account. We recommend using Visual Studio Code's support for Jupyter notebooks or your preferred notebooks client.
This tutorial walks through how to use the Jupyter Notebooks feature of Azure Cosmos DB to import sample retail data to an Azure Cosmos DB for NoSQL account. You'll see how to use the Azure Cosmos DB magic commands to run queries, analyze the data, and visualize the results.
Prerequisites
- An existing Azure Cosmos DB for NoSQL account.
- If you have an existing Azure subscription, create a new account.
- No Azure subscription? You can try Azure Cosmos DB free with no credit card required.
Create a new notebook
In this section, you'll create the Azure Cosmos DB database and container, and then import the retail data into the container.
Navigate to your Azure Cosmos DB account and open the Data Explorer.
Select New Notebook.
In the confirmation dialog that appears, select Create.
Note
A temporary workspace will be created to enable you to work with Jupyter Notebooks. When the session expires, any notebooks in the workspace will be removed.
Select the kernel you wish to use for the notebook.
Tip
Now that the new notebook has been created, you can rename it to something like VisualizeRetailData.ipynb.
Create a database and container using the SDK
Start in the default code cell.
Import any packages you require for this tutorial.
import azure.cosmos
from azure.cosmos.partition_key import PartitionKey
Create a database named RetailIngest using the built-in SDK.
database = cosmos_client.create_database_if_not_exists('RetailIngest')
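The code above uses a cosmos_client object that the tutorial never defines; it appears to be supplied by the built-in notebook environment. If you follow the warning's advice and use Visual Studio Code or another notebooks client, you need to create a client yourself. The following is a minimal sketch, assuming the azure-cosmos package is installed and that your account endpoint and key are stored in environment variables named COSMOS_ENDPOINT and COSMOS_KEY (hypothetical names chosen for this example).

```python
import os


def read_cosmos_settings(env=None):
    """Return (endpoint, key) from environment variables.

    COSMOS_ENDPOINT and COSMOS_KEY are hypothetical variable names for this sketch.
    """
    env = os.environ if env is None else env
    return env["COSMOS_ENDPOINT"], env["COSMOS_KEY"]


def make_cosmos_client(env=None):
    """Create a client to stand in for the built-in cosmos_client (assumption)."""
    # Imported inside the function so read_cosmos_settings works without azure-cosmos.
    from azure.cosmos import CosmosClient

    endpoint, key = read_cosmos_settings(env)
    return CosmosClient(endpoint, credential=key)
```

With this in place, cosmos_client = make_cosmos_client() followed by the create_database_if_not_exists call should behave like the built-in setup, although key-based authentication is only one of the options the SDK supports.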
Create a container named WebsiteMetrics with a partition key of /CartID.
container = database.create_container_if_not_exists(id='WebsiteMetrics', partition_key=PartitionKey(path='/CartID'))
Select Run to create the database and container resources.
Import data using magic commands
Add a new code cell.
Within the code cell, add the following magic command to upload the JSON data from this URL to your existing container: https://cosmosnotebooksdata.blob.core.windows.net/notebookdata/websiteData.json
%%upload --databaseName RetailIngest --containerName WebsiteMetrics --url https://cosmosnotebooksdata.blob.core.windows.net/notebookdata/websiteData.json
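The %%upload magic command is part of the built-in notebooks, so it is not available in other clients. A rough substitute is to download the file and upsert each document with the SDK's upsert_item method. This sketch assumes the file is a single JSON array of documents, generates an id for any document that lacks one (Azure Cosmos DB items require an id), and sends one request per item, so it will likely be slower than the magic command's import.

```python
import json
import urllib.request
import uuid

# URL copied from the tutorial's %%upload command.
DATA_URL = "https://cosmosnotebooksdata.blob.core.windows.net/notebookdata/websiteData.json"


def parse_documents(text):
    """Parse the downloaded text, assuming a JSON array (a single object is wrapped)."""
    data = json.loads(text)
    return data if isinstance(data, list) else [data]


def download_documents(url=DATA_URL):
    """Fetch the sample file over HTTPS and parse it."""
    with urllib.request.urlopen(url) as response:
        return parse_documents(response.read().decode("utf-8"))


def upload_documents(container, documents):
    """Upsert each document into the container; return (success, failure) counts."""
    success = failure = 0
    for doc in documents:
        # Items need an "id"; add a random one if the source document lacks it.
        if "id" not in doc:
            doc = {**doc, "id": str(uuid.uuid4())}
        try:
            container.upsert_item(doc)
            success += 1
        except Exception:  # any error counts as a failed import; keep going
            failure += 1
    return success, failure
```

Usage, assuming container is the object returned by create_container_if_not_exists: upload_documents(container, download_documents()).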
Select Run Active Cell to only run the command in this specific cell.
Note
The import command should take 5-10 seconds to complete.
Observe the output from the run command. Ensure that 2,654 documents were imported.
Documents successfully uploaded to WebsiteMetrics
Total number of documents imported:
  Success: 2654
  Failure: 0
Total time taken : 00:00:04 hours
Total RUs consumed : 27309.660000001593
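The summary is easier to compare across runs when reduced to per-document figures. For the output above, 27309.66 RUs / 2654 documents works out to about 10.29 RUs per document, and 2654 documents in 4 seconds is roughly 664 documents per second (approximate, because the reported time is rounded to whole seconds). The helper below is a hypothetical sketch that extracts these numbers, assuming the summary text keeps the format shown above.

```python
import re


def summarize_upload(output):
    """Derive per-document RU cost and throughput from the %%upload summary text."""
    success = int(re.search(r"Success:\s*(\d+)", output).group(1))
    failure = int(re.search(r"Failure:\s*(\d+)", output).group(1))
    total_rus = float(re.search(r"Total RUs consumed\s*:\s*([\d.]+)", output).group(1))
    h, m, s = map(int, re.search(r"Total time taken\s*:\s*(\d+):(\d+):(\d+)", output).groups())
    seconds = h * 3600 + m * 60 + s
    return {
        "success": success,
        "failure": failure,
        # Guard against division by zero when nothing was imported or time rounds to 0.
        "ru_per_document": total_rus / success if success else 0.0,
        "documents_per_second": success / seconds if seconds else None,
    }
```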
Visualize your data
Create another new code cell.
In the code cell, use a SQL query to populate a Pandas DataFrame.
%%sql --database RetailIngest --container WebsiteMetrics --output df_cosmos
SELECT c.Action, c.Price as ItemRevenue, c.Country, c.Item FROM c
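Like %%upload, the %%sql magic is specific to the built-in notebooks. Outside them, a similar result can be sketched with the SDK's query_items method. Because the query does not filter on /CartID, it has to fan out across partitions, so enable_cross_partition_query is set. This sketch assumes container is the WebsiteMetrics container object from the earlier step; the function name is hypothetical.

```python
# Same query as the %%sql cell.
QUERY = "SELECT c.Action, c.Price as ItemRevenue, c.Country, c.Item FROM c"


def fetch_rows(container, query=QUERY):
    """Run a cross-partition query and collect the results as a list of dicts."""
    # No partition key filter in the query, so it must run across all partitions.
    return list(container.query_items(query=query, enable_cross_partition_query=True))
```

If pandas is installed, df_cosmos = pd.DataFrame(fetch_rows(container)) should give a DataFrame with the same four columns as the magic command's output.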
Select Run Active Cell to only run the command in this specific cell.
Create another new code cell.
In the code cell, display the first 10 rows of the DataFrame.
df_cosmos.head(10)
Select Run Active Cell to only run the command in this specific cell.
Observe the output of running the command.
   Action     ItemRevenue  Country                      Item
0  Purchased  19.99        Macedonia                    Button-Up Shirt
1  Viewed     12.00        Papua New Guinea             Necklace
2  Viewed     25.00        Slovakia (Slovak Republic)   Cardigan Sweater
3  Purchased  14.00        Senegal                      Flip Flop Shoes
4  Viewed     50.00        Panama                       Denim Shorts
5  Viewed     14.00        Senegal                      Flip Flop Shoes
6  Added      14.00        Senegal                      Flip Flop Shoes
7  Added      50.00        Panama                       Denim Shorts
8  Purchased  33.00        Palestinian Territory        Red Top
9  Viewed     30.00        Malta                        Green Sweater
Create another new code cell.
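In the code cell, import the pandas package, customize how the DataFrame output is displayed, and count the number of rows for each item.
import pandas as pd
# Publish a Table Schema representation so the notebook UI can offer chart views
pd.options.display.html.table_schema = True
# Show every row instead of truncating the output
pd.options.display.max_rows = None
# Count the rows for each distinct Item value
df_cosmos.groupby("Item").size()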
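groupby("Item").size() counts how many rows share each Item value. The same counting can be sketched without pandas using collections.Counter; here it runs over the ten rows shown in the head(10) output, so the counts cover only that sample rather than all 2,654 documents. The names COLUMNS, SAMPLE_ROWS, and item_counts are hypothetical.

```python
from collections import Counter

COLUMNS = ("Action", "ItemRevenue", "Country", "Item")

# The ten rows from the df_cosmos.head(10) output in this tutorial.
SAMPLE_ROWS = [
    dict(zip(COLUMNS, values))
    for values in [
        ("Purchased", 19.99, "Macedonia", "Button-Up Shirt"),
        ("Viewed", 12.00, "Papua New Guinea", "Necklace"),
        ("Viewed", 25.00, "Slovakia (Slovak Republic)", "Cardigan Sweater"),
        ("Purchased", 14.00, "Senegal", "Flip Flop Shoes"),
        ("Viewed", 50.00, "Panama", "Denim Shorts"),
        ("Viewed", 14.00, "Senegal", "Flip Flop Shoes"),
        ("Added", 14.00, "Senegal", "Flip Flop Shoes"),
        ("Added", 50.00, "Panama", "Denim Shorts"),
        ("Purchased", 33.00, "Palestinian Territory", "Red Top"),
        ("Viewed", 30.00, "Malta", "Green Sweater"),
    ]
]


def item_counts(rows):
    """Count rows per Item, mirroring df.groupby("Item").size()."""
    return Counter(row["Item"] for row in rows)
```

pandas sorts the group keys by default, so sorted(item_counts(SAMPLE_ROWS).items()) lists the items in the same order as the groupby output.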
Select Run Active Cell to only run the command in this specific cell.
In the output, select the Line Chart option to view a different visualization of the data.
Persist your notebook
In the Notebooks section, open the context menu for the notebook you created for this tutorial and select Download.
Tip
To save your work permanently, save your notebooks to a GitHub repository or download the notebooks to your local machine before the session ends.
Next steps