Tutorial: Import Jupyter notebooks from GitHub into Azure Cosmos DB for NoSQL (preview)

APPLIES TO: NoSQL

Warning

The Jupyter Notebooks feature of Azure Cosmos DB will be retired on March 30, 2024. After that date, you won't be able to use built-in Jupyter notebooks from your Azure Cosmos DB account. We recommend using Visual Studio Code's support for Jupyter notebooks or your preferred notebooks client.

This tutorial walks through how to import Jupyter notebooks from a GitHub repository and run them in an Azure Cosmos DB for NoSQL account. After importing the notebooks, you can run and edit them, and then persist your changes back to the same GitHub repository.

Prerequisites

Create a copy of a GitHub repository

  1. Navigate to the azure-samples/cosmos-db-nosql-notebooks template repository.

  2. Create a new copy of the template repository in your own GitHub account or organization.

Pull notebooks from GitHub

Instead of creating new notebooks each time you start a workspace, you can import existing notebooks from GitHub. In this section, you'll connect to an existing GitHub repository with sample notebooks.

  1. Navigate to your Azure Cosmos DB account and open the Data Explorer.

  2. Select Connect to GitHub.

    Screenshot of the Data Explorer with the 'Connect to GitHub' option highlighted.

  3. In the Connect to GitHub dialog, select the access option appropriate to your GitHub repository and then select Authorize access.

    Screenshot of the 'Connect to GitHub' dialog with options for various levels of access.

  4. Complete the GitHub third-party authorization workflow, granting access to the organization or organizations required to access your GitHub repository. For more information, see Authorizing GitHub Apps.

  5. In the Manage GitHub settings dialog, select the GitHub repository you created earlier.

    Screenshot of the 'Manage GitHub settings' dialog with a list of unpinned and pinned repositories.

  6. Back in the Data Explorer, locate the new tree of nodes for your pinned repository and open the website-metrics-python.ipynb file.

    Screenshot of the pinned repositories in the Data Explorer.

  7. In the editor for the notebook, locate the following cell.

    import pandas as pd
    pd.options.display.html.table_schema = True
    pd.options.display.max_rows = None
    
    df_cosmos.groupby("Item").size()
    
  8. The cell currently outputs the number of rows for each unique item. Replace the final line of the cell with a new line that outputs the number of rows for each unique action in the dataset.

    df_cosmos.groupby("Action").size()
    
  9. Run all the cells sequentially to see the new results. The output should include only three distinct values for the Action column. Optionally, you can select a data visualization for the results.

    Screenshot of the Pandas dataframe visualization for the data.
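To make the difference between the two groupings concrete, here is a minimal sketch that can run outside the notebook. It assumes a small, hypothetical stand-in for df_cosmos: the item names, action names, and row counts below are invented for illustration and are not the tutorial's dataset.

```python
import pandas as pd

# Hypothetical stand-in for df_cosmos. In the real notebook this DataFrame
# is loaded from the Azure Cosmos DB container by earlier cells.
df_cosmos = pd.DataFrame(
    {
        "Item": ["Socks", "Socks", "Hat", "Hat", "Hat", "Scarf"],
        "Action": ["Viewed", "Purchased", "Viewed", "Added", "Purchased", "Viewed"],
    }
)

# size() counts the rows in each group, so this is a row count per item.
rows_per_item = df_cosmos.groupby("Item").size()

# Grouping on Action instead gives one row count per distinct action.
rows_per_action = df_cosmos.groupby("Action").size()

# nunique() answers a different question: how many distinct actions exist.
distinct_actions = df_cosmos["Action"].nunique()

print(rows_per_action)
print(distinct_actions)
```

Note that groupby(...).size() returns one count per group rather than a single number. If only the number of distinct values is needed, nunique() on the column is the more direct call.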

Push notebook changes to GitHub

Tip

Currently, temporary workspaces are deallocated if left idle for 20 minutes, and the maximum usage time per day is 60 minutes. These limits are subject to change in the future.

To save your work permanently, save your notebooks back to the GitHub repository. In this section, you'll persist your changes from the temporary workspace to GitHub as a new commit.

  1. Select Save to create a commit for your change to the notebook.

    Screenshot of the 'Save' option in the Data Explorer menu.

  2. In the Save dialog, add a descriptive commit message.

    Screenshot of the 'Save' dialog with an example of a commit message.

  3. In your browser, navigate to the GitHub repository you created. The new commit should now be visible in the online repository.

    Screenshot of the updated notebook on the GitHub website.
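Besides checking the commit in the browser, you could confirm the saved change by inspecting the notebook file itself, for example after cloning the repository locally. The sketch below is one possible approach, not part of the Data Explorer workflow. It relies only on the fact that an .ipynb file is JSON with a top-level "cells" list. The helper name code_cells_containing and the in-memory sample notebook are hypothetical.

```python
import json


def code_cells_containing(notebook_text, needle):
    """Return the indexes of code cells whose source contains `needle`."""
    notebook = json.loads(notebook_text)
    matches = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # The notebook format stores source as a string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        if needle in source:
            matches.append(index)
    return matches


# Hypothetical minimal notebook standing in for website-metrics-python.ipynb.
sample_notebook = json.dumps(
    {
        "nbformat": 4,
        "nbformat_minor": 5,
        "metadata": {},
        "cells": [
            {"cell_type": "markdown", "metadata": {}, "source": ["# Website metrics"]},
            {
                "cell_type": "code",
                "metadata": {},
                "execution_count": None,
                "outputs": [],
                "source": [
                    "import pandas as pd\n",
                    'df_cosmos.groupby("Action").size()',
                ],
            },
        ],
    }
)

print(code_cells_containing(sample_notebook, 'groupby("Action")'))
```

With a real clone, you would read the notebook file's text from disk and pass it to the helper in place of sample_notebook.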

Next steps