Tutorial: Import Jupyter notebooks from GitHub into Azure Cosmos DB for NoSQL (preview)
APPLIES TO: NoSQL
Important
The Jupyter Notebooks feature of Azure Cosmos DB is currently in preview and is gradually rolling out to all customers.
This tutorial walks through how to import Jupyter notebooks from a GitHub repository and run them in an Azure Cosmos DB for NoSQL account. After importing the notebooks, you can run and edit them, and then persist your changes back to the same GitHub repository.
Prerequisites
- An existing Azure Cosmos DB for NoSQL account.
  - If you have an existing Azure subscription, create a new account.
  - No Azure subscription? You can try Azure Cosmos DB for free, with no credit card required.
Create a copy of a GitHub repository
Navigate to the azure-samples/cosmos-db-nosql-notebooks template repository.
Create a new copy of the template repository in your own GitHub account or organization.
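If you prefer to script this step, GitHub's REST API has a "Create a repository using a template" endpoint (POST /repos/{template_owner}/{template_repo}/generate). The following is a minimal sketch that only builds the request with the Python standard library; it assumes you have a GitHub personal access token allowed to create repositories in the target account. The owner name, repository name, and token shown are hypothetical placeholders.

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"


def build_generate_request(
    template_owner: str,
    template_repo: str,
    new_owner: str,
    new_name: str,
    token: str,
    private: bool = False,
) -> urllib.request.Request:
    """Build (but do not send) a request that creates a repository from a template."""
    url = f"{GITHUB_API}/repos/{template_owner}/{template_repo}/generate"
    # The JSON body names the account that will own the copy and the new repository name.
    body = json.dumps(
        {"owner": new_owner, "name": new_name, "private": private}
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "X-GitHub-Api-Version": "2022-11-28",
            "Content-Type": "application/json",
        },
    )


# Hypothetical placeholders: replace with your own account, repository name, and token.
req = build_generate_request(
    "azure-samples",
    "cosmos-db-nosql-notebooks",
    "my-github-user",
    "my-cosmos-notebooks",
    "<TOKEN>",
)
print(req.get_method(), req.full_url)

# Sending the request creates the repository; left commented out on purpose:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["html_url"])
```

Building the request separately from sending it lets you inspect the URL and body before anything is created in your account.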
Pull notebooks from GitHub
Instead of creating new notebooks each time you start a workspace, you can import existing notebooks from GitHub. In this section, you'll connect to an existing GitHub repository with sample notebooks.
Navigate to your Azure Cosmos DB account and open the Data Explorer.
Select Connect to GitHub.
In the Connect to GitHub dialog, select the access option appropriate to your GitHub repository and then select Authorize access.
Complete the GitHub third-party authorization workflow, granting access to any organizations required to access your GitHub repository. For more information, see Authorizing GitHub Apps.
In the Manage GitHub settings dialog, select the GitHub repository you created earlier.
Back in the Data Explorer, locate the new tree of nodes for your pinned repository and open the website-metrics-python.ipynb file.
In the editor for the notebook, locate the following cell.
import pandas as pd
pd.options.display.html.table_schema = True
pd.options.display.max_rows = None
df_cosmos.groupby("Item").size()
The cell currently outputs the number of rows for each unique item. Replace the final line of the cell with the following line to output the number of rows for each unique action in the dataset instead.
df_cosmos.groupby("Action").size()
Run all the cells in order to see the new results. The output should include only three possible values for the Action column. Optionally, you can select a data visualization for the results.
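To see what the edited cell computes without a Cosmos DB connection, the following sketch builds a small stand-in for df_cosmos. In the notebook, df_cosmos is populated by an earlier cell; the rows and the Item and Action values here are hypothetical and only illustrate the shape of the result.

```python
import pandas as pd

# Hypothetical stand-in for df_cosmos; the real DataFrame comes from an
# earlier notebook cell that queries the container.
df_cosmos = pd.DataFrame(
    {
        "Item": ["Socks", "Shoes", "Socks", "Hat", "Shoes", "Socks"],
        "Action": ["Viewed", "Viewed", "Added", "Viewed", "Purchased", "Purchased"],
    }
)

# groupby(...).size() returns a Series with one entry per distinct value
# (sorted by key), holding the number of rows that have that value.
rows_per_item = df_cosmos.groupby("Item").size()
rows_per_action = df_cosmos.groupby("Action").size()

print(rows_per_item)
print(rows_per_action)

# The number of distinct actions is the length of that Series,
# which matches Series.nunique() on the column.
print(len(rows_per_action), df_cosmos["Action"].nunique())
```

Because size() counts rows per group, the three-value result described above appears as a three-row Series, one row per action.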
Push notebook changes to GitHub
Tip
Currently, temporary workspaces are deallocated after 20 minutes of inactivity, and usage is limited to 60 minutes per day. These limits are subject to change.
To save your work permanently, save your notebooks back to the GitHub repository. In this section, you'll persist your changes from the temporary workspace to GitHub as a new commit.
Select Save to create a commit for your change to the notebook.
In the Save dialog, add a descriptive commit message.
In your browser, navigate to the GitHub repository you created. The new commit should now be visible in the online repository.
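You can also confirm the commit programmatically with GitHub's "List commits" REST endpoint, filtered to the notebook's path. This is a minimal sketch under a few assumptions: the notebook sits at the repository root as website-metrics-python.ipynb, the commit landed on the default branch, and the owner and repository names are hypothetical placeholders. A token is needed only for private repositories.

```python
import urllib.parse
import urllib.request
from typing import Optional

GITHUB_API = "https://api.github.com"


def build_latest_commit_request(
    owner: str, repo: str, path: str, token: Optional[str] = None
) -> urllib.request.Request:
    """Build a GET request for the newest commit that touched `path`."""
    query = urllib.parse.urlencode({"path": path, "per_page": 1})
    url = f"{GITHUB_API}/repos/{owner}/{repo}/commits?{query}"
    headers = {
        "Accept": "application/vnd.github+json",
        "X-GitHub-Api-Version": "2022-11-28",
    }
    if token:
        # Only required for private repositories.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(url, headers=headers)


def summarize_latest_commit(commits: list) -> Optional[str]:
    """Return 'short-sha message' for the first commit in a List commits response."""
    if not commits:
        return None
    newest = commits[0]
    return f"{newest['sha'][:7]} {newest['commit']['message']}"
```

Usage, against a real repository (hypothetical names):

    import json
    req = build_latest_commit_request("my-github-user", "my-cosmos-notebooks", "website-metrics-python.ipynb")
    with urllib.request.urlopen(req) as resp:
        print(summarize_latest_commit(json.load(resp)))

If the printed message matches the one you entered in the Save dialog, the change was persisted.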