Quickstart: Create a graph database in Azure Cosmos DB using Python and the Azure portal
APPLIES TO: Gremlin
In this quickstart, you create and manage an Azure Cosmos DB for Gremlin (graph) API account from the Azure portal, and add data by using a Python app cloned from GitHub. Azure Cosmos DB is a multi-model database service that lets you quickly create and query document, table, key-value, and graph databases with global distribution and horizontal scale capabilities.
An Azure account with an active subscription. Create one for free. Or try Azure Cosmos DB for free without an Azure subscription.
Python 3.7+ including pip package installer.
You can also install the Python driver for Gremlin by using pip on the command line:
pip install gremlinpython==3.4.13
This quickstart requires a graph database account created after December 20, 2017. Existing accounts will support Python once they’re migrated to general availability.
We currently recommend using gremlinpython==3.4.13 with Gremlin (Graph) API as we haven't fully tested all language-specific libraries of version 3.5.* for use with the service.
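You can confirm which driver version your environment actually picked up with a small check like the following. It is a sketch that assumes Python 3.8 or later, because importlib.metadata isn't available on 3.7. It only compares against the 3.4.13 pin recommended above. The function names are illustrative and aren't part of any library.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# The driver version this quickstart recommends pinning.
RECOMMENDED_GREMLINPYTHON = "3.4.13"


def installed_gremlinpython() -> Optional[str]:
    """Return the installed gremlinpython version, or None if it isn't installed."""
    try:
        return version("gremlinpython")
    except PackageNotFoundError:
        return None


def is_recommended(installed: Optional[str]) -> bool:
    """True only for the exact pinned version; 3.5.* and others return False."""
    return installed == RECOMMENDED_GREMLINPYTHON


if __name__ == "__main__":
    found = installed_gremlinpython()
    if found is None:
        print("gremlinpython isn't installed; run: pip install gremlinpython==3.4.13")
    elif is_recommended(found):
        print(f"gremlinpython {found} matches the recommended version")
    else:
        print(f"gremlinpython {found} found; {RECOMMENDED_GREMLINPYTHON} is recommended")
```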
Create a database account
Before you can create a graph database, you need to create a Gremlin (Graph) database account with Azure Cosmos DB.
In a new browser window, sign in to the Azure portal.
In the left menu, select Create a resource.
On the New page, select Databases > Azure Cosmos DB.
On the Create Azure Cosmos DB Account page, enter the settings for the new Azure Cosmos DB account.
- Subscription: Select the Azure subscription that you want to use for this Azure Cosmos DB account.
- Resource Group: Select a resource group, or select Create new, then enter a unique name for the new resource group.
- Account Name: Enter a unique name to identify your Azure Cosmos DB account. Your account URI will be gremlin.azure.com appended to your unique account name. The account name can use only lowercase letters, numbers, and hyphens (-), and must be between 3 and 44 characters long.
- API: Gremlin (graph). The API determines the type of account to create. Azure Cosmos DB provides five APIs: NoSQL for document databases, Gremlin for graph databases, MongoDB for document databases, Azure Table, and Cassandra. You must create a separate account for each API. Select Gremlin (graph), because in this quickstart you are creating a graph that works with the API for Gremlin. Learn more about the API for Gremlin.
- Location: The region closest to your users. Select a geographic location to host your Azure Cosmos DB account. Use the location that is closest to your users to give them the fastest access to the data.
- Capacity mode: Provisioned throughput or Serverless. Select Provisioned throughput to create an account in provisioned throughput mode. Select Serverless to create an account in serverless mode.
- Apply Azure Cosmos DB free tier discount: Apply or Do not apply. With Azure Cosmos DB free tier, you get the first 1000 RU/s and 25 GB of storage for free in an account. Learn more about free tier.
You can have up to one free tier Azure Cosmos DB account per Azure subscription, and you must opt in when creating the account. If you don't see the option to apply the free tier discount, another account in the subscription has already been enabled with free tier.
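You can check the Account Name rule before you submit the form. The following sketch validates a candidate name against only that documented rule; the portal may enforce further constraints. It then builds the WebSocket endpoint in the wss://<name>.gremlin.cosmosdb.azure.com:443/ form that the sample's connect.py uses. The helper names are hypothetical, and the URI shown on your account's Keys page remains the authoritative value.

```python
import re

# Documented rule: lowercase letters, numbers, and hyphens, 3 to 44 characters.
_ACCOUNT_NAME = re.compile(r"[a-z0-9-]{3,44}")


def is_valid_account_name(name: str) -> bool:
    """Check a proposed account name against the documented naming rule only."""
    return _ACCOUNT_NAME.fullmatch(name) is not None


def gremlin_endpoint(account_name: str) -> str:
    """Build the WebSocket endpoint in the form used by the sample's connect.py."""
    if not is_valid_account_name(account_name):
        raise ValueError(f"invalid Azure Cosmos DB account name: {account_name!r}")
    return f"wss://{account_name}.gremlin.cosmosdb.azure.com:443/"


print(gremlin_endpoint("test"))
```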
In the Global Distribution tab, configure the following details. You can leave the default values for the purpose of this quickstart:
- Geo-Redundancy: Disable. Enable or disable global distribution on your account by pairing your region with a pair region. You can add more regions to your account later.
- Multi-region Writes: Disable. Multi-region writes capability allows you to take advantage of the provisioned throughput for your databases and containers across the globe.
The following options are not available if you select Serverless as the Capacity mode:
- Apply Free Tier Discount
- Multi-region Writes
Optionally, you can configure additional details in the following tabs:
- Networking - Configure access from a virtual network.
- Backup Policy - Configure either periodic or continuous backup policy.
- Encryption - Use either service-managed key or a customer-managed key.
- Tags - Tags are name/value pairs that enable you to categorize resources and view consolidated billing by applying the same tag to multiple resources and resource groups.
Select Review + create.
The account creation takes a few minutes. Wait for the portal to display the Congratulations! Your Azure Cosmos DB account was created page.
Add a graph
You can now use the Data Explorer tool in the Azure portal to create a graph database.
Select Data Explorer > New Graph.
The Add Graph area is displayed on the far right; you may need to scroll right to see it.
In the Add graph page, enter the settings for the new graph.
- Database ID: sample-database. Enter sample-database as the name for the new database. Database names must be between 1 and 255 characters, and can't contain the characters /, \, #, or ?, or a trailing space.
- Throughput: 400 RU/s. Change the throughput to 400 request units per second (RU/s). If you want to reduce latency, you can scale up the throughput later. If you chose serverless capacity mode, then throughput isn't required.
- Graph ID: sample-graph. Enter sample-graph as the name for your new graph. Graph names have the same character requirements as database IDs.
- Partition Key: /pk. Every graph needs a partition key to scale horizontally. Learn how to select an appropriate partition key in the Graph Data Partitioning article.
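The Database ID and Graph ID rules above can be expressed as a short check. This sketch covers only the length limit, the forbidden characters, and the trailing-space rule listed for those settings. is_valid_resource_id is a hypothetical helper name.

```python
# Characters that database and graph IDs can't contain, per the settings above.
FORBIDDEN_ID_CHARS = frozenset("/\\#?")


def is_valid_resource_id(name: str) -> bool:
    """Check a database or graph ID against the documented rules."""
    if not 1 <= len(name) <= 255:
        return False
    if any(ch in FORBIDDEN_ID_CHARS for ch in name):
        return False
    # IDs can't end with a trailing space.
    return not name.endswith(" ")


for candidate in ("sample-database", "sample-graph", "bad/name", "trailing "):
    print(repr(candidate), is_valid_resource_id(candidate))
```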
Once the form is filled out, select OK.
Clone the sample application
Now let's switch to working with code. Let's clone a Gremlin API app from GitHub, set the connection string, and run it. You'll see how easy it is to work with data programmatically.
Run the following command to clone the sample repository to your local machine. Start in the root of the folder where you typically store GitHub repositories. This command creates a copy of the sample app on your computer.
git clone https://github.com/Azure-Samples/azure-cosmos-db-graph-python-getting-started.git
Change to the directory where the sample app is located:
cd azure-cosmos-db-graph-python-getting-started
Review the code
This step is optional. If you're interested in learning how the database resources are created in the code, you can review the following snippets. The snippets are all taken from the connect.py file in the repo git-samples\azure-cosmos-db-graph-python-getting-started. Otherwise, you can skip ahead to Update your connection information.
The client is initialized in connect.py with client.Client(). Make sure to replace <YOUR_DATABASE> and <YOUR_CONTAINER_OR_GRAPH> with the values of your account's database name and graph name:
from gremlin_python.driver import client
...
client = client.Client('wss://<YOUR_ENDPOINT>.gremlin.cosmosdb.azure.com:443/', 'g',
    username="/dbs/<YOUR_DATABASE>/colls/<YOUR_CONTAINER_OR_GRAPH>",
    password="<YOUR_PASSWORD>")
...
A series of Gremlin steps (queries) is declared at the beginning of the connect.py file. They're then executed by using the client.submitAsync() method. The cleanup graph step, for example, is submitted this way.
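The following sketch shows the general submit-and-wait pattern for a single step such as the graph cleanup. It assumes a gremlinpython 3.4.x Client, where submitAsync() returns a future that resolves to a ResultSet, and ResultSet.all() returns a future of the result list. The constant name, the g.V().drop() cleanup query, and the helper names are assumptions for illustration. They may not match the repository's code exactly.

```python
from typing import Any, List

# Hypothetical constant, mirroring how connect.py declares its Gremlin steps up front.
# Dropping every vertex also removes the edges attached to those vertices.
GREMLIN_CLEANUP_GRAPH = "g.V().drop()"


def run_gremlin(client: Any, query: str) -> List[Any]:
    """Submit one Gremlin query and block until every result has arrived.

    Assumes `client` behaves like gremlin_python.driver.client.Client (3.4.x):
    submitAsync() returns a future resolving to a ResultSet, and
    ResultSet.all() returns a future resolving to the list of results.
    """
    print(f"\tRunning this Gremlin query:\n\t{query}")
    result_set = client.submitAsync(query).result()
    return result_set.all().result()


def cleanup_graph(client: Any) -> None:
    """Run the cleanup step before inserting fresh sample data."""
    run_gremlin(client, GREMLIN_CLEANUP_GRAPH)
    print("\tCleaned up the graph!")
```

With a real connection, you would pass in the object that client.Client(...) returns in connect.py. Blocking on .result() keeps the steps strictly ordered, which matters when the cleanup must finish before new vertices are inserted.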
Update your connection information
Now go back to the Azure portal to get your connection information and copy it into the app. These settings enable your app to communicate with your hosted database.
In your Azure Cosmos DB account in the Azure portal, select Keys.
Copy the first portion of the URI value.
Open the connect.py file, find the client.Client() definition, and paste the URI value over <YOUR_ENDPOINT> in the following code:
client = client.Client('wss://<YOUR_ENDPOINT>.gremlin.cosmosdb.azure.com:443/','g', username="/dbs/<YOUR_DATABASE>/colls/<YOUR_COLLECTION_OR_GRAPH>", password="<YOUR_PASSWORD>")
The URI portion of the client object should now look similar to this code:
client = client.Client('wss://test.gremlin.cosmosdb.azure.com:443/','g', username="/dbs/<YOUR_DATABASE>/colls/<YOUR_COLLECTION_OR_GRAPH>", password="<YOUR_PASSWORD>")
Change the username parameter of the client object to replace the <YOUR_DATABASE> and <YOUR_COLLECTION_OR_GRAPH> strings. If you used the suggested values, the parameter should look like this code:
username="/dbs/sample-database/colls/sample-graph"
The entire client object should now look like this code:
client = client.Client('wss://test.gremlin.cosmosdb.azure.com:443/','g', username="/dbs/sample-database/colls/sample-graph", password="<YOUR_PASSWORD>")
On the Keys page, use the copy button to copy the PRIMARY KEY and paste it over <YOUR_PASSWORD> in the password="<YOUR_PASSWORD>" parameter.
The entire client object definition should now look similar to the following:
client = client.Client('wss://test.gremlin.cosmosdb.azure.com:443/','g', username="/dbs/sample-database/colls/sample-graph", password="asdb13Fadsf14FASc22Ggkr662ifxz2Mg==")
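Pasting the primary key directly into connect.py works for a quickstart, but it leaves the key in source code. A common alternative is to read the connection values from environment variables. The sketch below assumes hypothetical variable names (COSMOS_GREMLIN_ACCOUNT, COSMOS_GREMLIN_DATABASE, COSMOS_GREMLIN_GRAPH, COSMOS_GREMLIN_KEY) and the endpoint format used in connect.py. It only assembles the arguments and doesn't open a connection.

```python
import os
from typing import Dict, Mapping, Optional


def gremlin_client_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Collect the values client.Client() needs from (hypothetical) environment variables."""
    env = os.environ if env is None else env
    account = env["COSMOS_GREMLIN_ACCOUNT"]  # for example "test"
    database = env.get("COSMOS_GREMLIN_DATABASE", "sample-database")
    graph = env.get("COSMOS_GREMLIN_GRAPH", "sample-graph")
    return {
        "url": f"wss://{account}.gremlin.cosmosdb.azure.com:443/",
        "traversal_source": "g",
        "username": f"/dbs/{database}/colls/{graph}",
        "password": env["COSMOS_GREMLIN_KEY"],  # the PRIMARY KEY from the Keys page
    }


# Usage with gremlinpython (not executed here):
#   from gremlin_python.driver import client
#   s = gremlin_client_settings()
#   gremlin = client.Client(s["url"], s["traversal_source"],
#                           username=s["username"], password=s["password"])
```

The defaults match the suggested sample-database and sample-graph names. A missing account name or key raises KeyError, so a misconfigured environment fails before any connection attempt.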
Save the connect.py file.
Run the console app
Start in a terminal window in the root of the folder where you cloned the sample app. If you are using Visual Studio Code, you can open a terminal window by selecting Terminal > New Terminal. Typically, you'll create a virtual environment to run the code. For more information, see Python virtual environments.
Install the required Python packages.
pip install -r requirements.txt
Start the Python application:
python connect.py
The terminal window displays the vertices and edges being added to the graph.
If you experience timeout errors, check that you updated the connection information correctly in Update your connection information, and also try running the last command again.
Once the program stops, press Enter, then switch back to the Azure portal in your internet browser.
Review and add sample data
After the vertices and edges are inserted, you can now go back to Data Explorer and see the vertices added to the graph, and add more data points.
In your Azure Cosmos DB account in the Azure portal, select Data Explorer, expand sample-database, expand sample-graph, select Graph, and then select Execute Gremlin Query.
In the Results list, notice three new users are added to the graph. You can move the vertices around by dragging and dropping, zoom in and out by scrolling the wheel of your mouse, and expand the size of the graph with the double-arrow.
Let's add a few new users. Select the New Vertex button to add data to your graph.
Enter a label of person.
Select Add property to add each of the following properties. Notice that you can create unique properties for each person in your graph. Only the ID key is required.
- pk: /pk
- id: ashley. The unique identifier for the vertex. If you don't specify an ID, one is generated for you.
- gender: female
- tech: java
Because the graph in this quickstart was created with the partition key /pk, include pk as a key in each new vertex. Any graph created with a partition key requires that key in every new vertex.
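A vertex like the one you just filled in can also be expressed as a Gremlin addV() traversal. The helper below is an illustrative sketch that builds such a query string from a label and a property map, mirroring the properties listed above, including pk. The quote and backslash escaping is an assumption about how the service parses string literals. For untrusted input, parameter bindings are a safer choice than string building.

```python
from typing import Mapping


def gremlin_string(value: str) -> str:
    """Quote a value as a single-quoted Gremlin string literal (assumed escaping rules)."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def add_vertex_query(label: str, properties: Mapping[str, str]) -> str:
    """Build a g.addV(...) traversal with one .property() step per key/value pair."""
    steps = [f"g.addV({gremlin_string(label)})"]
    for key, value in properties.items():
        steps.append(f".property({gremlin_string(key)}, {gremlin_string(value)})")
    return "".join(steps)


# The ashley vertex from the properties above; rakesh works the same way.
ashley_query = add_vertex_query(
    "person", {"id": "ashley", "pk": "/pk", "gender": "female", "tech": "java"}
)
print(ashley_query)
```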
Select OK. You may need to expand your screen to see OK on the bottom of the screen.
Select New Vertex again and add another new user.
Enter a label of person.
Select Add property to add each of the following properties:
- pk: /pk
- id: rakesh. The unique identifier for the vertex. If you don't specify an ID, one is generated for you.
- gender: male
- school: MIT
Select the Execute Gremlin Query button with the default g.V() filter to display all the values in the graph. All of the users now show in the Results list.
As you add more data, you can use filters to limit your results. By default, Data Explorer uses g.V() to retrieve all vertices in a graph. You can change it to a different graph query, such as g.V().count(), to return a count of all the vertices in the graph in JSON format. If you changed the filter, change the filter back to g.V() and select Execute Gremlin Query to display all the results again.
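Filters like these are ordinary Gremlin traversals that start from g.V(). The sketch below composes g.V() with optional hasLabel(), has(), and count() steps. vertex_filter_query is a hypothetical helper. The strings it produces could be typed into the Data Explorer query box or submitted from Python.

```python
from typing import Mapping, Optional


def gremlin_string(value: str) -> str:
    """Quote a value as a single-quoted Gremlin string literal (assumed escaping rules)."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def vertex_filter_query(
    label: Optional[str] = None,
    properties: Optional[Mapping[str, str]] = None,
    count: bool = False,
) -> str:
    """Compose g.V() with optional hasLabel(), has(), and count() steps."""
    steps = ["g.V()"]
    if label is not None:
        steps.append(f".hasLabel({gremlin_string(label)})")
    for key, value in (properties or {}).items():
        steps.append(f".has({gremlin_string(key)}, {gremlin_string(value)})")
    if count:
        steps.append(".count()")
    return "".join(steps)


print(vertex_filter_query())                     # the Data Explorer default
print(vertex_filter_query(count=True))           # count every vertex
print(vertex_filter_query("person", {"gender": "male"}))
```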
Now you can connect rakesh and ashley. Ensure ashley is selected in the Results list, then select the edit button next to Targets on the lower-right side. You may need to widen your window to see the Properties area.
In the Target box, type rakesh; in the Edge label box, type knows; and then select the check mark.
Now select rakesh from the results list and see that ashley and rakesh are connected.
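The edge you created through the Targets editor can also be written as a Gremlin traversal. This sketch assumes the vertex IDs ashley and rakesh and the edge label knows from the steps above. add_edge_query is a hypothetical helper, and Data Explorer may generate a different but equivalent query.

```python
def gremlin_string(value: str) -> str:
    """Quote a value as a single-quoted Gremlin string literal (assumed escaping rules)."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def add_edge_query(source_id: str, edge_label: str, target_id: str) -> str:
    """Build g.V(source).addE(label).to(g.V(target)) for two existing vertex IDs."""
    return (
        f"g.V({gremlin_string(source_id)})"
        f".addE({gremlin_string(edge_label)})"
        f".to(g.V({gremlin_string(target_id)}))"
    )


print(add_edge_query("ashley", "knows", "rakesh"))
```

After the edge exists, a traversal such as g.V('ashley').out('knows') follows it and returns the rakesh vertex.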
That completes the resource creation part of this quickstart. You can continue to add vertices to your graph, modify the existing vertices, or change the queries. Now let's review the metrics Azure Cosmos DB provides, and then clean up the resources.
Review SLAs in the Azure portal
The Azure portal monitors your Azure Cosmos DB account throughput, storage, availability, latency, and consistency. Charts for metrics associated with an Azure Cosmos DB Service Level Agreement (SLA) show the SLA value compared to actual performance. This suite of metrics makes monitoring your SLAs transparent.
To review metrics and SLAs:
Select Metrics in your Azure Cosmos DB account's navigation menu.
Select a tab such as Latency, and select a timeframe on the right. Compare the Actual and SLA lines on the charts.
Review the metrics on the other tabs.
Clean up resources
When you're done with your app and Azure Cosmos DB account, you can delete the Azure resources you created so you don't incur more charges. To delete the resources:
In the Azure portal Search bar, search for and select Resource groups.
From the list, select the resource group you created for this quickstart.
On the resource group Overview page, select Delete resource group.
In the next window, enter the name of the resource group to delete, and then select Delete.
In this quickstart, you learned how to create an Azure Cosmos DB account, create a graph using the Data Explorer, and run a Python app to add data to the graph. You can now build more complex queries and implement powerful graph traversal logic using Gremlin.