Gremlin count value is delayed from actual results when setting TTL

gmNetwrix 0 Reputation points
2024-09-06T16:01:01.6233333+00:00

Hi,

I'm running into an issue with the TTL property in Cosmos DB: it appears to be ignored.

I uploaded a collection of vertices and edges (a graph) and am using Gremlin for traversals.

The vertices and edges carry specific filter properties, so hundreds to thousands of them can be deleted easily by setting ttl on them.

e.g. g.V().has(filter).property("ttl",1) and g.E().has(filter).property("ttl",1)

When running those queries the vertices and edges are deleted.

However, when running a count against them the count is delayed by a significant period of time.

e.g. g.V().has(filter).count()

The same stale count shows up in Azure Data Studio when querying those same items with a NoSQL query.

e.g. select count(1) from c where c.Label = "edgeLabel" and c.Filter = filter

When testing with a larger ttl, e.g. 1000, the value is definitely set, but the count stays wrong for minutes after the removal occurs (longer for larger deletions).

It looks like counts don't take the ttl into account, despite the documentation stating: "An item will no longer appear in query responses immediately after the TTL expires, even if it hasn't yet been permanently deleted from the container." - https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/time-to-live

I'm setting ttl through Gremlin as described here: https://learn.microsoft.com/en-us/azure/cosmos-db/gremlin/access-system-properties

Thanks

Azure Cosmos DB

1 answer

  1. Amira Bedhiafi 33,071 Reputation points Volunteer Moderator
    2024-09-06T17:45:44.77+00:00

    Possible Reasons for Delayed Count Results

    • Background Cleanup Delay: While the item may no longer be accessible in queries after TTL expires, the underlying deletion from the storage engine happens asynchronously. This could lead to a situation where the Gremlin traversal or SQL query still reflects the expired items for a brief period.
    • Indexing Lag: Even though the item is no longer available for standard queries after TTL expiration, there may be a delay in updating the index, which could lead to stale count results.
    • Gremlin Query Optimization: Gremlin traversals, like g.V().has(filter).count(), may not always behave exactly like direct Cosmos DB SQL queries. In this case, though, the same stale count appears through both Gremlin and the NoSQL query, which suggests the delay comes from the shared storage layer rather than from Gremlin itself.
    • TTL in Cosmos DB SQL Queries: Similarly, the SQL query SELECT COUNT(1) FROM c WHERE c.Label = "edgeLabel" might not immediately reflect TTL-expired records due to the lag between logical and physical deletion of the items in Cosmos DB. This is particularly true when the TTL is small, and you're performing large-scale deletions.
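To make the difference between logical expiry and physical deletion concrete, here is a minimal Python sketch of the expiry arithmetic. It assumes the underlying document exposes the system property `_ts` (last-modified time in epoch seconds) and an optional `ttl` in seconds, and that an item-level `ttl` only takes effect when the container has a default TTL configured. The function names and the exact boundary handling (`>=`) are illustrative assumptions, not Cosmos DB internals.

```python
from typing import Optional


def expiry_epoch(item: dict, container_default_ttl: Optional[int]) -> Optional[int]:
    """Return the epoch second at which the item logically expires, or None if never."""
    if container_default_ttl is None:
        # TTL is not enabled on the container, so any item-level ttl is ignored.
        return None
    # An item-level ttl overrides the container default when present.
    ttl = item.get("ttl", container_default_ttl)
    if ttl == -1:
        # -1 means "never expire".
        return None
    # ttl is a duration counted from the last modification, not a timestamp.
    return item["_ts"] + ttl


def is_logically_expired(item: dict, container_default_ttl: Optional[int], now: int) -> bool:
    """True once the item's expiry time has been reached (boundary choice is an assumption)."""
    expiry = expiry_epoch(item, container_default_ttl)
    return expiry is not None and now >= expiry
```

Because setting `ttl` on a vertex or edge is itself a write, it updates `_ts`, so the countdown starts from the moment the property is set.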

    Suggestions to Resolve the Issue

    • Wait for Propagation: Given that Cosmos DB handles TTL expiration asynchronously, you may need to introduce a small wait time between setting TTL properties and executing count queries. This would give Cosmos DB enough time to process the TTL expiration across all nodes and update the count properly.
    • Manual Cleanup Option: If the delayed count poses a significant issue, you could remove the items explicitly instead of relying on TTL, using g.V().has(filter).drop() and g.E().has(filter).drop(), so that the removal and the resulting counts do not depend on TTL processing.
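    • Query for Non-Expired Items: Use a query that explicitly excludes items marked for expiry. Note that ttl is a duration in seconds counted from the item's last modification, not an absolute timestamp, so comparing it with the current time will not work. If only the items you marked for deletion carry a ttl property, you can count the remaining ones with g.V().has(filter).not(has('ttl')).count().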
    • Azure Support Contact: If the problem persists or significantly impacts your operations, it may be useful to raise a support ticket with Azure Cosmos DB, as there could be internal delays or optimizations specific to your cluster's settings.
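The "Wait for Propagation" suggestion above could be sketched as a small polling loop in Python. This is only an illustration under stated assumptions: `wait_for_count` is a hypothetical helper, and `run_count` is a placeholder for whatever client call you use to submit g.V().has(filter).count() and return the result as an integer. The timeout and interval defaults are arbitrary.

```python
import time
from typing import Callable


def wait_for_count(
    run_count: Callable[[], int],
    expected: int,
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll run_count() until it returns `expected` or the timeout elapses.

    Returns True if the expected count was observed, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if run_count() == expected:
            return True
        if time.monotonic() >= deadline:
            return False
        # Back off between polls so the count queries themselves stay cheap.
        sleep(interval_s)
```

The `sleep` parameter is injectable so the loop can be exercised in tests without real delays. Keep in mind that each poll is itself a query that consumes request units.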

    Let me know if you'd like more information or examples for any of these suggestions!

