How to use order by on the HighWaterMarkChangeDetectionPolicy field with the search service API

Julien Ferreira 40 Reputation points
2023-03-30T06:40:16.76+00:00

Hello,

I'm currently work on solution of azure cognitive search with multiple cosmos db indexers.

I use .NET SDK to configure my indexes and indexers.

I set up the HighWaterMarkChangeDetectionPolicy on the _ts filed of my cosmos documents.

I set up the assumeOrderByHighWaterMarkColumn configuration to true like described in this section.

https://learn.microsoft.com/en-us/azure/search/search-howto-index-cosmosdb#incremental-indexing-and-custom-queries

But it remains an incomprehension on the finality. In this section give an example with a SQL like query but in my case I would like to use the search service API with a post request with the body:

{
	"count":true,
	"filter":"type eq 'collection'",
	"queryType":"full",
	"orderby":"_ts",
	"select":"id,collectionName,coverThumbnailURL,type,artists",
	"skip":0,
	"top":10
}

I specify the orderby property with _ts field, but this call gives me this error:

{
	"error": {
		"code": "",
		"message": "Invalid expression: Could not find a property named '_ts' on type 'search.document'.\r\nParameter name: $orderby"
	}
}

I didn't understand how to use order by on the HighWaterMarkChangeDetectionPolicy field with the search service API.

Thanks in advance for your help.

Regards.

Azure AI Search
Azure AI Search
An Azure search service with built-in artificial intelligence capabilities that enrich information to help identify and explore relevant content at scale.
865 questions
Azure Cosmos DB
Azure Cosmos DB
An Azure NoSQL database service for app development.
1,543 questions
{count} votes

Accepted answer
  1. Grmacjon-MSFT 17,456 Reputation points
    2023-03-31T00:03:09.77+00:00

    Hi @Julien Ferreira

    We are sorry to hear you're having difficulties using HighWaterMarkChangeDetectionPolicy field with the search service API

    Based on the error message you shared, it looks like the _ts field is not available in the search index schema, and therefore cannot be used as a sorting field in the search API call.

    To use the HighWaterMarkChangeDetectionPolicy in the search API, you can define a new field in your search index schema to store the timestamp value of the HighWaterMarkChangeDetectionPolicy field from Cosmos DB. You can then use this field for sorting in the search API call.

    Here's an example of how to modify your search index schema to include a new field for the HighWaterMarkChangeDetectionPolicy value:

    {
      "name": "your-search-index",
      "fields": [
        {"name": "id", "type": "Edm.String", "key": true, "searchable": false},
        {"name": "collectionName", "type": "Edm.String", "searchable": true},
        {"name": "coverThumbnailURL", "type": "Edm.String", "searchable": true},
        {"name": "type", "type": "Edm.String", "searchable": true},
        {"name": "artists", "type": "Collection(Edm.String)", "searchable": true},
        {"name": "highWaterMarkTimestamp", "type": "Edm.DateTimeOffset", "searchable": false}
      ]
    }
    
    

    Next, you can update your indexing code to extract the HighWaterMarkChangeDetectionPolicy value and store it in the highWaterMarkTimestamp field of the search index document. Here's an example:

    using Microsoft.Azure.Search;
    using Microsoft.Azure.Search.Models;
    using Microsoft.Azure.Cosmos;
    using Newtonsoft.Json.Linq;
    
    // ...
    
    var cosmosClient = new CosmosClient(endpointUrl, primaryKey);
    var container = cosmosClient.GetContainer(databaseId, containerId);
    var searchClient = new SearchServiceClient(searchServiceName, new SearchCredentials(apiKey));
    var searchIndexClient = searchClient.Indexes.GetClient(searchIndexName);
    
    var query = "SELECT * FROM c WHERE c._ts > @highWaterMarkTimestamp";
    var parameters = new SqlParameterCollection();
    parameters.Add(new SqlParameter("@highWaterMarkTimestamp", highWaterMarkTimestamp));
    
    var iterator = container.GetItemQueryIterator<JObject>(
        new QueryDefinition(query).WithParameterCollection(parameters));
    
    while (iterator.HasMoreResults)
    {
        foreach (var item in await iterator.ReadNextAsync())
        {
            var searchDocument = new SearchDocument(item);
            searchDocument["highWaterMarkTimestamp"] = (DateTimeOffset)item["_ts"];
            var indexBatch = IndexBatch.MergeOrUpload(new[] { IndexAction.Upload(searchDocument) });
            await searchIndexClient.Documents.IndexAsync(indexBatch);
        }
    }
    
    // update highWaterMarkTimestamp value after indexing is complete
    var newHighWaterMarkTimestamp = // get latest _ts value from Cosmos DB
    
    

    Lastly, you can modify your search API query to sort by the new highWaterMarkTimestamp field, like this:

    {
        "count": true,
        "filter": "type eq 'collection'",
        "queryType": "full",
        "orderby": "highWaterMarkTimestamp",
        "select": "id,collectionName,coverThumbnailURL,type,artists",
        "skip": 0,
        "top": 10
    }
    
    

    This should allow you to use the HighWaterMarkChangeDetectionPolicy in conjunction with the search API in a way that enables you to perform incremental indexing with Cosmos DB.

    Hope that helps.

    -Grace


    If the information helped address your question, please Accept the answer. This will help us and also improve searchability for others in the community who might be researching similar information.

    0 comments No comments

1 additional answer

Sort by: Most helpful
  1. Julien Ferreira 40 Reputation points
    2023-04-12T12:42:24.2666667+00:00

    I add a field in the description of index :

                [SimpleField(IsSortable = true)]
                public int lastModified { get; set; }
    

    And I add a FieldMappings in my search indexer.

                SearchIndexer cosmosDbIndexer = new SearchIndexer(
                    name: ENVIRONEMENT + "-" + collection_name + "-indexer",
                    dataSourceName: cosmosDbDataSource.Name,
                    targetIndexName: indexName)
                {
                    Schedule = new IndexingSchedule(TimeSpan.FromHours(1)),
                    Parameters = parameters,
                    FieldMappings =
                    {
                        new FieldMapping("_ts") { TargetFieldName = "lastModified" }
                    }
                };
    

    I finaly can use lastModified field in the body of request on seach API

    {
    	"count":true,
    	"filter":"type eq 'collection'",
    	"queryType":"full",
    	"orderby":"lastModified",
    	"select":"id,collectionName,coverThumbnailURL,type,artists, lastModified",
    	"skip":0,
    	"top":10
    }
    

    Thank you, regards.