Our Dataset
Our data set is a flat list of attributes identified by a unique id. Overall we have ~130 attributes. There is an example record in the Appendix for reference.
We have over 52000 unique ids.
- We want to achieve the following as part of our search request
- Be able to filter by a single attribute or a combination of attributes using AND/OR criteria
- Group the results using a combination of multiple fields (4 in our case)
- Calculate aggregated values for a subset of fields as part of grouping. We are only interested in using max/min functions for grouping. e.g. max value of a field value
- Include the aggregated values at the group level in the response
- Have the records that constitute the group as a nested array within the group record
- Order the grouped records based on the value of the aggregated fields that were created as part of grouping.
- Return the grouped and ordered records including aggregated fields and nested array of constituent records using pagination.
- We need to be able to select the fields returned in the records.
- Please note that the aggregated values at the group level cannot be pre calculated as they are dependent on the filter criteria.
- We need to achieve this as part of a REST API response. i.e. search results need to be displayed to an end user with maximum tolerable delay ~1-2 secs.
Please advice if this can be achieved using Azure Cognitive Search product.
Appendix
Sampel Record
"document": {
"vehicle_uid": "8137116e411c9e6edffdd86bd50cfdd8e9cd6af",
"search_text": "Porsche Taycan 79kWh 4S sedan",
"vehicle_status": "discontinued",
"vehicle_status_correct_case": null,
"hero_image_url": "https://caradvisoryinternal.blob.core.windows.net/ca-media/images/190654/494d5a/1/0012.png",
"limited_edition": "null",
"year_low": 2022,
"year_high": 2022,
"segment": "luxury",
"vehicle_make": "porsche",
"vehicle_model": "taycan",
"vehicle_trim": "4s",
"vehicle_trim_group": "4staycanporsche",
....
....
~130 attributes in total
}
Expected Response Record
Grouping is required based on trim_status_id field
group_*fields need to be calculated dynamically per search request, after application of filter criteria.
documents in results need to be ordered using one or more group_* fields
variants array inside each document is made up of the records that were grouped together.
{
"results": [
{
"document": {
"id": "dG95b3RhX2hpbHV4X3RyZF9kaXNjb250aW51ZWQ=",
"vehicle_make": "toyota",
"vehicle_model": "hilux",
"vehicle_trim": "trd",
"vehicle_make_correct_case": "Toyota",
"vehicle_model_correct_case": "Hilux",
"vehicle_trim_correct_case": "TRD",
"trim_status_id": "toyota_hilux_trd_discontinued",
"vehicle_status": "discontinued",
"group_year_high": 2017, >>> Example of dynamically aggregated field calculated after application of filter criteria
"group_year_low": 2017,
"group_fuel_score_high": 0,
"group_fuel_score_low": 0,
"group_environment_score_high": 1,
"group_environment_score_low": 1,
....
....
//additional grouped fields ~40. in number
"variants": [
{
"hero_image_url": "https://caradvisoryinternal.blob.core.windows.net/ca-media/images/1100103/ffffff/1/0012.png",
"vehicle_uid": "78253654bf7e8c8ebdc28c04142d50ed405491c",
"year_low": 2017,
"year_high": 2017,
"vehicle_version_correct_case": "2.8 T/Diesel TRD (White) Double Cab 4X4",
"environment_score": 1,
"driven_wheels": "4x4",
"vehicle_seats": 5,
"vehicle_doors": 4,
"fuel_type": "diesel",
"fuel_score": 0,
"retail_price": 58990,
....
....
//fields selected in search query
},
....
....
]
}