OData language overview for $filter, $orderby, and $select in Azure AI Search
This article provides an overview of the OData expression language used in $filter, $orderby, and $select expressions for keyword search in Azure AI Search over numeric and string (nonvector) fields.
The language is presented "bottom-up" starting with the most basic elements. The OData expressions that you can construct in a query request range from simple to highly complex, but they all share common elements. Shared elements include:
- Field paths, which refer to specific fields of your index.
- Constants, which are literal values of a certain data type.
Once you understand these common concepts, you can continue with the top-level syntax for each expression:
- $filter expressions are evaluated during query parsing, constraining search to specific fields or adding match criteria used during index scans.
- $orderby expressions are applied as a post-processing step over a result set to sort the documents that are returned.
- $select expressions determine which document fields are included in the result set.
The syntax of these expressions is distinct from the simple or full query syntax used in the search parameter, although there's some overlap in the syntax for referencing fields.
For examples in other languages such as Python or C#, see the examples in the azure-search-vector-samples repository.
Note
Terminology in Azure AI Search differs from the OData standard in a few ways. What we call a field in Azure AI Search is called a property in OData, and similarly for field path versus property path. An index containing documents in Azure AI Search is referred to more generally in OData as an entity set containing entities. The Azure AI Search terminology is used throughout this reference.
Field paths
The following EBNF (Extended Backus-Naur Form) defines the grammar of field paths.
field_path ::= identifier('/'identifier)*
identifier ::= [a-zA-Z_][a-zA-Z_0-9]*
Note
See OData expression syntax reference for Azure AI Search for the complete EBNF.
A field path is composed of one or more identifiers separated by slashes. Each identifier is a sequence of characters that must start with an ASCII letter or underscore, and contain only ASCII letters, digits, or underscores. The letters can be upper- or lower-case.
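Because the grammar is so small, a regular expression can check it. The following Python sketch validates candidate field paths against the two productions above. The helper names `is_field_path` and `split_field_path` are illustrative and not part of any Azure SDK. The sketch checks syntax only: whether a path names a real field in your index, or a range variable in scope, is something only the service can decide.

```python
import re

# Mirrors the EBNF above:
#   identifier ::= [a-zA-Z_][a-zA-Z_0-9]*
#   field_path ::= identifier('/'identifier)*
IDENTIFIER = r"[a-zA-Z_][a-zA-Z_0-9]*"
FIELD_PATH = re.compile(rf"{IDENTIFIER}(?:/{IDENTIFIER})*")


def is_field_path(text: str) -> bool:
    """Return True if the whole string matches the field_path production."""
    # fullmatch (not match) so that trailing junk such as "Address/" is rejected.
    return FIELD_PATH.fullmatch(text) is not None


def split_field_path(text: str) -> list[str]:
    """Split a valid field path into its identifiers, e.g. to walk subfields."""
    if not is_field_path(text):
        raise ValueError(f"not a valid field path: {text!r}")
    return text.split("/")


for candidate in ["HotelName", "Stores/Address/Country", "2ndFloor", "Address//City"]:
    print(f"{candidate!r}: {is_field_path(candidate)}")
```

Character classes with explicit ranges such as `[a-zA-Z]` match ASCII only, which matches the restriction that identifiers use ASCII letters, digits, and underscores.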
An identifier can refer either to the name of a field or to a range variable in the context of a collection expression (any or all) in a filter. A range variable is like a loop variable that represents the current element of the collection. For complex collections, that variable represents an object, which is why you can use field paths to refer to subfields of the variable. This is analogous to dot notation in many programming languages.
Examples of field paths are shown in the following table:
Field path | Description |
---|---|
HotelName | Refers to a top-level field of the index |
Address/City | Refers to the City subfield of a complex field in the index; Address is of type Edm.ComplexType in this example |
Rooms/Type | Refers to the Type subfield of a complex collection field in the index; Rooms is of type Collection(Edm.ComplexType) in this example |
Stores/Address/Country | Refers to the Country subfield of the Address subfield of a complex collection field in the index; Stores is of type Collection(Edm.ComplexType) and Address is of type Edm.ComplexType in this example |
room/Type | Refers to the Type subfield of the room range variable, for example in the filter expression Rooms/any(room: room/Type eq 'deluxe') |
store/Address/Country | Refers to the Country subfield of the Address subfield of the store range variable, for example in the filter expression Stores/any(store: store/Address/Country eq 'Canada') |
The meaning of a field path differs depending on the context. In filters, a field path refers to the value of a single instance of a field in the current document. In other contexts, such as $orderby, $select, or in fielded search in the full Lucene syntax, a field path refers to the field itself. This difference has some consequences for how you use field paths in filters.
Consider the field path Address/City. In a filter, this refers to a single city for the current document, like "San Francisco". In contrast, Rooms/Type refers to the Type subfield for many rooms (like "standard" for the first room, "deluxe" for the second room, and so on). Because Rooms/Type doesn't refer to a single instance of the subfield Type, it can't be used directly in a filter. Instead, to filter on room type, you would use a lambda expression with a range variable, like this:
Rooms/any(room: room/Type eq 'deluxe')
In this example, the range variable room appears in the room/Type field path. That way, room/Type refers to the type of the current room in the current document. This is a single instance of the Type subfield, so it can be used directly in the filter.
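Lambda filters like this are often generated in code. The following Python sketch builds one, assuming the lambda body is a single `eq` comparison on a string subfield. `collection_eq_filter` and `odata_string` are hypothetical helpers. Note how the range variable is prefixed to the subfield path so the comparison refers to one element of the collection rather than to the collection as a whole. The service places further restrictions on what may appear inside `any` and `all`, so a string this function builds is syntactically well formed but not guaranteed to be accepted.

```python
def odata_string(value: str) -> str:
    """Format value as an OData string constant, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def collection_eq_filter(op: str, collection: str, range_var: str,
                         subpath: str, value: str) -> str:
    """Build '<collection>/<op>(<var>: <var>/<subpath> eq <value>)'."""
    if op not in ("any", "all"):
        raise ValueError("op must be 'any' or 'all'")
    # Prefixing the range variable makes the path refer to the *current* element.
    return f"{collection}/{op}({range_var}: {range_var}/{subpath} eq {odata_string(value)})"


print(collection_eq_filter("any", "Rooms", "room", "Type", "deluxe"))
# Rooms/any(room: room/Type eq 'deluxe')
print(collection_eq_filter("any", "Stores", "store", "Address/Country", "Canada"))
# Stores/any(store: store/Address/Country eq 'Canada')
```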
Using field paths
Field paths are used in many parameters of the Azure AI Search REST APIs. The following table lists all the places where they can be used, plus any restrictions on their usage:
API | Parameter name | Restrictions |
---|---|---|
Create or Update Index | suggesters/sourceFields | None |
Create or Update Index | scoringProfiles/text/weights | Can only refer to searchable fields |
Create or Update Index | scoringProfiles/functions/fieldName | Can only refer to filterable fields |
Search | search when queryType is full | Can only refer to searchable fields |
Search | facet | Can only refer to facetable fields |
Search | highlight | Can only refer to searchable fields |
Search | searchFields | Can only refer to searchable fields |
Suggest and Autocomplete | searchFields | Can only refer to fields that are part of a suggester |
Search, Suggest, and Autocomplete | $filter | Can only refer to filterable fields |
Search and Suggest | $orderby | Can only refer to sortable fields |
Search, Suggest, and Lookup | $select | Can only refer to retrievable fields |
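The service enforces these restrictions itself and rejects requests that violate them. If you want earlier, friendlier errors in a client, a pre-check can be driven from the same table. The following Python sketch assumes a hypothetical index schema held as a dictionary from field path to the attributes enabled on that field; `REQUIRED_ATTRIBUTE`, `INDEX_FIELDS`, and `disallowed_paths` are illustrative names, and only a subset of the table's parameters is covered.

```python
# Attribute each query parameter requires, taken from the table above.
REQUIRED_ATTRIBUTE = {
    "$filter": "filterable",
    "$orderby": "sortable",
    "$select": "retrievable",
    "facet": "facetable",
    "highlight": "searchable",
    "searchFields": "searchable",  # Search API; Suggest/Autocomplete use suggester fields instead
}

# Hypothetical index schema: field path -> attributes enabled on that field.
INDEX_FIELDS = {
    "HotelName": {"searchable", "sortable", "retrievable"},
    "Description": {"searchable", "retrievable"},
    "Rating": {"filterable", "sortable", "facetable", "retrievable"},
    "Address/City": {"searchable", "filterable", "facetable", "retrievable"},
}


def disallowed_paths(parameter: str, paths: list[str],
                     fields: dict[str, set[str]] = INDEX_FIELDS) -> list[str]:
    """Return the paths that can't be used in `parameter` (unknown fields included)."""
    required = REQUIRED_ATTRIBUTE[parameter]
    return [p for p in paths if required not in fields.get(p, set())]


print(disallowed_paths("$orderby", ["Rating", "Description"]))  # ['Description']
print(disallowed_paths("$filter", ["Address/City", "Unknown"]))  # ['Unknown']
```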
Constants
Constants in OData are literal values of a given Entity Data Model (EDM) type. See Supported data types for a list of supported types in Azure AI Search. Constants of collection types aren't supported.
The following table shows examples of constants for each of the nonvector data types that support OData expressions:
Data type | Example constants |
---|---|
Edm.Boolean | true, false |
Edm.DateTimeOffset | 2019-05-06T12:30:05.451Z |
Edm.Double | 3.14159, -1.2e7, NaN, INF, -INF |
Edm.GeographyPoint | geography'POINT(-122.131577 47.678581)' |
Edm.GeographyPolygon | geography'POLYGON((-122.031577 47.578581, -122.031577 47.678581, -122.131577 47.678581, -122.031577 47.578581))' |
Edm.Int32 | 123, -456 |
Edm.Int64 | 283032927235 |
Edm.String | 'hello' |
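When filters are assembled in code, client values have to be rendered in these forms. The following Python sketch maps common Python types to the constant formats shown in the table. `to_odata_constant` is a hypothetical helper; it doesn't handle the geography types, and the choice to truncate timestamps to milliseconds is an assumption made to match the example above, not a requirement stated here.

```python
import math
from datetime import datetime, timezone


def to_odata_constant(value) -> str:
    """Format a Python value as an OData constant (geography types not handled)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        # Python ints are unbounded; reject values outside the Edm.Int64 range.
        if not -2**63 <= value < 2**63:
            raise OverflowError("integer out of Edm.Int64 range")
        return str(value)
    if isinstance(value, float):
        if math.isnan(value):
            return "NaN"
        if math.isinf(value):
            return "INF" if value > 0 else "-INF"
        return repr(value)  # shortest round-trip form, e.g. 3.14159 or 1e+20
    if isinstance(value, datetime):
        if value.tzinfo is None:
            raise ValueError("use a timezone-aware datetime")
        utc = value.astimezone(timezone.utc)
        # Millisecond precision with a 'Z' suffix, as in 2019-05-06T12:30:05.451Z
        return utc.strftime("%Y-%m-%dT%H:%M:%S") + f".{utc.microsecond // 1000:03d}Z"
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"  # double embedded quotes
    raise TypeError(f"unsupported type: {type(value).__name__}")


print(to_odata_constant(3.14159))  # 3.14159
print(to_odata_constant(datetime(2019, 5, 6, 12, 30, 5, 451000, tzinfo=timezone.utc)))
# 2019-05-06T12:30:05.451Z
```

Python's `repr` for floats always uses a lowercase `e` and an explicit exponent sign when it switches to scientific notation, which fits the `exponent ::= 'e' sign? integer_literal` production in the grammar later in this article.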
Escaping special characters in string constants
String constants in OData are delimited by single quotes. If you need to construct a query with a string constant that might itself contain single quotes, you can escape the embedded quotes by doubling them.
For example, a phrase containing an apostrophe, such as "Alice's car", would be represented in OData as the string constant 'Alice''s car'.
Important
When you construct filters programmatically, always escape string constants that come from user input. Escaping mitigates injection attacks, which is especially important when you use filters to implement security trimming.
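The following Python sketch shows why doubling quotes is enough to keep untrusted text inside a single string constant. `odata_string` and `eq_filter` are hypothetical helpers. Only the value is escaped: the field path is assumed to come from your own code, because field paths aren't quoted and can't be made safe this way.

```python
def odata_string(value: str) -> str:
    """Quote untrusted input as a single OData string constant."""
    # Doubling every embedded quote means the input can never close the literal early.
    return "'" + value.replace("'", "''") + "'"


def eq_filter(field_path: str, user_value: str) -> str:
    """Build '<field_path> eq <escaped user_value>'. field_path must be trusted."""
    return f"{field_path} eq {odata_string(user_value)}"


print(eq_filter("HotelName", "Alice's car"))
# HotelName eq 'Alice''s car'

# An attempt to break out of the literal stays inside one string constant:
attack = "x' or true or 'x' eq 'x"
print(eq_filter("HotelName", attack))
# HotelName eq 'x'' or true or ''x'' eq ''x'
```

Without escaping, the second call would produce `HotelName eq 'x' or true or 'x' eq 'x'`, a filter that matches every document.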
Constants syntax
The following EBNF (Extended Backus-Naur Form) defines the grammar for most of the constants shown in the above table. The grammar for geo-spatial types can be found in OData geo-spatial functions in Azure AI Search.
constant ::=
string_literal
| date_time_offset_literal
| integer_literal
| float_literal
| boolean_literal
| 'null'
string_literal ::= "'"([^'] | "''")*"'"
date_time_offset_literal ::= date_part'T'time_part time_zone
date_part ::= year'-'month'-'day
time_part ::= hour':'minute(':'second('.'fractional_seconds)?)?
zero_to_fifty_nine ::= [0-5]digit
digit ::= [0-9]
year ::= digit digit digit digit
month ::= '0'[1-9] | '1'[0-2]
day ::= '0'[1-9] | [1-2]digit | '3'[0-1]
hour ::= [0-1]digit | '2'[0-3]
minute ::= zero_to_fifty_nine
second ::= zero_to_fifty_nine
fractional_seconds ::= integer_literal
time_zone ::= 'Z' | sign hour':'minute
sign ::= '+' | '-'
/* In practice integer literals are limited in length to the precision of
the corresponding EDM data type. */
integer_literal ::= digit+
float_literal ::=
sign? whole_part fractional_part? exponent?
| 'NaN'
| '-INF'
| 'INF'
whole_part ::= integer_literal
fractional_part ::= '.'integer_literal
exponent ::= 'e' sign? integer_literal
boolean_literal ::= 'true' | 'false'
Note
See OData expression syntax reference for Azure AI Search for the complete EBNF.
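To see how the constant productions fit together, the following Python sketch transcribes them into regular expressions and reports which production a token matches. `classify_constant` is a hypothetical helper. The grammar is ambiguous for bare digits (`123` matches both `integer_literal` and `float_literal`), so the sketch tries `integer_literal` first. Because `integer_literal` has no sign, a token such as `-456` matches only `float_literal` here; the sketch reports grammar productions, not the EDM type the service ultimately assigns.

```python
import re
from typing import Optional

# Fragments transcribed from the EBNF productions above.
INTEGER = r"[0-9]+"                     # integer_literal ::= digit+
SIGN = r"[+-]"
MONTH = r"(?:0[1-9]|1[0-2])"
DAY = r"(?:0[1-9]|[12][0-9]|3[01])"
HOUR = r"(?:[01][0-9]|2[0-3])"
MIN_SEC = r"[0-5][0-9]"                 # zero_to_fifty_nine
DATE = rf"[0-9]{{4}}-{MONTH}-{DAY}"
TIME = rf"{HOUR}:{MIN_SEC}(?::{MIN_SEC}(?:\.{INTEGER})?)?"
TIME_ZONE = rf"(?:Z|{SIGN}{HOUR}:{MIN_SEC})"

PRODUCTIONS = [
    ("null", r"null"),
    ("boolean_literal", r"true|false"),
    ("string_literal", r"'(?:[^']|'')*'"),
    ("date_time_offset_literal", rf"{DATE}T{TIME}{TIME_ZONE}"),
    # Tried before float_literal, which also matches bare digits such as 123.
    ("integer_literal", INTEGER),
    ("float_literal",
     rf"{SIGN}?{INTEGER}(?:\.{INTEGER})?(?:e{SIGN}?{INTEGER})?|NaN|-INF|INF"),
]
COMPILED = [(name, re.compile(pattern)) for name, pattern in PRODUCTIONS]


def classify_constant(token: str) -> Optional[str]:
    """Return the name of the first production that matches the whole token."""
    for name, pattern in COMPILED:
        if pattern.fullmatch(token):
            return name
    return None


for token in ["'Alice''s car'", "2019-05-06T12:30:05.451Z", "123", "-1.2e7", "INF"]:
    print(token, "->", classify_constant(token))
```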
Building expressions from field paths and constants
Field paths and constants are the most basic parts of an OData expression, but each is already a complete expression on its own. In fact, the $select parameter in Azure AI Search is just a comma-separated list of field paths, and $orderby isn't much more complicated than $select. If your index has a field of type Edm.Boolean, you can even write a filter that consists of nothing but the path of that field. The constants true and false are likewise valid filters.
However, it's more common to have complex expressions that refer to more than one field and constant. These expressions are built in different ways depending on the parameter.
The following EBNF (Extended Backus-Naur Form) defines the grammar for the $filter, $orderby, and $select parameters. These are built up from simpler expressions that refer to field paths and constants:
filter_expression ::= boolean_expression
order_by_expression ::= order_by_clause(',' order_by_clause)*
select_expression ::= '*' | field_path(',' field_path)*
Note
See OData expression syntax reference for Azure AI Search for the complete EBNF.
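The $select and $orderby productions are simple enough to generate directly. The following Python sketch builds both from lists of field paths. `select_expression` and `order_by_expression` are hypothetical helpers, and `Rating` is a hypothetical field. Because order_by_clause isn't defined in this article, the sketch assumes a clause is a field path optionally followed by `asc` or `desc`; sortable functions aren't covered.

```python
import re
from typing import Optional, Sequence, Tuple

# field_path ::= identifier('/'identifier)*
FIELD_PATH = re.compile(r"[a-zA-Z_][a-zA-Z_0-9]*(?:/[a-zA-Z_][a-zA-Z_0-9]*)*")


def _check(path: str) -> str:
    """Reject anything that isn't a syntactically valid field path."""
    if not FIELD_PATH.fullmatch(path):
        raise ValueError(f"invalid field path: {path!r}")
    return path


def select_expression(paths: Sequence[str] = ()) -> str:
    """select_expression ::= '*' | field_path(',' field_path)*"""
    if not paths:
        return "*"
    return ",".join(_check(p) for p in paths)


def order_by_expression(clauses: Sequence[Tuple[str, Optional[str]]]) -> str:
    """order_by_expression ::= order_by_clause(',' order_by_clause)*

    Assumption: each clause is (field_path, None | 'asc' | 'desc').
    """
    if not clauses:
        raise ValueError("at least one order_by_clause is required")
    parts = []
    for path, direction in clauses:
        if direction not in (None, "asc", "desc"):
            raise ValueError(f"bad sort direction: {direction!r}")
        parts.append(_check(path) if direction is None else f"{_check(path)} {direction}")
    return ",".join(parts)


print(select_expression(["HotelName", "Address/City"]))  # HotelName,Address/City
print(order_by_expression([("Rating", "desc"), ("HotelName", None)]))  # Rating desc,HotelName
```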
Next steps
The $orderby and $select parameters are both comma-separated lists of simpler expressions. The $filter parameter is a Boolean expression composed of simpler subexpressions. These subexpressions are combined using logical operators such as and, or, and not; comparison operators such as eq, lt, gt, and so on; and collection operators such as any and all.
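As a rough illustration of how those operators compose, the following Python sketch combines subexpressions into a larger $filter string. `compare`, `and_`, `or_`, and `not_` are hypothetical helpers, and `Rating` and `ParkingIncluded` are hypothetical fields (the latter assumed to be Edm.Boolean). The sketch wraps every operand in parentheses so the result doesn't depend on operator precedence; that's a defensive design choice, not something the syntax requires.

```python
COMPARISON_OPERATORS = {"eq", "ne", "gt", "ge", "lt", "le"}


def compare(field_path: str, op: str, constant: str) -> str:
    """Build '<field_path> <op> <constant>'; constant must already be formatted."""
    if op not in COMPARISON_OPERATORS:
        raise ValueError(f"unknown comparison operator: {op!r}")
    return f"{field_path} {op} {constant}"


def _join(operator: str, subexpressions: tuple) -> str:
    if not subexpressions:
        raise ValueError(f"'{operator}' needs at least one operand")
    # Parenthesize every operand so precedence never changes the meaning.
    return f" {operator} ".join(f"({e})" for e in subexpressions)


def and_(*subexpressions: str) -> str:
    return _join("and", subexpressions)


def or_(*subexpressions: str) -> str:
    return _join("or", subexpressions)


def not_(subexpression: str) -> str:
    return f"not ({subexpression})"


flt = and_(
    compare("Rating", "ge", "4"),
    not_("ParkingIncluded"),
    "Rooms/any(room: room/Type eq 'deluxe')",
)
print(flt)
# (Rating ge 4) and (not (ParkingIncluded)) and (Rooms/any(room: room/Type eq 'deluxe'))
```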
The $filter, $orderby, and $select parameters are explored in more detail in the following articles: