Vector Search filtering guide

This page describes the filter expression syntax for Vector Search queries. The filter syntax is different for standard and storage-optimized endpoints.

  • Standard endpoints use a Python dictionary, where the key encodes both the column name and the operator (for example, {"price <": 200}).
  • Storage-optimized endpoints use a SQL-like filter string similar to a WHERE clause (for example, "price < 200").

For a complete reference of supported operators and where to find filters in the query API, see Use filters on queries. For example notebooks, see Example notebooks.

Standard endpoints

Standard endpoints accept a Python dictionary passed to similarity_search() through the filters parameter. The key encodes both the column name and the operator (for example, {"price >": 40000}), and multiple keys in the same dict are combined with AND logic. This section summarizes the supported operators and known limitations.

Quick reference

String columns

Operator Syntax Example
Exact match {"col": "val"} {"make": "Toyota"}
Negation {"col NOT": "val"} {"make NOT": "Ford"}
OR (multiple values) {"col": ["v1","v2"]} {"make": ["Toyota","Honda"]}
Token-based LIKE {"col LIKE": "token"} {"color LIKE": "red"}
Hyphenated values {"col": "F-150"} {"model": "F-150"}
JSON pattern LIKE [{"col LIKE": "%pattern%"}, {"col2 op": n}] [{"specs LIKE": '%"drivetrain":"AWD"%'}, {"price <": 50000}]

Numeric columns (INT, DOUBLE)

Operator Syntax Example
Greater than {"col >": n} {"price >": 40000}
Less than or equal {"col <=": n} {"price <=": 55000}
Greater than or equal {"col >=": n} {"rating >=": 4.5}
Integer match {"col": n} {"year": 2024}
Range Two keys in dict {"price >=": 30000, "price <=": 55000}

Boolean columns

Operator Syntax Example
Match true {"col": true} {"in_stock": true}
Match false {"col": false} {"in_stock": false}

Array columns

Standard endpoints support primitive ARRAY types: ARRAY<STRING>, ARRAY<INT>, ARRAY<BIGINT>, ARRAY<SMALLINT>, ARRAY<TINYINT>, ARRAY<FLOAT>, ARRAY<DOUBLE>, ARRAY<BOOLEAN>, ARRAY<DATE>, and ARRAY<TIMESTAMP>. ARRAY<STRUCT> is not supported.

Operator Syntax Example
Contains a value {"col": "val"} {"body_type": "sedan"}
Contains any value (OR) {"col": ["v1","v2"]} {"body_type": ["hybrid","electric"]}
AND with another column {"col": ["v1","v2"], "col2": "val"} {"body_type": ["hybrid", "electric"], "make": "BMW"}

Timestamp columns (native TIMESTAMP or DATE)

Operator Syntax Example
After {"col >": "ISO8601Z"} {"listed_at >": "2024-01-01T00:00:00Z"}
Exact timestamp match {"col": "ISO8601Z"} {"listed_at": "2024-01-10T09:15:00Z"}
Negation {"col NOT": "ISO8601Z"} {"listed_at NOT": "2024-01-10T09:15:00Z"}
Range Two keys in dict {"listed_at >=": "2024-01-01T00:00:00Z", "listed_at <": "2024-04-01T00:00:00Z"}

Combined filters

Multiple keys in a single dict are combined with AND logic. Use the {"col1 OR col2 op": [v1, v2]} syntax for OR across different fields.

Pattern Syntax Example
String + numeric Multiple keys {"make": "BMW", "price >": 60000}
Numeric + numeric Multiple keys {"price >": 50000, "rating >=": 4.7}
String + numeric + array Multiple keys {"make": ["Tesla", "Toyota"], "price <=": 55000, "body_type": "electric"}
String + array + numeric Multiple keys {"body_type": ["hybrid", "electric"], "price <": 70000, "year": 2024}
String + numeric + timestamp Multiple keys {"make": "Toyota", "price <=": 40000, "listed_at >=": "2024-01-01T00:00:00Z"}
OR across fields Combined key {"make OR price <=": ["Tesla", 30000]}
JSON LIKE + numeric List of dicts [{"specs LIKE": '%"drivetrain":"AWD"%'}, {"price <": 50000}]
AND on the same field List of dicts [{"make_model_year LIKE": "%Tesla%"}, {"make_model_year LIKE": "%2024%"}]

Limitations

Limitation Detail Workaround
ARRAY<struct> not supported Creating an index with ARRAY<struct> columns raises BadRequest: Invalid column type in schema. Supported field types include array<float>, array<tinyint>, array<double>, timestamp, tinyint, float, smallint, array<date>, string, array<boolean>, array<smallint>, double, boolean, date, int, array<string>, array<bigint>, array<timestamp>, bigint, and array<int>. Unflatten the struct into a string column for indexing and filtering.
LIKE is token-based only Matches whole whitespace-separated tokens, not SQL wildcard patterns (%, _). Use a storage-optimized endpoint for wildcard LIKE.
STRING date columns not supported Using >, <, >=, or <= on STRING columns raises BadRequest: Please use a numeric value. Use a native TIMESTAMP column with ISO 8601 values, or store dates as epoch milliseconds.
BETWEEN not supported There is no BETWEEN operator. Use two keys in the dict, for example {"price >=": 30000, "price <=": 55000}.
No SQL functions in filters Functions like to_timestamp() are not supported in dict filters. Use a storage-optimized endpoint with a SQL filter string. For timestamps, store as epoch milliseconds.
JSON numeric filtering not supported Standard endpoints cannot extract or cast nested JSON values (for example, specs.hp >, CAST(), or get_json_object()). Use LIKE patterns on the JSON string, or pre-extract the value as a top-level INT column.
Duplicate dict keys silently dropped Python keeps only the last value for duplicate keys in a dict. For example, {"make_model_year LIKE": "%Tesla%", "make_model_year LIKE": "%2024%"} only applies the second filter. Use a list of dicts: [{"make_model_year LIKE": "%Tesla%"}, {"make_model_year LIKE": "%2024%"}].

Storage-optimized endpoints

Storage-optimized endpoints accept a SQL-like filter string passed to similarity_search() through the filters parameter. This section summarizes the supported operators and known limitations.

Quick reference

String columns

Operator Syntax Example
Exact match col = 'val' make = 'Toyota'
Negation col != 'val' make != 'Ford'
OR (multiple values) col IN ('v1','v2') make IN ('Toyota','Honda')
Wildcard pattern col LIKE 'pat%' color LIKE 'bl%'
Hyphenated values col = 'val-ue' model = 'F-150'
JSON substring match col LIKE '%"k":v%' specs LIKE '%"hp":4%'

Numeric columns (INT, DOUBLE)

Operator Syntax Example
Greater than col > n price > 40000
Less than or equal col <= n price <= 25000
Greater than or equal col >= n rating >= 4.7
Integer match col = n year = 2024
Range col >= a AND col <= b price >= 30000 AND price <= 55000

Boolean columns

Operator Syntax Example
Boolean true col IS TRUE in_stock IS TRUE

Array columns

Array filtering is not supported on storage-optimized endpoints. ARRAY_CONTAINS raises a BadRequest: Syntax error. As a workaround, concatenate array values into a string column and use LIKE. For example:

  • "body_type LIKE '%sedan%'"
  • "body_type LIKE '%hybrid%' OR body_type LIKE '%electric%'"

Timestamp columns (native TIMESTAMP)

Operator Syntax Example
After date col > TO_TIMESTAMP('ISO8601') listed_at > TO_TIMESTAMP('2024-03-01T00:00:00')
Range Combine with AND listed_at >= TO_TIMESTAMP('2024-01-01T00:00:00') AND listed_at < TO_TIMESTAMP('2024-04-01T00:00:00')

Combined filters

Pattern Syntax Example
String + numeric A AND B make = 'BMW' AND price > 60000
Numeric + numeric A AND B price > 50000 AND rating >= 4.7
String + numeric + IN A AND B AND C make IN ('Tesla', 'Toyota') AND price <= 55000 AND year >= 2022
OR across fields A OR (B AND C) make = 'Tesla' OR (make = 'BMW' AND price > 60000)
Timestamp + numeric + string A AND B AND C listed_at >= TO_TIMESTAMP('2024-01-01T00:00:00') AND price < 40000 AND make = 'Toyota'

Limitations

Limitation Detail Workaround
ARRAY<struct> not supported Creating an index with ARRAY<struct> columns raises BadRequest: Invalid column type in schema. Supported field types include array<float>, array<tinyint>, array<double>, timestamp, tinyint, float, smallint, array<date>, string, array<boolean>, array<smallint>, double, boolean, date, int, array<string>, array<bigint>, array<timestamp>, bigint, and array<int>. Unflatten the struct into a string column for indexing and filtering.
ARRAY_CONTAINS not supported ARRAY_CONTAINS(col, 'val') raises BadRequest: Syntax error. Array filtering is not supported in storage-optimized filter strings. Concatenate array values into a string column and use LIKE.
BETWEEN not supported Using BETWEEN raises BadRequest: no viable alternative at input 'priceBETWEEN'. Use price >= a AND price <= b.
Bare timestamp strings cause type mismatch listed_at > '2024-03-01T00:00:00' raises BadRequest: Cannot compare timestamp[us] with STRING_LITERAL. Wrap the value with TO_TIMESTAMP(), for example listed_at > TO_TIMESTAMP('2024-03-01T00:00:00'). The column must be a native TIMESTAMP type.
JSON numeric filtering not supported SQL functions like CAST(get_json_object(specs, '$.hp') AS INT) > 300 raise BadRequest: Syntax error. Expression-based filters are not supported in storage-optimized filter strings. Pre-extract JSON fields into top-level columns at index creation time, or use LIKE pattern matching (for example, specs LIKE '%"hp":4%' OR specs LIKE '%"hp":5%' to approximate hp values of 400-599).
Post-filtering (overfetch) Results are ranked by relevance first, then filtered. Most cases are handled automatically, but in rare scenarios—mainly LIKE '%...%' filters on more than 40 indexes with low num_results—matching documents with low relevance scores might not appear even if they satisfy the filter. Increase num_results to widen the candidate pool.

Example notebooks

Standard endpoint setup and filtering notebook

Get notebook

Storage-optimized endpoint setup and filtering notebook

Get notebook