Supported data types (Azure AI Search)

This article describes the data types supported by Azure AI Search. Fields and the values used in filter expressions are typed according to the Entity Data Model (EDM). Specifying an EDM data type is a requirement for field definition.

Note

If you're using indexers, see Data type map for indexers in Azure AI Search for more information about how indexers map source-specific data types to EDM data types in a search index.

EDM data types for vector fields

A vector field's data type must be compatible with the output of your embedding model. For example, text-embedding-ada-002 emits Float32 values, which map to Collection(Edm.Single). In this scenario, you can't assign an Int8 data type because casting from float to int primitives is prohibited. However, you can cast from Float32 to Float16 (Collection(Edm.Half)).

Vector fields are an array of embeddings. In EDM, an array is a collection.
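As a rough illustration of the points above, the following sketch shows a vector field definition shaped like an entry in the fields array of an index, plus a local approximation of what narrowing Float32 values to Float16 does to precision. The field name, dimension count, profile name, and the to_float16 helper are placeholders chosen for this example, and the narrowing here uses Python's struct module rather than anything the service exposes.

```python
import struct

# Hypothetical field definition. The name, dimensions, and profile name
# are placeholders, not values taken from this article.
vector_field = {
    "name": "contentVector",
    "type": "Collection(Edm.Single)",  # Float32, matching the model output
    "searchable": True,
    "dimensions": 1536,
    "vectorSearchProfile": "my-vector-profile",
}


def to_float16(values):
    """Round each value to the nearest IEEE 754 half-precision number.

    Illustrative helper only: it mimics a Float32 -> Float16 narrowing
    locally so the precision loss can be inspected.
    """
    fmt = f"<{len(values)}e"  # 'e' is the half-precision struct format
    return list(struct.unpack(fmt, struct.pack(fmt, *values)))


embedding = [0.1, -0.25, 0.333333]
narrowed = to_float16(embedding)

for original, half in zip(embedding, narrowed):
    print(f"{original!r} -> {half!r}")
```

Values that are exactly representable in half precision (such as -0.25) survive unchanged, while others shift slightly, which is the precision tradeoff to weigh before choosing Collection(Edm.Half).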

| Data type | Vector type | Description | Recommended use |
|---|---|---|---|
| Collection(Edm.Byte) | Binary | 1-bit unsigned binary. Generally available in Create or Update Index (2024-07-01). | Supports integration with models that emit binary embeddings, such as Cohere's v3 binary embedding models, or custom quantization logic that emits 1-bit unsigned binary output. For fields of type Collection(Edm.Byte), see Index binary data for help with specifying the field definition and vector search algorithms for binary data. |
| Collection(Edm.Single) | Float32 | 32-bit floating point. Generally available in Create or Update Index (2024-07-01). This data type is also supported in newer preview versions and in the stable version 2023-11-01. | Default data type in Microsoft tools that create vector fields on your behalf. Strikes a balance between precision and efficiency. Most embedding models emit vectors as Float32. |
| Collection(Edm.Half) | Float16 | 16-bit floating point with lower precision and range. Generally available in Create or Update Index (2024-07-01). | Useful for scenarios where memory and computational efficiency are critical, and where sacrificing some precision is acceptable. Often leads to faster query times and reduced memory footprint compared to Float32, although with slightly reduced accuracy. You can assign a Float16 type to index Float32 embeddings as Float16. You can also use Float16 for embedding models or custom quantization processes that emit Float16 natively. |
| Collection(Edm.Int16) | Int16 | 16-bit signed integer. Generally available in Create or Update Index (2024-07-01). | Offers reduced memory footprint compared to Float32 and support for higher-precision quantization methods while still retaining sufficient precision for many applications. Suitable for cases where memory efficiency is important. Requires that you have custom quantization that outputs vectors as Int16. |
| Collection(Edm.SByte) | Int8 | 8-bit signed integer. Generally available in Create or Update Index (2024-07-01). | Provides significant memory and computational efficiency gains compared to Float32 or Float16. However, it likely requires supplemental techniques (like quantization and oversampling) to offset the reduction in precision and recall appropriately. Requires that you have custom quantization that outputs vectors as Int8. |
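To make the memory tradeoffs among these types concrete, here is a small back-of-the-envelope sketch that computes the raw size of one vector's values for each data type. It assumes that Collection(Edm.Byte) packs eight 1-bit dimensions into each byte, and it counts only the values themselves, not any index structures or other overhead. The raw_vector_bytes helper and the 1536-dimension figure are illustrative.

```python
# Bytes per vector dimension for each vector data type listed above.
# Assumption: Collection(Edm.Byte) packs eight 1-bit dimensions per byte.
BYTES_PER_DIMENSION = {
    "Collection(Edm.Single)": 4.0,   # Float32
    "Collection(Edm.Half)": 2.0,     # Float16
    "Collection(Edm.Int16)": 2.0,    # Int16
    "Collection(Edm.SByte)": 1.0,    # Int8
    "Collection(Edm.Byte)": 1 / 8,   # Binary, packed bits
}


def raw_vector_bytes(data_type, dimensions):
    """Return the raw size in bytes of one vector's values."""
    return int(BYTES_PER_DIMENSION[data_type] * dimensions)


dimensions = 1536  # illustrative dimension count
for data_type in BYTES_PER_DIMENSION:
    print(f"{data_type}: {raw_vector_bytes(data_type, dimensions)} bytes")
```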

EDM data types for nonvector fields

| Data type | Description |
|---|---|
| Edm.String | Text data. |
| Edm.Boolean | Contains true/false values. |
| Edm.Int32 | 32-bit integer values. |
| Edm.Int64 | 64-bit integer values. |
| Edm.Double | Double-precision IEEE 754 floating-point values. |
| Edm.DateTimeOffset | Date and time values represented in the OData V4 format: yyyy-MM-ddTHH:mm:ss.fffZ, or yyyy-MM-ddTHH:mm:ss.fff followed by a +HH:mm or -HH:mm offset. Precision of DateTimeOffset fields is limited to milliseconds. If you upload DateTimeOffset values with submillisecond precision, the value returned is rounded to milliseconds (for example, 2024-04-15T10:30:09.7552052Z is returned as 2024-04-15T10:30:09.7550000Z). When you upload DateTimeOffset values with time zone information to your index, Azure AI Search normalizes these values to UTC. For example, 2024-01-13T14:03:00-08:00 is stored as 2024-01-13T22:03:00Z. If you need to store time zone information, add an extra field to your index. |
| Edm.GeographyPoint | A point representing a geographic location on the globe. For request and response bodies, the representation of values of this type follows the GeoJSON "Point" type format. For URLs, OData uses a literal form based on the WKT standard. A point literal is constructed as geography'POINT(lon lat)'. |
| Edm.ComplexType | Objects whose properties map to subfields that can be of any other supported data type. This type enables indexing of structured hierarchical data such as JSON. Objects in a field of type Edm.ComplexType can contain nested objects, but the level of nesting is limited. The limits are described in Service limits. |
| Collection(Edm.String) | A list of strings. |
| Collection(Edm.Boolean) | A list of boolean values. |
| Collection(Edm.Int32) | A list of 32-bit integer values. |
| Collection(Edm.Int64) | A list of 64-bit integer values. |
| Collection(Edm.Double) | A list of double-precision numeric values. |
| Collection(Edm.DateTimeOffset) | A list of date time values. |
| Collection(Edm.GeographyPoint) | A list of points representing geographic locations. |
| Collection(Edm.ComplexType) | A list of objects of type Edm.ComplexType. There's a limit on the maximum number of elements across all collections of type Edm.ComplexType in a document. See Service limits for details. |
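The UTC normalization and millisecond precision described for Edm.DateTimeOffset can be previewed on the client before upload. The sketch below is a local approximation, not the service's implementation: the normalize_datetime_offset helper is hypothetical, it drops submillisecond digits (which is consistent with the example in the table but is an assumption about service behavior), and it always emits three fractional digits.

```python
from datetime import datetime, timezone


def normalize_datetime_offset(value):
    """Convert an ISO 8601 string with an offset to UTC at millisecond precision.

    Illustrative only. Assumes the input carries an explicit offset such
    as -08:00 or +00:00 and drops any digits beyond milliseconds.
    """
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        raise ValueError("value must include a time zone offset")
    utc = parsed.astimezone(timezone.utc)
    millis = utc.microsecond // 1000
    return utc.strftime("%Y-%m-%dT%H:%M:%S") + f".{millis:03d}Z"


print(normalize_datetime_offset("2024-01-13T14:03:00-08:00"))
print(normalize_datetime_offset("2024-04-15T10:30:09.755205+00:00"))
```

Because the original offset is lost after normalization, a document that needs it would carry the offset in a separate field, as the table entry suggests.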

All of the above types are nullable, except for collections of primitive and complex types, for example, Collection(Edm.String). Nullable fields can be explicitly set to null. They're automatically set to null when omitted from a document that is uploaded to an Azure AI Search index. Collection fields are automatically set to empty ([] in JSON) when they're omitted from a document. Also, it isn't possible to store a null value in a collection field.
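The defaulting rules for omitted fields can be modeled in a few lines. This is a client-side sketch for reasoning about what a stored document looks like, not service code. The apply_missing_field_defaults helper, the field names, and the schema dictionary are all hypothetical.

```python
def apply_missing_field_defaults(document, field_types):
    """Return the document with omitted fields filled in as described above.

    Omitted collection fields become an empty list; every other omitted
    field becomes None (null in JSON).
    """
    stored = {}
    for name, edm_type in field_types.items():
        if name in document:
            stored[name] = document[name]
        elif edm_type.startswith("Collection("):
            stored[name] = []
        else:
            stored[name] = None
    return stored


# Hypothetical schema and document for illustration.
schema = {
    "title": "Edm.String",
    "tags": "Collection(Edm.String)",
    "rating": "Edm.Double",
}
uploaded = {"rating": 4.5}

print(apply_missing_field_defaults(uploaded, schema))
```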

Unlike complex collections, there's no upper limit specifically on the number of items in a collection of primitive types, but the 16-MB upper limit on payload size applies to all parts of documents, including collections.

Geospatial data type used in filter expressions

In Azure AI Search, geospatial search is expressed as a filter.

Edm.GeographyPolygon is a polygon representing a geographic region on the globe. While this type can't be used in document fields, it can be used as an argument to the geo.intersects function. The literal form for URLs in OData is based on the WKT (Well-known text) and OGC's simple feature access standards. A polygon literal is constructed as geography'POLYGON((lon lat, lon lat, ...))'.
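As a minimal sketch of how such a literal might be assembled, the following helpers build a polygon literal from (longitude, latitude) pairs and wrap it in a geo.intersects call. The helper names, the location field name, and the sample coordinates are placeholders. The check that the ring is closed reflects the WKT convention that the first and last points match.

```python
def polygon_literal(ring):
    """Build a geography'POLYGON((...))' literal from (lon, lat) pairs."""
    if len(ring) < 4 or ring[0] != ring[-1]:
        raise ValueError("ring must be closed: first and last points must match")
    coords = ", ".join(f"{lon} {lat}" for lon, lat in ring)
    return f"geography'POLYGON(({coords}))'"


def intersects_filter(field_name, ring):
    """Build a filter expression that tests a point field against a polygon."""
    return f"geo.intersects({field_name}, {polygon_literal(ring)})"


# Sample ring as (longitude, latitude) pairs, listed counterclockwise.
sample_ring = [(-0.3, 51.6), (-0.3, 51.4), (0.1, 51.4), (0.1, 51.6), (-0.3, 51.6)]

# "location" is a hypothetical Edm.GeographyPoint field.
print(intersects_filter("location", sample_ring))
```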

Important

Points in a polygon must be listed in counterclockwise order, relative to the inside of the polygon. For example, a closed four-sided polygon around London would be 0.3°W 51.6°N [top left], 0.3°W 51.4°N [bottom left], 0.1°E 51.4°N [bottom right], 0.1°E 51.6°N [top right], 0.3°W 51.6°N [starting point]. With west longitudes written as negative numbers, the corresponding literal is geography'POLYGON((-0.3 51.6, -0.3 51.4, 0.1 51.4, 0.1 51.6, -0.3 51.6))'.
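One way to check point order before sending a query is the shoelace formula, which yields a positive signed area for a counterclockwise ring when longitude is treated as x and latitude as y. The sketch below is a planar approximation, so it's an assumption that it's adequate for your data: it suits small polygons that don't cross the antimeridian or enclose a pole. The function names are illustrative.

```python
def signed_area(ring):
    """Shoelace formula over a closed ring of (lon, lat) pairs.

    Positive for counterclockwise rings, negative for clockwise rings,
    treating longitude as x and latitude as y on a flat plane.
    """
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return total / 2.0


def is_counterclockwise(ring):
    return signed_area(ring) > 0


# The London polygon from the example above, as (longitude, latitude) pairs.
london = [(-0.3, 51.6), (-0.3, 51.4), (0.1, 51.4), (0.1, 51.6), (-0.3, 51.6)]

print(is_counterclockwise(london))
print(is_counterclockwise(list(reversed(london))))
```

If the check fails, reversing the list of points gives the same region with the expected orientation.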

See also