Indexes - Create

Creates a new search index.

POST {endpoint}/indexes?api-version=2023-11-01

URI Parameters

| Name | In | Required | Type | Description |
| --- | --- | --- | --- | --- |
| endpoint | path | True | string | The endpoint URL of the search service. |
| api-version | query | True | string | Client API version. |

Request Header

| Name | Required | Type | Description |
| --- | --- | --- | --- |
| x-ms-client-request-id | | string (uuid) | The tracking ID sent with the request to help with debugging. |
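For example, a create request that passes this header might look like the following sketch; the UUID value is illustrative, and api-key authentication with an admin key is assumed:

```http
POST https://myservice.search.windows.net/indexes?api-version=2023-11-01
Content-Type: application/json
api-key: <admin-key>
x-ms-client-request-id: 00000000-0000-0000-0000-000000000000
```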

Request Body

| Name | Required | Type | Description |
| --- | --- | --- | --- |
| fields | True | SearchField[] | The fields of the index. |
| name | True | string | The name of the index. |
| @odata.etag | | string | The ETag of the index. |
| analyzers | | LexicalAnalyzer[] | The analyzers for the index. |
| charFilters | | CharFilter[] | The character filters for the index. |
| corsOptions | | CorsOptions | Options to control Cross-Origin Resource Sharing (CORS) for the index. |
| defaultScoringProfile | | string | The name of the scoring profile to use if none is specified in the query. If this property is not set and no scoring profile is specified in the query, then default scoring (tf-idf) will be used. |
| encryptionKey | | SearchResourceEncryptionKey | A description of an encryption key that you create in Azure Key Vault. This key is used to provide an additional level of encryption-at-rest for your data when you want full assurance that no one, not even Microsoft, can decrypt your data. Once you have encrypted your data, it will always remain encrypted. The search service will ignore attempts to set this property to null. You can change this property as needed if you want to rotate your encryption key; your data will be unaffected. Encryption with customer-managed keys is not available for free search services, and is only available for paid services created on or after January 1, 2019. |
| scoringProfiles | | ScoringProfile[] | The scoring profiles for the index. |
| semantic | | SemanticSettings | Defines parameters for a search index that influence semantic capabilities. |
| similarity | | Similarity | The type of similarity algorithm to be used when scoring and ranking the documents matching a search query. The similarity algorithm can only be defined at index creation time and cannot be modified on existing indexes. If null, the ClassicSimilarity algorithm is used. |
| suggesters | | Suggester[] | The suggesters for the index. |
| tokenFilters | | TokenFilter[] | The token filters for the index. |
| tokenizers | | LexicalTokenizer[] | The tokenizers for the index. |
| vectorSearch | | VectorSearch | Contains configuration options related to vector search. |

Responses

| Name | Type | Description |
| --- | --- | --- |
| 201 Created | SearchIndex | |
| Other Status Codes | SearchError | Error response. |
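As an illustrative sketch only (the exact fields are defined by SearchError, and the values below are hypothetical), the error body has this general shape:

```json
{
  "error": {
    "code": "InvalidRequestParameter",
    "message": "The index definition is invalid."
  }
}
```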

Examples

SearchServiceCreateIndex

Sample Request

POST https://myservice.search.windows.net/indexes?api-version=2023-11-01

```json
{
  "name": "hotels",
  "fields": [
    {
      "name": "hotelId",
      "type": "Edm.String",
      "key": true,
      "searchable": false
    },
    {
      "name": "baseRate",
      "type": "Edm.Double"
    },
    {
      "name": "description",
      "type": "Edm.String",
      "filterable": false,
      "sortable": false,
      "facetable": false
    },
    {
      "name": "descriptionEmbedding",
      "type": "Collection(Edm.Single)",
      "searchable": true,
      "filterable": false,
      "retrievable": true,
      "sortable": false,
      "facetable": false,
      "key": false,
      "indexAnalyzer": null,
      "searchAnalyzer": null,
      "analyzer": null,
      "synonymMaps": [],
      "dimensions": 1536,
      "vectorSearchProfile": "myHnswProfile"
    },
    {
      "name": "description_fr",
      "type": "Edm.String",
      "filterable": false,
      "sortable": false,
      "facetable": false,
      "analyzer": "fr.lucene"
    },
    {
      "name": "hotelName",
      "type": "Edm.String"
    },
    {
      "name": "category",
      "type": "Edm.String"
    },
    {
      "name": "tags",
      "type": "Collection(Edm.String)",
      "analyzer": "tagsAnalyzer"
    },
    {
      "name": "parkingIncluded",
      "type": "Edm.Boolean"
    },
    {
      "name": "smokingAllowed",
      "type": "Edm.Boolean"
    },
    {
      "name": "lastRenovationDate",
      "type": "Edm.DateTimeOffset"
    },
    {
      "name": "rating",
      "type": "Edm.Int32"
    },
    {
      "name": "location",
      "type": "Edm.GeographyPoint"
    }
  ],
  "scoringProfiles": [
    {
      "name": "geo",
      "text": {
        "weights": {
          "hotelName": 5
        }
      },
      "functions": [
        {
          "type": "distance",
          "boost": 5,
          "fieldName": "location",
          "interpolation": "logarithmic",
          "distance": {
            "referencePointParameter": "currentLocation",
            "boostingDistance": 10
          }
        }
      ]
    }
  ],
  "defaultScoringProfile": "geo",
  "suggesters": [
    {
      "name": "sg",
      "searchMode": "analyzingInfixMatching",
      "sourceFields": [
        "hotelName"
      ]
    }
  ],
  "analyzers": [
    {
      "name": "tagsAnalyzer",
      "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
      "charFilters": [
        "html_strip"
      ],
      "tokenizer": "standard_v2"
    }
  ],
  "corsOptions": {
    "allowedOrigins": [
      "tempuri.org"
    ],
    "maxAgeInSeconds": 60
  },
  "encryptionKey": {
    "keyVaultKeyName": "myUserManagedEncryptionKey-createdinAzureKeyVault",
    "keyVaultKeyVersion": "myKeyVersion-32charAlphaNumericString",
    "keyVaultUri": "https://myKeyVault.vault.azure.net",
    "accessCredentials": {
      "applicationId": "00000000-0000-0000-0000-000000000000",
      "applicationSecret": "<applicationSecret>"
    }
  },
  "similarity": {
    "@odata.type": "#Microsoft.Azure.Search.BM25Similarity",
    "b": 0.5,
    "k1": 1.3
  },
  "semantic": {
    "configurations": [
      {
        "name": "semanticHotels",
        "prioritizedFields": {
          "titleField": {
            "fieldName": "hotelName"
          },
          "prioritizedContentFields": [
            {
              "fieldName": "description"
            },
            {
              "fieldName": "description_fr"
            }
          ],
          "prioritizedKeywordsFields": [
            {
              "fieldName": "tags"
            },
            {
              "fieldName": "category"
            }
          ]
        }
      }
    ]
  },
  "vectorSearch": {
    "profiles": [
      {
        "name": "myHnswProfile",
        "algorithm": "myHnsw"
      },
      {
        "name": "myAlgorithm",
        "algorithm": "myExhaustive"
      }
    ],
    "algorithms": [
      {
        "name": "myHnsw",
        "kind": "hnsw",
        "hnswParameters": {
          "m": 4,
          "metric": "cosine"
        }
      },
      {
        "name": "myExhaustive",
        "kind": "exhaustiveKnn",
        "exhaustiveKnnParameters": {
          "metric": "cosine"
        }
      }
    ]
  }
}
```

Sample Response

```json
{
  "name": "hotels",
  "fields": [
    {
      "name": "hotelId",
      "type": "Edm.String",
      "searchable": false,
      "filterable": true,
      "retrievable": true,
      "sortable": true,
      "facetable": true,
      "key": true,
      "indexAnalyzer": null,
      "searchAnalyzer": null,
      "analyzer": null,
      "dimensions": null,
      "vectorSearchProfile": null,
      "synonymMaps": []
    },
    {
      "name": "baseRate",
      "type": "Edm.Double",
      "searchable": false,
      "filterable": true,
      "retrievable": true,
      "sortable": true,
      "facetable": true,
      "key": false,
      "indexAnalyzer": null,
      "searchAnalyzer": null,
      "analyzer": null,
      "dimensions": null,
      "vectorSearchProfile": null,
      "synonymMaps": []
    },
    {
      "name": "description",
      "type": "Edm.String",
      "searchable": true,
      "filterable": false,
      "retrievable": true,
      "sortable": false,
      "facetable": false,
      "key": false,
      "indexAnalyzer": null,
      "searchAnalyzer": null,
      "analyzer": null,
      "dimensions": null,
      "vectorSearchProfile": null,
      "synonymMaps": []
    },
    {
      "name": "descriptionEmbedding",
      "type": "Collection(Edm.Single)",
      "searchable": true,
      "filterable": false,
      "retrievable": true,
      "sortable": false,
      "facetable": false,
      "key": false,
      "indexAnalyzer": null,
      "searchAnalyzer": null,
      "analyzer": null,
      "dimensions": 1536,
      "vectorSearchProfile": "myHnswProfile",
      "synonymMaps": []
    },
    {
      "name": "description_fr",
      "type": "Edm.String",
      "searchable": true,
      "filterable": false,
      "retrievable": true,
      "sortable": false,
      "facetable": false,
      "key": false,
      "indexAnalyzer": null,
      "searchAnalyzer": null,
      "analyzer": "fr.lucene",
      "dimensions": null,
      "vectorSearchProfile": null,
      "synonymMaps": []
    },
    {
      "name": "hotelName",
      "type": "Edm.String",
      "searchable": true,
      "filterable": true,
      "retrievable": true,
      "sortable": true,
      "facetable": true,
      "key": false,
      "indexAnalyzer": null,
      "searchAnalyzer": null,
      "analyzer": null,
      "dimensions": null,
      "vectorSearchProfile": null,
      "synonymMaps": []
    },
    {
      "name": "category",
      "type": "Edm.String",
      "searchable": true,
      "filterable": true,
      "retrievable": true,
      "sortable": true,
      "facetable": true,
      "key": false,
      "indexAnalyzer": null,
      "searchAnalyzer": null,
      "analyzer": null,
      "dimensions": null,
      "vectorSearchProfile": null,
      "synonymMaps": []
    },
    {
      "name": "tags",
      "type": "Collection(Edm.String)",
      "searchable": true,
      "filterable": true,
      "retrievable": true,
      "sortable": false,
      "facetable": true,
      "key": false,
      "indexAnalyzer": null,
      "searchAnalyzer": null,
      "analyzer": "tagsAnalyzer",
      "dimensions": null,
      "vectorSearchProfile": null,
      "synonymMaps": []
    },
    {
      "name": "parkingIncluded",
      "type": "Edm.Boolean",
      "searchable": false,
      "filterable": true,
      "retrievable": true,
      "sortable": true,
      "facetable": true,
      "key": false,
      "indexAnalyzer": null,
      "searchAnalyzer": null,
      "analyzer": null,
      "dimensions": null,
      "vectorSearchProfile": null,
      "synonymMaps": []
    },
    {
      "name": "smokingAllowed",
      "type": "Edm.Boolean",
      "searchable": false,
      "filterable": true,
      "retrievable": true,
      "sortable": true,
      "facetable": true,
      "key": false,
      "indexAnalyzer": null,
      "searchAnalyzer": null,
      "analyzer": null,
      "dimensions": null,
      "vectorSearchProfile": null,
      "synonymMaps": []
    },
    {
      "name": "lastRenovationDate",
      "type": "Edm.DateTimeOffset",
      "searchable": false,
      "filterable": true,
      "retrievable": true,
      "sortable": true,
      "facetable": true,
      "key": false,
      "indexAnalyzer": null,
      "searchAnalyzer": null,
      "analyzer": null,
      "dimensions": null,
      "vectorSearchProfile": null,
      "synonymMaps": []
    },
    {
      "name": "rating",
      "type": "Edm.Int32",
      "searchable": false,
      "filterable": true,
      "retrievable": true,
      "sortable": true,
      "facetable": true,
      "key": false,
      "indexAnalyzer": null,
      "searchAnalyzer": null,
      "analyzer": null,
      "dimensions": null,
      "vectorSearchProfile": null,
      "synonymMaps": []
    },
    {
      "name": "location",
      "type": "Edm.GeographyPoint",
      "searchable": false,
      "filterable": true,
      "retrievable": true,
      "sortable": true,
      "facetable": false,
      "key": false,
      "indexAnalyzer": null,
      "searchAnalyzer": null,
      "analyzer": null,
      "dimensions": null,
      "vectorSearchProfile": null,
      "synonymMaps": []
    }
  ],
  "scoringProfiles": [
    {
      "name": "geo",
      "functionAggregation": "sum",
      "text": {
        "weights": {
          "hotelName": 5
        }
      },
      "functions": [
        {
          "fieldName": "location",
          "interpolation": "logarithmic",
          "type": "distance",
          "boost": 5,
          "distance": {
            "referencePointParameter": "currentLocation",
            "boostingDistance": 10
          }
        }
      ]
    }
  ],
  "defaultScoringProfile": "geo",
  "suggesters": [
    {
      "name": "sg",
      "searchMode": "analyzingInfixMatching",
      "sourceFields": [
        "hotelName"
      ]
    }
  ],
  "analyzers": [
    {
      "name": "tagsAnalyzer",
      "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
      "charFilters": [
        "html_strip"
      ],
      "tokenFilters": [],
      "tokenizer": "standard_v2"
    }
  ],
  "tokenizers": [],
  "tokenFilters": [],
  "charFilters": [],
  "corsOptions": {
    "allowedOrigins": [
      "tempuri.org"
    ],
    "maxAgeInSeconds": 60
  },
  "encryptionKey": {
    "keyVaultKeyName": "myUserManagedEncryptionKey-createdinAzureKeyVault",
    "keyVaultKeyVersion": "myKeyVersion-32charAlphaNumericString",
    "keyVaultUri": "https://myKeyVault.vault.azure.net",
    "accessCredentials": {
      "applicationId": "00000000-0000-0000-0000-000000000000",
      "applicationSecret": null
    }
  },
  "similarity": {
    "@odata.type": "#Microsoft.Azure.Search.BM25Similarity",
    "b": 0.5,
    "k1": 1.3
  },
  "semantic": {
    "configurations": [
      {
        "name": "semanticHotels",
        "prioritizedFields": {
          "titleField": {
            "fieldName": "hotelName"
          },
          "prioritizedContentFields": [
            {
              "fieldName": "description"
            },
            {
              "fieldName": "description_fr"
            }
          ],
          "prioritizedKeywordsFields": [
            {
              "fieldName": "tags"
            },
            {
              "fieldName": "category"
            }
          ]
        }
      }
    ]
  },
  "vectorSearch": {
    "algorithms": [
      {
        "name": "myHnsw",
        "kind": "hnsw",
        "hnswParameters": {
          "metric": "cosine",
          "m": 4,
          "efConstruction": 400,
          "efSearch": 500
        }
      },
      {
        "name": "myExhaustive",
        "kind": "exhaustiveKnn",
        "exhaustiveKnnParameters": {
          "metric": "cosine"
        }
      }
    ],
    "profiles": [
      {
        "name": "myHnswProfile",
        "algorithm": "myHnsw"
      },
      {
        "name": "myAlgorithm",
        "algorithm": "myExhaustive"
      }
    ]
  }
}
```

Definitions

| Name | Description |
| --- | --- |
| AsciiFoldingTokenFilter | Converts alphabetic, numeric, and symbolic Unicode characters which are not in the first 127 ASCII characters (the "Basic Latin" Unicode block) into their ASCII equivalents, if such equivalents exist. This token filter is implemented using Apache Lucene. |
| AzureActiveDirectoryApplicationCredentials | Credentials of a registered application created for your search service, used for authenticated access to the encryption keys stored in Azure Key Vault. |
| BM25Similarity | Ranking function based on the Okapi BM25 similarity algorithm. BM25 is a TF-IDF-like algorithm that includes length normalization (controlled by the 'b' parameter) as well as term frequency saturation (controlled by the 'k1' parameter). |
| CharFilterName | Defines the names of all character filters supported by the search engine. |
| CjkBigramTokenFilter | Forms bigrams of CJK terms that are generated from the standard tokenizer. This token filter is implemented using Apache Lucene. |
| CjkBigramTokenFilterScripts | Scripts that can be ignored by CjkBigramTokenFilter. |
| ClassicSimilarity | Legacy similarity algorithm which uses the Lucene TFIDFSimilarity implementation of TF-IDF. This variation of TF-IDF introduces static document length normalization as well as coordinating factors that penalize documents that only partially match the searched queries. |
| ClassicTokenizer | Grammar-based tokenizer that is suitable for processing most European-language documents. This tokenizer is implemented using Apache Lucene. |
| CommonGramTokenFilter | Constructs bigrams for frequently occurring terms while indexing. Single terms are still indexed too, with bigrams overlaid. This token filter is implemented using Apache Lucene. |
| CorsOptions | Defines options to control Cross-Origin Resource Sharing (CORS) for an index. |
| CustomAnalyzer | Allows you to take control over the process of converting text into indexable/searchable tokens. It's a user-defined configuration consisting of a single predefined tokenizer and one or more filters. The tokenizer is responsible for breaking text into tokens, and the filters for modifying tokens emitted by the tokenizer. |
| DictionaryDecompounderTokenFilter | Decomposes compound words found in many Germanic languages. This token filter is implemented using Apache Lucene. |
| DistanceScoringFunction | Defines a function that boosts scores based on distance from a geographic location. |
| DistanceScoringParameters | Provides parameter values to a distance scoring function. |
| EdgeNGramTokenFilter | Generates n-grams of the given size(s) starting from the front or the back of an input token. This token filter is implemented using Apache Lucene. |
| EdgeNGramTokenFilterSide | Specifies which side of the input an n-gram should be generated from. |
| EdgeNGramTokenFilterV2 | Generates n-grams of the given size(s) starting from the front or the back of an input token. This token filter is implemented using Apache Lucene. |
| EdgeNGramTokenizer | Tokenizes the input from an edge into n-grams of the given size(s). This tokenizer is implemented using Apache Lucene. |
| ElisionTokenFilter | Removes elisions. For example, "l'avion" (the plane) will be converted to "avion" (plane). This token filter is implemented using Apache Lucene. |
| ExhaustiveKnnParameters | Contains the parameters specific to exhaustive KNN algorithm. |
| ExhaustiveKnnVectorSearchAlgorithmConfiguration | Contains configuration options specific to the exhaustive KNN algorithm used during querying, which will perform brute-force search across the entire vector index. |
| FreshnessScoringFunction | Defines a function that boosts scores based on the value of a date-time field. |
| FreshnessScoringParameters | Provides parameter values to a freshness scoring function. |
| HnswParameters | Contains the parameters specific to the HNSW algorithm. |
| HnswVectorSearchAlgorithmConfiguration | Contains configuration options specific to the HNSW approximate nearest neighbors algorithm used during indexing and querying. The HNSW algorithm offers a tunable trade-off between search speed and accuracy. |
| KeepTokenFilter | A token filter that only keeps tokens with text contained in a specified list of words. This token filter is implemented using Apache Lucene. |
| KeywordMarkerTokenFilter | Marks terms as keywords. This token filter is implemented using Apache Lucene. |
| KeywordTokenizer | Emits the entire input as a single token. This tokenizer is implemented using Apache Lucene. |
| KeywordTokenizerV2 | Emits the entire input as a single token. This tokenizer is implemented using Apache Lucene. |
| LengthTokenFilter | Removes words that are too long or too short. This token filter is implemented using Apache Lucene. |
| LexicalAnalyzerName | Defines the names of all text analyzers supported by the search engine. |
| LexicalTokenizerName | Defines the names of all tokenizers supported by the search engine. |
| LimitTokenFilter | Limits the number of tokens while indexing. This token filter is implemented using Apache Lucene. |
| LuceneStandardAnalyzer | Standard Apache Lucene analyzer; composed of the standard tokenizer, lowercase filter and stop filter. |
| LuceneStandardTokenizer | Breaks text following the Unicode Text Segmentation rules. This tokenizer is implemented using Apache Lucene. |
| LuceneStandardTokenizerV2 | Breaks text following the Unicode Text Segmentation rules. This tokenizer is implemented using Apache Lucene. |
| MagnitudeScoringFunction | Defines a function that boosts scores based on the magnitude of a numeric field. |
| MagnitudeScoringParameters | Provides parameter values to a magnitude scoring function. |
| MappingCharFilter | A character filter that applies mappings defined with the mappings option. Matching is greedy (longest pattern matching at a given point wins). Replacement is allowed to be the empty string. This character filter is implemented using Apache Lucene. |
| MicrosoftLanguageStemmingTokenizer | Divides text using language-specific rules and reduces words to their base forms. |
| MicrosoftLanguageTokenizer | Divides text using language-specific rules. |
| MicrosoftStemmingTokenizerLanguage | Lists the languages supported by the Microsoft language stemming tokenizer. |
| MicrosoftTokenizerLanguage | Lists the languages supported by the Microsoft language tokenizer. |
| NGramTokenFilter | Generates n-grams of the given size(s). This token filter is implemented using Apache Lucene. |
| NGramTokenFilterV2 | Generates n-grams of the given size(s). This token filter is implemented using Apache Lucene. |
| NGramTokenizer | Tokenizes the input into n-grams of the given size(s). This tokenizer is implemented using Apache Lucene. |
| PathHierarchyTokenizerV2 | Tokenizer for path-like hierarchies. This tokenizer is implemented using Apache Lucene. |
| PatternAnalyzer | Flexibly separates text into terms via a regular expression pattern. This analyzer is implemented using Apache Lucene. |
| PatternCaptureTokenFilter | Uses Java regexes to emit multiple tokens - one for each capture group in one or more patterns. This token filter is implemented using Apache Lucene. |
| PatternReplaceCharFilter | A character filter that replaces characters in the input string. It uses a regular expression to identify character sequences to preserve and a replacement pattern to identify characters to replace. For example, given the input text "aa bb aa bb", pattern "(aa)\s+(bb)", and replacement "$1#$2", the result would be "aa#bb aa#bb". This character filter is implemented using Apache Lucene. |
| PatternReplaceTokenFilter | A token filter that replaces characters in the input string. It uses a regular expression to identify character sequences to preserve and a replacement pattern to identify characters to replace. For example, given the input text "aa bb aa bb", pattern "(aa)\s+(bb)", and replacement "$1#$2", the result would be "aa#bb aa#bb". This token filter is implemented using Apache Lucene. |
| PatternTokenizer | Tokenizer that uses regex pattern matching to construct distinct tokens. This tokenizer is implemented using Apache Lucene. |
| PhoneticEncoder | Identifies the type of phonetic encoder to use with a PhoneticTokenFilter. |
| PhoneticTokenFilter | Creates tokens for phonetic matches. This token filter is implemented using Apache Lucene. |
| PrioritizedFields | Describes the title, content, and keywords fields to be used for semantic ranking, captions, highlights, and answers. |
| RegexFlags | Defines flags that can be combined to control how regular expressions are used in the pattern analyzer and pattern tokenizer. |
| ScoringFunctionAggregation | Defines the aggregation function used to combine the results of all the scoring functions in a scoring profile. |
| ScoringFunctionInterpolation | Defines the function used to interpolate score boosting across a range of documents. |
| ScoringProfile | Defines parameters for a search index that influence scoring in search queries. |
| SearchError | Describes an error condition for the API. |
| SearchField | Represents a field in an index definition, which describes the name, data type, and search behavior of a field. |
| SearchFieldDataType | Defines the data type of a field in a search index. |
| SearchIndex | Represents a search index definition, which describes the fields and search behavior of an index. |
| SearchResourceEncryptionKey | A customer-managed encryption key in Azure Key Vault. Keys that you create and manage can be used to encrypt or decrypt data-at-rest on your search service, such as indexes and synonym maps. |
| SemanticConfiguration | Defines a specific configuration to be used in the context of semantic capabilities. |
| SemanticField | A field that is used as part of the semantic configuration. |
| SemanticSettings | Defines parameters for a search index that influence semantic capabilities. |
| ShingleTokenFilter | Creates combinations of tokens as a single token. This token filter is implemented using Apache Lucene. |
| SnowballTokenFilter | A filter that stems words using a Snowball-generated stemmer. This token filter is implemented using Apache Lucene. |
| SnowballTokenFilterLanguage | The language to use for a Snowball token filter. |
| StemmerOverrideTokenFilter | Provides the ability to override other stemming filters with custom dictionary-based stemming. Any dictionary-stemmed terms will be marked as keywords so that they will not be stemmed with stemmers down the chain. Must be placed before any stemming filters. This token filter is implemented using Apache Lucene. |
| StemmerTokenFilter | Language specific stemming filter. This token filter is implemented using Apache Lucene. |
| StemmerTokenFilterLanguage | The language to use for a stemmer token filter. |
| StopAnalyzer | Divides text at non-letters; applies the lowercase and stopword token filters. This analyzer is implemented using Apache Lucene. |
| StopwordsList | Identifies a predefined list of language-specific stopwords. |
| StopwordsTokenFilter | Removes stop words from a token stream. This token filter is implemented using Apache Lucene. |
| Suggester | Defines how the Suggest API should apply to a group of fields in the index. |
| SuggesterSearchMode | A value indicating the capabilities of the suggester. |
| SynonymTokenFilter | Matches single or multi-word synonyms in a token stream. This token filter is implemented using Apache Lucene. |
| TagScoringFunction | Defines a function that boosts scores of documents with string values matching a given list of tags. |
| TagScoringParameters | Provides parameter values to a tag scoring function. |
| TextWeights | Defines weights on index fields for which matches should boost scoring in search queries. |
| TokenCharacterKind | Represents classes of characters on which a token filter can operate. |
| TokenFilterName | Defines the names of all token filters supported by the search engine. |
| TruncateTokenFilter | Truncates the terms to a specific length. This token filter is implemented using Apache Lucene. |
| UaxUrlEmailTokenizer | Tokenizes URLs and emails as one token. This tokenizer is implemented using Apache Lucene. |
| UniqueTokenFilter | Filters out tokens with same text as the previous token. This token filter is implemented using Apache Lucene. |
| VectorSearch | Contains configuration options related to vector search. |
| VectorSearchAlgorithmKind | The algorithm used for indexing and querying. |
| VectorSearchAlgorithmMetric | The similarity metric to use for vector comparisons. |
| VectorSearchProfile | Defines a combination of configurations to use with vector search. |
| WordDelimiterTokenFilter | Splits words into subwords and performs optional transformations on subword groups. This token filter is implemented using Apache Lucene. |

AsciiFoldingTokenFilter

Converts alphabetic, numeric, and symbolic Unicode characters which are not in the first 127 ASCII characters (the "Basic Latin" Unicode block) into their ASCII equivalents, if such equivalents exist. This token filter is implemented using Apache Lucene.

| Name | Type | Default Value | Description |
| --- | --- | --- | --- |
| @odata.type | string: #Microsoft.Azure.Search.AsciiFoldingTokenFilter | | A URI fragment specifying the type of token filter. |
| name | string | | The name of the token filter. It must only contain letters, digits, spaces, dashes or underscores, can only start and end with alphanumeric characters, and is limited to 128 characters. |
| preserveOriginal | boolean | False | A value indicating whether the original token will be kept. Default is false. |
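As a sketch, a filter of this type could be declared in the index's tokenFilters array as follows; the filter name and the preserveOriginal value are illustrative:

```json
{
  "name": "my_asciifolding",
  "@odata.type": "#Microsoft.Azure.Search.AsciiFoldingTokenFilter",
  "preserveOriginal": true
}
```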

AzureActiveDirectoryApplicationCredentials

Credentials of a registered application created for your search service, used for authenticated access to the encryption keys stored in Azure Key Vault.

| Name | Type | Description |
| --- | --- | --- |
| applicationId | string | An AAD Application ID that was granted the required access permissions to the Azure Key Vault that is to be used when encrypting your data at rest. The Application ID should not be confused with the Object ID for your AAD Application. |
| applicationSecret | string | The authentication key of the specified AAD application. |

BM25Similarity

Ranking function based on the Okapi BM25 similarity algorithm. BM25 is a TF-IDF-like algorithm that includes length normalization (controlled by the 'b' parameter) as well as term frequency saturation (controlled by the 'k1' parameter).

| Name | Type | Description |
| --- | --- | --- |
| @odata.type | string: #Microsoft.Azure.Search.BM25Similarity | |
| b | number | This property controls how the length of a document affects the relevance score. By default, a value of 0.75 is used. A value of 0.0 means no length normalization is applied, while a value of 1.0 means the score is fully normalized by the length of the document. |
| k1 | number | This property controls the scaling function between the term frequency of each matching term and the final relevance score of a document-query pair. By default, a value of 1.2 is used. A value of 0.0 means the score does not scale with an increase in term frequency. |

CharFilterName

Defines the names of all character filters supported by the search engine.

| Name | Type | Description |
| --- | --- | --- |
| html_strip | string | A character filter that attempts to strip out HTML constructs. See https://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/charfilter/HTMLStripCharFilter.html |

CjkBigramTokenFilter

Forms bigrams of CJK terms that are generated from the standard tokenizer. This token filter is implemented using Apache Lucene.

| Name | Type | Default Value | Description |
| --- | --- | --- | --- |
| @odata.type | string: #Microsoft.Azure.Search.CjkBigramTokenFilter | | A URI fragment specifying the type of token filter. |
| ignoreScripts | CjkBigramTokenFilterScripts[] | | The scripts to ignore. |
| name | string | | The name of the token filter. It must only contain letters, digits, spaces, dashes or underscores, can only start and end with alphanumeric characters, and is limited to 128 characters. |
| outputUnigrams | boolean | False | A value indicating whether to output both unigrams and bigrams (if true), or just bigrams (if false). Default is false. |
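A hypothetical tokenFilters entry that keeps unigrams and skips the Japanese kana scripts might look like this; the name and chosen values are illustrative:

```json
{
  "name": "my_cjk_bigram",
  "@odata.type": "#Microsoft.Azure.Search.CjkBigramTokenFilter",
  "ignoreScripts": [ "hiragana", "katakana" ],
  "outputUnigrams": true
}
```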

CjkBigramTokenFilterScripts

Scripts that can be ignored by CjkBigramTokenFilter.

| Name | Type | Description |
| --- | --- | --- |
| han | string | Ignore Han script when forming bigrams of CJK terms. |
| hangul | string | Ignore Hangul script when forming bigrams of CJK terms. |
| hiragana | string | Ignore Hiragana script when forming bigrams of CJK terms. |
| katakana | string | Ignore Katakana script when forming bigrams of CJK terms. |

ClassicSimilarity

Legacy similarity algorithm which uses the Lucene TFIDFSimilarity implementation of TF-IDF. This variation of TF-IDF introduces static document length normalization as well as coordinating factors that penalize documents that only partially match the searched queries.

| Name | Type | Description |
| --- | --- | --- |
| @odata.type | string: #Microsoft.Azure.Search.ClassicSimilarity | |

ClassicTokenizer

Grammar-based tokenizer that is suitable for processing most European-language documents. This tokenizer is implemented using Apache Lucene.

| Name | Type | Default Value | Description |
| --- | --- | --- | --- |
| @odata.type | string: #Microsoft.Azure.Search.ClassicTokenizer | | A URI fragment specifying the type of tokenizer. |
| maxTokenLength | integer | 255 | The maximum token length. Default is 255. Tokens longer than the maximum length are split. The maximum token length that can be used is 300 characters. |
| name | string | | The name of the tokenizer. It must only contain letters, digits, spaces, dashes or underscores, can only start and end with alphanumeric characters, and is limited to 128 characters. |

CommonGramTokenFilter

Constructs bigrams for frequently occurring terms while indexing. Single terms are still indexed too, with bigrams overlaid. This token filter is implemented using Apache Lucene.

| Name | Type | Default Value | Description |
| --- | --- | --- | --- |
| @odata.type | string: #Microsoft.Azure.Search.CommonGramTokenFilter | | A URI fragment specifying the type of token filter. |
| commonWords | string[] | | The set of common words. |
| ignoreCase | boolean | False | A value indicating whether common words matching will be case insensitive. Default is false. |
| name | string | | The name of the token filter. It must only contain letters, digits, spaces, dashes or underscores, can only start and end with alphanumeric characters, and is limited to 128 characters. |
| queryMode | boolean | False | A value that indicates whether the token filter is in query mode. When in query mode, the token filter generates bigrams and then removes common words and single terms followed by a common word. Default is false. |
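An illustrative tokenFilters entry; the common-word list here is a hypothetical sample, not a built-in list:

```json
{
  "name": "my_common_grams",
  "@odata.type": "#Microsoft.Azure.Search.CommonGramTokenFilter",
  "commonWords": [ "the", "of", "and" ],
  "ignoreCase": true,
  "queryMode": false
}
```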

CorsOptions

Defines options to control Cross-Origin Resource Sharing (CORS) for an index.

| Name | Type | Description |
| --- | --- | --- |
| allowedOrigins | string[] | The list of origins from which JavaScript code will be granted access to your index. Can contain a list of hosts of the form {protocol}://{fully-qualified-domain-name}[:{port#}], or a single '*' to allow all origins (not recommended). |
| maxAgeInSeconds | integer | The duration for which browsers should cache CORS preflight responses. Defaults to 5 minutes. |

CustomAnalyzer

Allows you to take control over the process of converting text into indexable/searchable tokens. It's a user-defined configuration consisting of a single predefined tokenizer and one or more filters. The tokenizer is responsible for breaking text into tokens, and the filters for modifying tokens emitted by the tokenizer.

| Name | Type | Description |
| --- | --- | --- |
| @odata.type | string: #Microsoft.Azure.Search.CustomAnalyzer | A URI fragment specifying the type of analyzer. |
| charFilters | CharFilterName[] | A list of character filters used to prepare input text before it is processed by the tokenizer. For instance, they can replace certain characters or symbols. The filters are run in the order in which they are listed. |
| name | string | The name of the analyzer. It must only contain letters, digits, spaces, dashes or underscores, can only start and end with alphanumeric characters, and is limited to 128 characters. |
| tokenFilters | TokenFilterName[] | A list of token filters used to filter out or modify the tokens generated by a tokenizer. For example, you can specify a lowercase filter that converts all characters to lowercase. The filters are run in the order in which they are listed. |
| tokenizer | LexicalTokenizerName | The name of the tokenizer to use to divide continuous text into a sequence of tokens, such as breaking a sentence into words. |

DictionaryDecompounderTokenFilter

Decomposes compound words found in many Germanic languages. This token filter is implemented using Apache Lucene.

| Name | Type | Default Value | Description |
| --- | --- | --- | --- |
| @odata.type | string: #Microsoft.Azure.Search.DictionaryDecompounderTokenFilter | | A URI fragment specifying the type of token filter. |
| maxSubwordSize | integer | 15 | The maximum subword size. Only subwords shorter than this are outputted. Default is 15. Maximum is 300. |
| minSubwordSize | integer | 2 | The minimum subword size. Only subwords longer than this are outputted. Default is 2. Maximum is 300. |
| minWordSize | integer | 5 | The minimum word size. Only words longer than this get processed. Default is 5. Maximum is 300. |
| name | string | | The name of the token filter. It must only contain letters, digits, spaces, dashes or underscores, can only start and end with alphanumeric characters, and is limited to 128 characters. |
| onlyLongestMatch | boolean | False | A value indicating whether to add only the longest matching subword to the output. Default is false. |
| wordList | string[] | | The list of words to match against. |
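A sketch of a decompounder for German compound words; the filter name and dictionary entries are illustrative:

```json
{
  "name": "my_decompounder",
  "@odata.type": "#Microsoft.Azure.Search.DictionaryDecompounderTokenFilter",
  "wordList": [ "schuh", "haus" ],
  "minWordSize": 5,
  "minSubwordSize": 2,
  "maxSubwordSize": 15,
  "onlyLongestMatch": false
}
```

With a configuration like this, a token such as "schuhhaus" would additionally emit the dictionary subwords it contains.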

DistanceScoringFunction

Defines a function that boosts scores based on distance from a geographic location.

| Name | Type | Description |
| --- | --- | --- |
| boost | number | A multiplier for the raw score. Must be a positive number not equal to 1.0. |
| distance | DistanceScoringParameters | Parameter values for the distance scoring function. |
| fieldName | string | The name of the field used as input to the scoring function. |
| interpolation | ScoringFunctionInterpolation | A value indicating how boosting will be interpolated across document scores; defaults to "Linear". |
| type | string: distance | Indicates the type of function to use. Valid values include magnitude, freshness, distance, and tag. The function type must be lower case. |

DistanceScoringParameters

Provides parameter values to a distance scoring function.

| Name | Type | Description |
| --- | --- | --- |
| boostingDistance | number | The distance in kilometers from the reference location where the boosting range ends. |
| referencePointParameter | string | The name of the parameter passed in search queries to specify the reference location. |

EdgeNGramTokenFilter

Generates n-grams of the given size(s) starting from the front or the back of an input token. This token filter is implemented using Apache Lucene.

| Name | Type | Default Value | Description |
| --- | --- | --- | --- |
| @odata.type | string: #Microsoft.Azure.Search.EdgeNGramTokenFilter | | A URI fragment specifying the type of token filter. |
| maxGram | integer | 2 | The maximum n-gram length. Default is 2. |
| minGram | integer | 1 | The minimum n-gram length. Default is 1. Must be less than the value of maxGram. |
| name | string | | The name of the token filter. It must only contain letters, digits, spaces, dashes or underscores, can only start and end with alphanumeric characters, and is limited to 128 characters. |
| side | EdgeNGramTokenFilterSide | front | Specifies which side of the input the n-gram should be generated from. Default is "front". |

EdgeNGramTokenFilterSide

Specifies which side of the input an n-gram should be generated from.

| Name | Type | Description |
| --- | --- | --- |
| back | string | Specifies that the n-gram should be generated from the back of the input. |
| front | string | Specifies that the n-gram should be generated from the front of the input. |

EdgeNGramTokenFilterV2

Generates n-grams of the given size(s) starting from the front or the back of an input token. This token filter is implemented using Apache Lucene.

| Name | Type | Default Value | Description |
| --- | --- | --- | --- |
| @odata.type | string: #Microsoft.Azure.Search.EdgeNGramTokenFilterV2 | | A URI fragment specifying the type of token filter. |
| maxGram | integer | 2 | The maximum n-gram length. Default is 2. Maximum is 300. |
| minGram | integer | 1 | The minimum n-gram length. Default is 1. Maximum is 300. Must be less than the value of maxGram. |
| name | string | | The name of the token filter. It must only contain letters, digits, spaces, dashes or underscores, can only start and end with alphanumeric characters, and is limited to 128 characters. |
| side | EdgeNGramTokenFilterSide | front | Specifies which side of the input the n-gram should be generated from. Default is "front". |
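For example, a front-edge n-gram filter for prefix matching could be declared like this; the name and sizes are illustrative, and minGram must stay below maxGram per the constraints above:

```json
{
  "name": "my_edge_ngram",
  "@odata.type": "#Microsoft.Azure.Search.EdgeNGramTokenFilterV2",
  "minGram": 2,
  "maxGram": 20,
  "side": "front"
}
```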

EdgeNGramTokenizer

Tokenizes the input from an edge into n-grams of the given size(s). This tokenizer is implemented using Apache Lucene.

| Name | Type | Default Value | Description |
| --- | --- | --- | --- |
| @odata.type | string: #Microsoft.Azure.Search.EdgeNGramTokenizer | | A URI fragment specifying the type of tokenizer. |
| maxGram | integer | 2 | The maximum n-gram length. Default is 2. Maximum is 300. |
| minGram | integer | 1 | The minimum n-gram length. Default is 1. Maximum is 300. Must be less than the value of maxGram. |
| name | string | | The name of the tokenizer. It must only contain letters, digits, spaces, dashes or underscores, can only start and end with alphanumeric characters, and is limited to 128 characters. |
| tokenChars | TokenCharacterKind[] | | Character classes to keep in the tokens. |

ElisionTokenFilter

Removes elisions. For example, "l'avion" (the plane) will be converted to "avion" (plane). This token filter is implemented using Apache Lucene.

| Name | Type | Description |
| --- | --- | --- |
| @odata.type | string: #Microsoft.Azure.Search.ElisionTokenFilter | A URI fragment specifying the type of token filter. |
| articles | string[] | The set of articles to remove. |
| name | string | The name of the token filter. It must only contain letters, digits, spaces, dashes or underscores, can only start and end with alphanumeric characters, and is limited to 128 characters. |
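A sketch for French elisions; the filter name is illustrative and the articles listed are a hypothetical subset:

```json
{
  "name": "my_elision",
  "@odata.type": "#Microsoft.Azure.Search.ElisionTokenFilter",
  "articles": [ "l", "d" ]
}
```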

ExhaustiveKnnParameters

Contains the parameters specific to exhaustive KNN algorithm.

| Name | Type | Description |
| --- | --- | --- |
| metric | VectorSearchAlgorithmMetric | The similarity metric to use for vector comparisons. |

ExhaustiveKnnVectorSearchAlgorithmConfiguration

Contains configuration options specific to the exhaustive KNN algorithm used during querying, which will perform brute-force search across the entire vector index.

| Name | Type | Description |
| --- | --- | --- |
| exhaustiveKnnParameters | ExhaustiveKnnParameters | Contains the parameters specific to exhaustive KNN algorithm. |
| kind | string: exhaustiveKnn | The name of the kind of algorithm being configured for use with vector search. |
| name | string | The name to associate with this particular configuration. |

FreshnessScoringFunction

Defines a function that boosts scores based on the value of a date-time field.

| Name | Type | Description |
| --- | --- | --- |
| boost | number | A multiplier for the raw score. Must be a positive number not equal to 1.0. |
| fieldName | string | The name of the field used as input to the scoring function. |
| freshness | FreshnessScoringParameters | Parameter values for the freshness scoring function. |
| interpolation | ScoringFunctionInterpolation | A value indicating how boosting will be interpolated across document scores; defaults to "Linear". |
| type | string: freshness | Indicates the type of function to use. Valid values include magnitude, freshness, distance, and tag. The function type must be lower case. |

FreshnessScoringParameters

Provides parameter values to a freshness scoring function.

| Name | Type | Description |
| --- | --- | --- |
| boostingDuration | string | The expiration period after which boosting will stop for a particular document. |
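Combining the two definitions above, a hypothetical scoring-profile function could boost hotels renovated within the last year. The boost value and duration are illustrative (the duration is assumed to use the XSD dayTimeDuration format), and lastRenovationDate comes from the sample index:

```json
{
  "type": "freshness",
  "boost": 2,
  "fieldName": "lastRenovationDate",
  "interpolation": "quadratic",
  "freshness": {
    "boostingDuration": "P365D"
  }
}
```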

HnswParameters

Contains the parameters specific to the HNSW algorithm.

| Name | Type | Default Value | Description |
| --- | --- | --- | --- |
| efConstruction | integer | 400 | The size of the dynamic list containing the nearest neighbors, which is used during index time. Increasing this parameter may improve index quality, at the expense of increased indexing time. At a certain point, increasing this parameter leads to diminishing returns. |
| efSearch | integer | 500 | The size of the dynamic list containing the nearest neighbors, which is used during search time. Increasing this parameter may improve search results, at the expense of slower search. At a certain point, increasing this parameter leads to diminishing returns. |
| m | integer | 4 | The number of bi-directional links created for every new element during construction. Increasing this parameter value may improve recall and reduce retrieval times for datasets with high intrinsic dimensionality at the expense of increased memory consumption and longer indexing time. |
| metric | VectorSearchAlgorithmMetric | | The similarity metric to use for vector comparisons. |

HnswVectorSearchAlgorithmConfiguration

Contains configuration options specific to the HNSW approximate nearest neighbors algorithm used during indexing and querying. The HNSW algorithm offers a tunable trade-off between search speed and accuracy.

| Name | Type | Description |
| --- | --- | --- |
| hnswParameters | HnswParameters | Contains the parameters specific to HNSW algorithm. |
| kind | string: hnsw | The name of the kind of algorithm being configured for use with vector search. |
| name | string | The name to associate with this particular configuration. |

KeepTokenFilter

A token filter that only keeps tokens with text contained in a specified list of words. This token filter is implemented using Apache Lucene.

| Name | Type | Default Value | Description |
| --- | --- | --- | --- |
| @odata.type | string: #Microsoft.Azure.Search.KeepTokenFilter | | A URI fragment specifying the type of token filter. |
| keepWords | string[] | | The list of words to keep. |
| keepWordsCase | boolean | False | A value indicating whether to lower case all words first. Default is false. |
| name | string | | The name of the token filter. It must only contain letters, digits, spaces, dashes or underscores, can only start and end with alphanumeric characters, and is limited to 128 characters. |
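An illustrative tokenFilters entry; the name and word list are hypothetical:

```json
{
  "name": "my_keep",
  "@odata.type": "#Microsoft.Azure.Search.KeepTokenFilter",
  "keepWords": [ "alpha", "beta", "gamma" ],
  "keepWordsCase": false
}
```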

KeywordMarkerTokenFilter

Marks terms as keywords. This token filter is implemented using Apache Lucene.

| Name | Type | Default Value | Description |
| --- | --- | --- | --- |
| @odata.type | string: #Microsoft.Azure.Search.KeywordMarkerTokenFilter | | A URI fragment specifying the type of token filter. |
| ignoreCase | boolean | False | A value indicating whether to ignore case. If true, all words are converted to lower case first. Default is false. |
| keywords | string[] | | A list of words to mark as keywords. |
| name | string | | The name of the token filter. It must only contain letters, digits, spaces, dashes or underscores, can only start and end with alphanumeric characters, and is limited to 128 characters. |

KeywordTokenizer

Emits the entire input as a single token. This tokenizer is implemented using Apache Lucene.

| Name | Type | Default Value | Description |
| --- | --- | --- | --- |
| @odata.type | string: #Microsoft.Azure.Search.KeywordTokenizer | | A URI fragment specifying the type of tokenizer. |
| bufferSize | integer | 256 | The read buffer size in bytes. Default is 256. |
| name | string | | The name of the tokenizer. It must only contain letters, digits, spaces, dashes or underscores, can only start and end with alphanumeric characters, and is limited to 128 characters. |

KeywordTokenizerV2

Emits the entire input as a single token. This tokenizer is implemented using Apache Lucene.

| Name | Type | Default Value | Description |
| --- | --- | --- | --- |
| @odata.type | string: #Microsoft.Azure.Search.KeywordTokenizerV2 | | A URI fragment specifying the type of tokenizer. |
| maxTokenLength | integer | 256 | The maximum token length. Default is 256. Tokens longer than the maximum length are split. The maximum token length that can be used is 300 characters. |
| name | string | | The name of the tokenizer. It must only contain letters, digits, spaces, dashes or underscores, can only start and end with alphanumeric characters, and is limited to 128 characters. |

LengthTokenFilter

Removes words that are too long or too short. This token filter is implemented using Apache Lucene.

| Name | Type | Default Value | Description |
| --- | --- | --- | --- |
| @odata.type | string: #Microsoft.Azure.Search.LengthTokenFilter | | A URI fragment specifying the type of token filter. |
| max | integer | 300 | The maximum length in characters. Default and maximum is 300. |
| min | integer | 0 | The minimum length in characters. Default is 0. Maximum is 300. Must be less than the value of max. |
| name | string | | The name of the token filter. It must only contain letters, digits, spaces, dashes or underscores, can only start and end with alphanumeric characters, and is limited to 128 characters. |
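A sketch that drops tokens shorter than 2 or longer than 100 characters; the name and bounds are illustrative:

```json
{
  "name": "my_length",
  "@odata.type": "#Microsoft.Azure.Search.LengthTokenFilter",
  "min": 2,
  "max": 100
}
```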

LexicalAnalyzerName

Defines the names of all text analyzers supported by the search engine.

| Name | Type | Description |
| --- | --- | --- |
| ar.lucene | string | Lucene analyzer for Arabic. |
| ar.microsoft | string | Microsoft analyzer for Arabic. |
| bg.lucene | string | Lucene analyzer for Bulgarian. |
| bg.microsoft | string | Microsoft analyzer for Bulgarian. |
| bn.microsoft | string | Microsoft analyzer for Bangla. |
| ca.lucene | string | Lucene analyzer for Catalan. |
| ca.microsoft | string | Microsoft analyzer for Catalan. |
| cs.lucene | string | Lucene analyzer for Czech. |
| cs.microsoft | string | Microsoft analyzer for Czech. |
| da.lucene | string | Lucene analyzer for Danish. |
| da.microsoft | string | Microsoft analyzer for Danish. |
| de.lucene | string | Lucene analyzer for German. |
| de.microsoft | string | Microsoft analyzer for German. |
| el.lucene | string | Lucene analyzer for Greek. |
| el.microsoft | string | Microsoft analyzer for Greek. |
| en.lucene | string | Lucene analyzer for English. |
| en.microsoft | string | Microsoft analyzer for English. |
| es.lucene | string | Lucene analyzer for Spanish. |
| es.microsoft | string | Microsoft analyzer for Spanish. |
| et.microsoft | string | Microsoft analyzer for Estonian. |
| eu.lucene | string | Lucene analyzer for Basque. |
| fa.lucene | string | Lucene analyzer for Persian. |
| fi.lucene | string | Lucene analyzer for Finnish. |
| fi.microsoft | string | Microsoft analyzer for Finnish. |
| fr.lucene | string | Lucene analyzer for French. |
| fr.microsoft | string | Microsoft analyzer for French. |
| ga.lucene | string | Lucene analyzer for Irish. |
| gl.lucene | string | Lucene analyzer for Galician. |
| gu.microsoft | string | Microsoft analyzer for Gujarati. |
| he.microsoft | string | Microsoft analyzer for Hebrew. |
| hi.lucene | string | Lucene analyzer for Hindi. |
| hi.microsoft | string | Microsoft analyzer for Hindi. |
| hr.microsoft | string | Microsoft analyzer for Croatian. |
| hu.lucene | string | Lucene analyzer for Hungarian. |
| hu.microsoft | string | Microsoft analyzer for Hungarian. |
| hy.lucene | string | Lucene analyzer for Armenian. |
| id.lucene | string | Lucene analyzer for Indonesian. |
| id.microsoft | string | Microsoft analyzer for Indonesian (Bahasa). |
| is.microsoft | string | Microsoft analyzer for Icelandic. |
| it.lucene | string | Lucene analyzer for Italian. |
| it.microsoft | string | Microsoft analyzer for Italian. |
| ja.lucene | string | Lucene analyzer for Japanese. |
| ja.microsoft | string | Microsoft analyzer for Japanese. |
| keyword | string | Treats the entire content of a field as a single token. This is useful for data like zip codes, ids, and some product names. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/core/KeywordAnalyzer.html |
| kn.microsoft | string | Microsoft analyzer for Kannada. |
| ko.lucene | string | Lucene analyzer for Korean. |
| ko.microsoft | string | Microsoft analyzer for Korean. |
| lt.microsoft | string | Microsoft analyzer for Lithuanian. |
| lv.lucene | string | Lucene analyzer for Latvian. |
| lv.microsoft | string | Microsoft analyzer for Latvian. |
| ml.microsoft | string | Microsoft analyzer for Malayalam. |
| mr.microsoft | string | Microsoft analyzer for Marathi. |
| ms.microsoft | string | Microsoft analyzer for Malay (Latin). |
| nb.microsoft | string | Microsoft analyzer for Norwegian (Bokmål). |
| nl.lucene | string | Lucene analyzer for Dutch. |
| nl.microsoft | string | Microsoft analyzer for Dutch. |
| no.lucene | string | Lucene analyzer for Norwegian. |
| pa.microsoft | string | Microsoft analyzer for Punjabi. |
| pattern | string | Flexibly separates text into terms via a regular expression pattern. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/PatternAnalyzer.html |
| pl.lucene | string | Lucene analyzer for Polish. |
| pl.microsoft | string | Microsoft analyzer for Polish. |
| pt-BR.lucene | string | Lucene analyzer for Portuguese (Brazil). |
| pt-BR.microsoft | string | Microsoft analyzer for Portuguese (Brazil). |
| pt-PT.lucene | string | Lucene analyzer for Portuguese (Portugal). |
| pt-PT.microsoft | string | Microsoft analyzer for Portuguese (Portugal). |
| ro.lucene | string | Lucene analyzer for Romanian. |
| ro.microsoft | string | Microsoft analyzer for Romanian. |
| ru.lucene | string | Lucene analyzer for Russian. |
| ru.microsoft | string | Microsoft analyzer for Russian. |
| simple | string | Divides text at non-letters and converts them to lower case. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/core/SimpleAnalyzer.html |
| sk.microsoft | string | Microsoft analyzer for Slovak. |
| sl.microsoft | string | Microsoft analyzer for Slovenian. |
| sr-cyrillic.microsoft | string | Microsoft analyzer for Serbian (Cyrillic). |
| sr-latin.microsoft | string | Microsoft analyzer for Serbian (Latin). |
| standard.lucene | string | Standard Lucene analyzer. |
| standardasciifolding.lucene | string | Standard ASCII Folding Lucene analyzer. See https://docs.microsoft.com/rest/api/searchservice/Custom-analyzers-in-Azure-Search#Analyzers |
| stop | string | Divides text at non-letters; applies the lowercase and stopword token filters. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/core/StopAnalyzer.html |
| sv.lucene | string | Lucene analyzer for Swedish. |
| sv.microsoft | string | Microsoft analyzer for Swedish. |
| ta.microsoft | string | Microsoft analyzer for Tamil. |
| te.microsoft | string | Microsoft analyzer for Telugu. |
| th.lucene | string | Lucene analyzer for Thai. |
| th.microsoft | string | Microsoft analyzer for Thai. |
| tr.lucene | string | Lucene analyzer for Turkish. |
| tr.microsoft | string | Microsoft analyzer for Turkish. |
| uk.microsoft | string | Microsoft analyzer for Ukrainian. |
| ur.microsoft | string | Microsoft analyzer for Urdu. |
| vi.microsoft | string | Microsoft analyzer for Vietnamese. |
| whitespace | string | An analyzer that uses the whitespace tokenizer. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/core/WhitespaceAnalyzer.html |
| zh-Hans.lucene | string | Lucene analyzer for Chinese (Simplified). |
| zh-Hans.microsoft | string | Microsoft analyzer for Chinese (Simplified). |
| zh-Hant.lucene | string | Lucene analyzer for Chinese (Traditional). |
| zh-Hant.microsoft | string | Microsoft analyzer for Chinese (Traditional). |

LexicalTokenizerName

Defines the names of all tokenizers supported by the search engine.

| Name | Type | Description |
| --- | --- | --- |
| classic | string | Grammar-based tokenizer that is suitable for processing most European-language documents. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/standard/ClassicTokenizer.html |
| edgeNGram | string | Tokenizes the input from an edge into n-grams of the given size(s). See https://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/ngram/EdgeNGramTokenizer.html |
| keyword_v2 | string | Emits the entire input as a single token. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/core/KeywordTokenizer.html |
| letter | string | Divides text at non-letters. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/core/LetterTokenizer.html |
| lowercase | string | Divides text at non-letters and converts them to lower case. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/core/LowerCaseTokenizer.html |
| microsoft_language_stemming_tokenizer | string | Divides text using language-specific rules and reduces words to their base forms. |
| microsoft_language_tokenizer | string | Divides text using language-specific rules. |
| nGram | string | Tokenizes the input into n-grams of the given size(s). See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/ngram/NGramTokenizer.html |
| path_hierarchy_v2 | string | Tokenizer for path-like hierarchies. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/path/PathHierarchyTokenizer.html |
| pattern | string | Tokenizer that uses regex pattern matching to construct distinct tokens. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/pattern/PatternTokenizer.html |
| standard_v2 | string | Breaks text following the Unicode Text Segmentation rules. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/standard/StandardTokenizer.html |
| uax_url_email | string | Tokenizes URLs and emails as one token. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/standard/UAX29URLEmailTokenizer.html |
| whitespace | string | Divides text at whitespace. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/core/WhitespaceTokenizer.html |

LimitTokenFilter

Limits the number of tokens while indexing. This token filter is implemented using Apache Lucene.

| Name | Type | Default Value | Description |
| --- | --- | --- | --- |
| @odata.type | string: #Microsoft.Azure.Search.LimitTokenFilter | | A URI fragment specifying the type of token filter. |
| consumeAllTokens | boolean | False | A value indicating whether all tokens from the input must be consumed even if maxTokenCount is reached. Default is false. |
| maxTokenCount | integer | 1 | The maximum number of tokens to produce. Default is 1. |
| name | string | | The name of the token filter. It must only contain letters, digits, spaces, dashes or underscores, can only start and end with alphanumeric characters, and is limited to 128 characters. |
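A sketch that keeps only the first 10 tokens of each field value; the name and count are illustrative:

```json
{
  "name": "my_limit",
  "@odata.type": "#Microsoft.Azure.Search.LimitTokenFilter",
  "maxTokenCount": 10,
  "consumeAllTokens": false
}
```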

LuceneStandardAnalyzer

Standard Apache Lucene analyzer; composed of the standard tokenizer, lowercase filter and stop filter.

| Name | Type | Default Value | Description |
| --- | --- | --- | --- |
| @odata.type | string: #Microsoft.Azure.Search.StandardAnalyzer | | A URI fragment specifying the type of analyzer. |
| maxTokenLength | integer | 255 | The maximum token length. Default is 255. Tokens longer than the maximum length are split. The maximum token length that can be used is 300 characters. |
| name | string | | The name of the analyzer. It must only contain letters, digits, spaces, dashes or underscores, can only start and end with alphanumeric characters, and is limited to 128 characters. |
| stopwords | string[] | | A list of stopwords. |

LuceneStandardTokenizer

Breaks text following the Unicode Text Segmentation rules. This tokenizer is implemented using Apache Lucene.

| Name | Type | Default Value | Description |
| --- | --- | --- | --- |
| @odata.type | string: #Microsoft.Azure.Search.StandardTokenizer | | A URI fragment specifying the type of tokenizer. |
| maxTokenLength | integer | 255 | The maximum token length. Default is 255. Tokens longer than the maximum length are split. |
| name | string | | The name of the tokenizer. It must only contain letters, digits, spaces, dashes or underscores, can only start and end with alphanumeric characters, and is limited to 128 characters. |

LuceneStandardTokenizerV2

Breaks text following the Unicode Text Segmentation rules. This tokenizer is implemented using Apache Lucene.

Name Type Default Value Description
@odata.type string:

#Microsoft.Azure.Search.StandardTokenizerV2

A URI fragment specifying the type of tokenizer.

maxTokenLength

integer

255

The maximum token length. Default is 255. Tokens longer than the maximum length are split. The maximum token length that can be used is 300 characters.

name

string

The name of the tokenizer. It must only contain letters, digits, spaces, dashes or underscores, can only start and end with alphanumeric characters, and is limited to 128 characters.

MagnitudeScoringFunction

Defines a function that boosts scores based on the magnitude of a numeric field.

Name Type Description
boost

number

A multiplier for the raw score. Must be a positive number not equal to 1.0.

fieldName

string

The name of the field used as input to the scoring function.

interpolation

ScoringFunctionInterpolation

A value indicating how boosting will be interpolated across document scores; defaults to "Linear".

magnitude

MagnitudeScoringParameters

Parameter values for the magnitude scoring function.

type string:

magnitude

Indicates the type of function to use. Valid values include magnitude, freshness, distance, and tag. The function type must be lower case.

MagnitudeScoringParameters

Provides parameter values to a magnitude scoring function.

Name Type Description
boostingRangeEnd

number

The field value at which boosting ends.

boostingRangeStart

number

The field value at which boosting starts.

constantBoostBeyondRange

boolean

A value indicating whether to apply a constant boost for field values beyond the range end value; default is false.
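
Combining the two definitions above, a scoring profile that boosts documents by a numeric field might look like the following sketch; the profile name "boostByRate" is a placeholder, and "baseRate" is assumed to be a numeric field in the index:

"scoringProfiles": [
  {
    "name": "boostByRate",
    "functionAggregation": "sum",
    "functions": [
      {
        "type": "magnitude",
        "fieldName": "baseRate",
        "boost": 2.0,
        "interpolation": "linear",
        "magnitude": {
          "boostingRangeStart": 0,
          "boostingRangeEnd": 100,
          "constantBoostBeyondRange": false
        }
      }
    ]
  }
]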

MappingCharFilter

A character filter that applies mappings defined with the mappings option. Matching is greedy (longest pattern matching at a given point wins). Replacement is allowed to be the empty string. This character filter is implemented using Apache Lucene.

Name Type Description
@odata.type string:

#Microsoft.Azure.Search.MappingCharFilter

A URI fragment specifying the type of char filter.

mappings

string[]

A list of mappings of the following format: "a=>b" (all occurrences of the character "a" will be replaced with character "b").

name

string

The name of the char filter. It must only contain letters, digits, spaces, dashes or underscores, can only start and end with alphanumeric characters, and is limited to 128 characters.
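
For example, a mapping char filter that rewrites characters before tokenization might be declared as follows; the name and mappings are illustrative:

"charFilters": [
  {
    "@odata.type": "#Microsoft.Azure.Search.MappingCharFilter",
    "name": "my_mapping",
    "mappings": [ "-=>_", "&=>and" ]
  }
]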

MicrosoftLanguageStemmingTokenizer

Divides text using language-specific rules and reduces words to their base forms.

Name Type Default Value Description
@odata.type string:

#Microsoft.Azure.Search.MicrosoftLanguageStemmingTokenizer

A URI fragment specifying the type of tokenizer.

isSearchTokenizer

boolean

False

A value indicating how the tokenizer is used. Set to true if used as the search tokenizer, set to false if used as the indexing tokenizer. Default is false.

language

MicrosoftStemmingTokenizerLanguage

The language to use. The default is English.

maxTokenLength

integer

255

The maximum token length. Tokens longer than the maximum length are split. Maximum token length that can be used is 300 characters. Tokens longer than 300 characters are first split into tokens of length 300 and then each of those tokens is split based on the max token length set. Default is 255.

name

string

The name of the tokenizer. It must only contain letters, digits, spaces, dashes or underscores, can only start and end with alphanumeric characters, and is limited to 128 characters.
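
A sketch of an English stemming tokenizer definition; like any custom tokenizer, it is referenced by name from a custom analyzer, and "my_en_stemming" is a placeholder:

"tokenizers": [
  {
    "@odata.type": "#Microsoft.Azure.Search.MicrosoftLanguageStemmingTokenizer",
    "name": "my_en_stemming",
    "language": "english",
    "isSearchTokenizer": false,
    "maxTokenLength": 255
  }
]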

MicrosoftLanguageTokenizer

Divides text using language-specific rules.

Name Type Default Value Description
@odata.type string:

#Microsoft.Azure.Search.MicrosoftLanguageTokenizer

A URI fragment specifying the type of tokenizer.

isSearchTokenizer

boolean

False

A value indicating how the tokenizer is used. Set to true if used as the search tokenizer, set to false if used as the indexing tokenizer. Default is false.

language

MicrosoftTokenizerLanguage

The language to use. The default is English.

maxTokenLength

integer

255

The maximum token length. Tokens longer than the maximum length are split. Maximum token length that can be used is 300 characters. Tokens longer than 300 characters are first split into tokens of length 300 and then each of those tokens is split based on the max token length set. Default is 255.

name

string

The name of the tokenizer. It must only contain letters, digits, spaces, dashes or underscores, can only start and end with alphanumeric characters, and is limited to 128 characters.

MicrosoftStemmingTokenizerLanguage

Lists the languages supported by the Microsoft language stemming tokenizer.

Name Type Description
arabic

string

Selects the Microsoft stemming tokenizer for Arabic.

bangla

string

Selects the Microsoft stemming tokenizer for Bangla.

bulgarian

string

Selects the Microsoft stemming tokenizer for Bulgarian.

catalan

string

Selects the Microsoft stemming tokenizer for Catalan.

croatian

string

Selects the Microsoft stemming tokenizer for Croatian.

czech

string

Selects the Microsoft stemming tokenizer for Czech.

danish

string

Selects the Microsoft stemming tokenizer for Danish.

dutch

string

Selects the Microsoft stemming tokenizer for Dutch.

english

string

Selects the Microsoft stemming tokenizer for English.

estonian

string

Selects the Microsoft stemming tokenizer for Estonian.

finnish

string

Selects the Microsoft stemming tokenizer for Finnish.

french

string

Selects the Microsoft stemming tokenizer for French.

german

string

Selects the Microsoft stemming tokenizer for German.

greek

string

Selects the Microsoft stemming tokenizer for Greek.

gujarati

string

Selects the Microsoft stemming tokenizer for Gujarati.

hebrew

string

Selects the Microsoft stemming tokenizer for Hebrew.

hindi

string

Selects the Microsoft stemming tokenizer for Hindi.

hungarian

string

Selects the Microsoft stemming tokenizer for Hungarian.

icelandic

string

Selects the Microsoft stemming tokenizer for Icelandic.

indonesian

string

Selects the Microsoft stemming tokenizer for Indonesian.

italian

string

Selects the Microsoft stemming tokenizer for Italian.

kannada

string

Selects the Microsoft stemming tokenizer for Kannada.

latvian

string

Selects the Microsoft stemming tokenizer for Latvian.

lithuanian

string

Selects the Microsoft stemming tokenizer for Lithuanian.

malay

string

Selects the Microsoft stemming tokenizer for Malay.

malayalam

string

Selects the Microsoft stemming tokenizer for Malayalam.

marathi

string

Selects the Microsoft stemming tokenizer for Marathi.

norwegianBokmaal

string

Selects the Microsoft stemming tokenizer for Norwegian (Bokmål).

polish

string

Selects the Microsoft stemming tokenizer for Polish.

portuguese

string

Selects the Microsoft stemming tokenizer for Portuguese.

portugueseBrazilian

string

Selects the Microsoft stemming tokenizer for Portuguese (Brazil).

punjabi

string

Selects the Microsoft stemming tokenizer for Punjabi.

romanian

string

Selects the Microsoft stemming tokenizer for Romanian.

russian

string

Selects the Microsoft stemming tokenizer for Russian.

serbianCyrillic

string

Selects the Microsoft stemming tokenizer for Serbian (Cyrillic).

serbianLatin

string

Selects the Microsoft stemming tokenizer for Serbian (Latin).

slovak

string

Selects the Microsoft stemming tokenizer for Slovak.

slovenian

string

Selects the Microsoft stemming tokenizer for Slovenian.

spanish

string

Selects the Microsoft stemming tokenizer for Spanish.

swedish

string

Selects the Microsoft stemming tokenizer for Swedish.

tamil

string

Selects the Microsoft stemming tokenizer for Tamil.

telugu

string

Selects the Microsoft stemming tokenizer for Telugu.

turkish

string

Selects the Microsoft stemming tokenizer for Turkish.

ukrainian

string

Selects the Microsoft stemming tokenizer for Ukrainian.

urdu

string

Selects the Microsoft stemming tokenizer for Urdu.

MicrosoftTokenizerLanguage

Lists the languages supported by the Microsoft language tokenizer.

Name Type Description
bangla

string

Selects the Microsoft tokenizer for Bangla.

bulgarian

string

Selects the Microsoft tokenizer for Bulgarian.

catalan

string

Selects the Microsoft tokenizer for Catalan.

chineseSimplified

string

Selects the Microsoft tokenizer for Chinese (Simplified).

chineseTraditional

string

Selects the Microsoft tokenizer for Chinese (Traditional).

croatian

string

Selects the Microsoft tokenizer for Croatian.

czech

string

Selects the Microsoft tokenizer for Czech.

danish

string

Selects the Microsoft tokenizer for Danish.

dutch

string

Selects the Microsoft tokenizer for Dutch.

english

string

Selects the Microsoft tokenizer for English.

french

string

Selects the Microsoft tokenizer for French.

german

string

Selects the Microsoft tokenizer for German.

greek

string

Selects the Microsoft tokenizer for Greek.

gujarati

string

Selects the Microsoft tokenizer for Gujarati.

hindi

string

Selects the Microsoft tokenizer for Hindi.

icelandic

string

Selects the Microsoft tokenizer for Icelandic.

indonesian

string

Selects the Microsoft tokenizer for Indonesian.

italian

string

Selects the Microsoft tokenizer for Italian.

japanese

string

Selects the Microsoft tokenizer for Japanese.

kannada

string

Selects the Microsoft tokenizer for Kannada.

korean

string

Selects the Microsoft tokenizer for Korean.

malay

string

Selects the Microsoft tokenizer for Malay.

malayalam

string

Selects the Microsoft tokenizer for Malayalam.

marathi

string

Selects the Microsoft tokenizer for Marathi.

norwegianBokmaal

string

Selects the Microsoft tokenizer for Norwegian (Bokmål).

polish

string

Selects the Microsoft tokenizer for Polish.

portuguese

string

Selects the Microsoft tokenizer for Portuguese.

portugueseBrazilian

string

Selects the Microsoft tokenizer for Portuguese (Brazil).

punjabi

string

Selects the Microsoft tokenizer for Punjabi.

romanian

string

Selects the Microsoft tokenizer for Romanian.

russian

string

Selects the Microsoft tokenizer for Russian.

serbianCyrillic

string

Selects the Microsoft tokenizer for Serbian (Cyrillic).

serbianLatin

string

Selects the Microsoft tokenizer for Serbian (Latin).

slovenian

string

Selects the Microsoft tokenizer for Slovenian.

spanish

string

Selects the Microsoft tokenizer for Spanish.

swedish

string

Selects the Microsoft tokenizer for Swedish.

tamil

string

Selects the Microsoft tokenizer for Tamil.

telugu

string

Selects the Microsoft tokenizer for Telugu.

thai

string

Selects the Microsoft tokenizer for Thai.

ukrainian

string

Selects the Microsoft tokenizer for Ukrainian.

urdu

string

Selects the Microsoft tokenizer for Urdu.

vietnamese

string

Selects the Microsoft tokenizer for Vietnamese.

NGramTokenFilter

Generates n-grams of the given size(s). This token filter is implemented using Apache Lucene.

Name Type Default Value Description
@odata.type string:

#Microsoft.Azure.Search.NGramTokenFilter

A URI fragment specifying the type of token filter.

maxGram

integer

2

The maximum n-gram length. Default is 2.

minGram

integer

1

The minimum n-gram length. Default is 1. Must be less than the value of maxGram.

name

string

The name of the token filter. It must only contain letters, digits, spaces, dashes or underscores, can only start and end with alphanumeric characters, and is limited to 128 characters.

NGramTokenFilterV2

Generates n-grams of the given size(s). This token filter is implemented using Apache Lucene.

Name Type Default Value Description
@odata.type string:

#Microsoft.Azure.Search.NGramTokenFilterV2

A URI fragment specifying the type of token filter.

maxGram

integer

2

The maximum n-gram length. Default is 2. Maximum is 300.

minGram

integer

1

The minimum n-gram length. Default is 1. Maximum is 300. Must be less than the value of maxGram.

name

string

The name of the token filter. It must only contain letters, digits, spaces, dashes or underscores, can only start and end with alphanumeric characters, and is limited to 128 characters.

NGramTokenizer

Tokenizes the input into n-grams of the given size(s). This tokenizer is implemented using Apache Lucene.

Name Type Default Value Description
@odata.type string:

#Microsoft.Azure.Search.NGramTokenizer

A URI fragment specifying the type of tokenizer.

maxGram

integer

2

The maximum n-gram length. Default is 2. Maximum is 300.

minGram

integer

1

The minimum n-gram length. Default is 1. Maximum is 300. Must be less than the value of maxGram.

name

string

The name of the tokenizer. It must only contain letters, digits, spaces, dashes or underscores, can only start and end with alphanumeric characters, and is limited to 128 characters.

tokenChars

TokenCharacterKind[]

Character classes to keep in the tokens.
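
For example, an n-gram tokenizer that keeps only letters and digits might be defined as follows; the name is illustrative:

"tokenizers": [
  {
    "@odata.type": "#Microsoft.Azure.Search.NGramTokenizer",
    "name": "my_ngram",
    "minGram": 2,
    "maxGram": 3,
    "tokenChars": [ "letter", "digit" ]
  }
]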

PathHierarchyTokenizerV2

Tokenizer for path-like hierarchies. This tokenizer is implemented using Apache Lucene.

Name Type Default Value Description
@odata.type string:

#Microsoft.Azure.Search.PathHierarchyTokenizerV2

A URI fragment specifying the type of tokenizer.

delimiter

string

/

The delimiter character to use. Default is "/".

maxTokenLength

integer

300

The maximum token length. Default and maximum is 300.

name

string

The name of the tokenizer. It must only contain letters, digits, spaces, dashes or underscores, can only start and end with alphanumeric characters, and is limited to 128 characters.

replacement

string

/

A value that, if set, replaces the delimiter character. Default is "/".

reverse

boolean

False

A value indicating whether to generate tokens in reverse order. Default is false.

skip

integer

0

The number of initial tokens to skip. Default is 0.
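
With the defaults shown above, an input such as "/a/b/c" would produce the tokens "/a", "/a/b", and "/a/b/c". A minimal sketch, where "my_path" is a placeholder:

"tokenizers": [
  {
    "@odata.type": "#Microsoft.Azure.Search.PathHierarchyTokenizerV2",
    "name": "my_path",
    "delimiter": "/",
    "reverse": false,
    "skip": 0
  }
]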

PatternAnalyzer

Flexibly separates text into terms via a regular expression pattern. This analyzer is implemented using Apache Lucene.

Name Type Default Value Description
@odata.type string:

#Microsoft.Azure.Search.PatternAnalyzer

A URI fragment specifying the type of analyzer.

flags

RegexFlags

Regular expression flags.

lowercase

boolean

True

A value indicating whether terms should be lower-cased. Default is true.

name

string

The name of the analyzer. It must only contain letters, digits, spaces, dashes or underscores, can only start and end with alphanumeric characters, and is limited to 128 characters.

pattern

string

\W+

A regular expression pattern to match token separators. Default is an expression that matches one or more non-word characters.

stopwords

string[]

A list of stopwords.
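
As a sketch, a pattern analyzer that splits comma-separated values into lower-cased terms might be defined as follows; note that the backslash in the regular expression must be escaped in JSON:

"analyzers": [
  {
    "@odata.type": "#Microsoft.Azure.Search.PatternAnalyzer",
    "name": "my_csv_analyzer",
    "pattern": ",\\s*",
    "lowercase": true
  }
]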

PatternCaptureTokenFilter

Uses Java regexes to emit multiple tokens - one for each capture group in one or more patterns. This token filter is implemented using Apache Lucene.

Name Type Default Value Description
@odata.type string:

#Microsoft.Azure.Search.PatternCaptureTokenFilter

A URI fragment specifying the type of token filter.

name

string

The name of the token filter. It must only contain letters, digits, spaces, dashes or underscores, can only start and end with alphanumeric characters, and is limited to 128 characters.

patterns

string[]

A list of patterns to match against each token.

preserveOriginal

boolean

True

A value indicating whether to return the original token even if one of the patterns matches. Default is true.

PatternReplaceCharFilter

A character filter that replaces characters in the input string. It uses a regular expression to identify character sequences to preserve and a replacement pattern to identify characters to replace. For example, given the input text "aa bb aa bb", pattern "(aa)\s+(bb)", and replacement "$1#$2", the result would be "aa#bb aa#bb". This character filter is implemented using Apache Lucene.

Name Type Description
@odata.type string:

#Microsoft.Azure.Search.PatternReplaceCharFilter

A URI fragment specifying the type of char filter.

name

string

The name of the char filter. It must only contain letters, digits, spaces, dashes or underscores, can only start and end with alphanumeric characters, and is limited to 128 characters.

pattern

string

A regular expression pattern.

replacement

string

The replacement text.
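
Using the example from the description above, the filter could be declared as follows; the name "my_replace" is a placeholder:

"charFilters": [
  {
    "@odata.type": "#Microsoft.Azure.Search.PatternReplaceCharFilter",
    "name": "my_replace",
    "pattern": "(aa)\\s+(bb)",
    "replacement": "$1#$2"
  }
]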

PatternReplaceTokenFilter

A token filter that replaces characters in the input string. It uses a regular expression to identify character sequences to preserve and a replacement pattern to identify characters to replace. For example, given the input text "aa bb aa bb", pattern "(aa)\s+(bb)", and replacement "$1#$2", the result would be "aa#bb aa#bb". This token filter is implemented using Apache Lucene.

Name Type Description
@odata.type string:

#Microsoft.Azure.Search.PatternReplaceTokenFilter

A URI fragment specifying the type of token filter.

name

string

The name of the token filter. It must only contain letters, digits, spaces, dashes or underscores, can only start and end with alphanumeric characters, and is limited to 128 characters.

pattern

string

A regular expression pattern.

replacement

string

The replacement text.

PatternTokenizer

Tokenizer that uses regex pattern matching to construct distinct tokens. This tokenizer is implemented using Apache Lucene.

Name Type Default Value Description
@odata.type string:

#Microsoft.Azure.Search.PatternTokenizer

A URI fragment specifying the type of tokenizer.

flags

RegexFlags

Regular expression flags.

group

integer

-1

The zero-based ordinal of the matching group in the regular expression pattern to extract into tokens. Use -1 if you want to use the entire pattern to split the input into tokens, irrespective of matching groups. Default is -1.

name

string

The name of the tokenizer. It must only contain letters, digits, spaces, dashes or underscores, can only start and end with alphanumeric characters, and is limited to 128 characters.

pattern

string

\W+

A regular expression pattern to match token separators. Default is an expression that matches one or more non-word characters.

PhoneticEncoder

Identifies the type of phonetic encoder to use with a PhoneticTokenFilter.

Name Type Description
beiderMorse

string

Encodes a token into a Beider-Morse value.

caverphone1

string

Encodes a token into a Caverphone 1.0 value.

caverphone2

string

Encodes a token into a Caverphone 2.0 value.

cologne

string

Encodes a token into a Cologne Phonetic value.

doubleMetaphone

string

Encodes a token into a double metaphone value.

haasePhonetik

string

Encodes a token using the Haase refinement of the Kölner Phonetik algorithm.

koelnerPhonetik

string

Encodes a token using the Kölner Phonetik algorithm.

metaphone

string

Encodes a token into a Metaphone value.

nysiis

string

Encodes a token into a NYSIIS value.

refinedSoundex

string

Encodes a token into a Refined Soundex value.

soundex

string

Encodes a token into a Soundex value.

PhoneticTokenFilter

Create tokens for phonetic matches. This token filter is implemented using Apache Lucene.

Name Type Default Value Description
@odata.type string:

#Microsoft.Azure.Search.PhoneticTokenFilter

A URI fragment specifying the type of token filter.

encoder

PhoneticEncoder

metaphone

The phonetic encoder to use. Default is "metaphone".

name

string

The name of the token filter. It must only contain letters, digits, spaces, dashes or underscores, can only start and end with alphanumeric characters, and is limited to 128 characters.

replace

boolean

True

A value indicating whether encoded tokens should replace original tokens. If false, encoded tokens are added as synonyms. Default is true.
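
A minimal sketch of a phonetic filter definition; with replace set to false, the encoded forms are added alongside the original tokens rather than replacing them, and "my_phonetic" is a placeholder:

"tokenFilters": [
  {
    "@odata.type": "#Microsoft.Azure.Search.PhoneticTokenFilter",
    "name": "my_phonetic",
    "encoder": "doubleMetaphone",
    "replace": false
  }
]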

PrioritizedFields

Describes the title, content, and keywords fields to be used for semantic ranking, captions, highlights, and answers.

Name Type Description
prioritizedContentFields

SemanticField[]

Defines the content fields to be used for semantic ranking, captions, highlights, and answers. For the best result, the selected fields should contain text in natural language form. The order of the fields in the array represents their priority. Fields with lower priority may get truncated if the content is long.

prioritizedKeywordsFields

SemanticField[]

Defines the keyword fields to be used for semantic ranking, captions, highlights, and answers. For the best result, the selected fields should contain a list of keywords. The order of the fields in the array represents their priority. Fields with lower priority may get truncated if the content is long.

titleField

SemanticField

Defines the title field to be used for semantic ranking, captions, highlights, and answers. If you don't have a title field in your index, leave this blank.

RegexFlags

Defines flags that can be combined to control how regular expressions are used in the pattern analyzer and pattern tokenizer.

Name Type Description
CANON_EQ

string

Enables canonical equivalence.

CASE_INSENSITIVE

string

Enables case-insensitive matching.

COMMENTS

string

Permits whitespace and comments in the pattern.

DOTALL

string

Enables dotall mode.

LITERAL

string

Enables literal parsing of the pattern.

MULTILINE

string

Enables multiline mode.

UNICODE_CASE

string

Enables Unicode-aware case folding.

UNIX_LINES

string

Enables Unix lines mode.

ScoringFunctionAggregation

Defines the aggregation function used to combine the results of all the scoring functions in a scoring profile.

Name Type Description
average

string

Boost scores by the average of all scoring function results.

firstMatching

string

Boost scores using the first applicable scoring function in the scoring profile.

maximum

string

Boost scores by the maximum of all scoring function results.

minimum

string

Boost scores by the minimum of all scoring function results.

sum

string

Boost scores by the sum of all scoring function results.

ScoringFunctionInterpolation

Defines the function used to interpolate score boosting across a range of documents.

Name Type Description
constant

string

Boosts scores by a constant factor.

linear

string

Boosts scores by a linearly decreasing amount. This is the default interpolation for scoring functions.

logarithmic

string

Boosts scores by an amount that decreases logarithmically. Boosts decrease quickly for higher scores, and more slowly as the scores decrease. This interpolation option is not allowed in tag scoring functions.

quadratic

string

Boosts scores by an amount that decreases quadratically. Boosts decrease slowly for higher scores, and more quickly as the scores decrease. This interpolation option is not allowed in tag scoring functions.

ScoringProfile

Defines parameters for a search index that influence scoring in search queries.

Name Type Description
functionAggregation

ScoringFunctionAggregation

A value indicating how the results of individual scoring functions should be combined. Defaults to "Sum". Ignored if there are no scoring functions.

functions ScoringFunction[]:

The collection of functions that influence the scoring of documents.

name

string

The name of the scoring profile.

text

TextWeights

Parameters that boost scoring based on text matches in certain index fields.

SearchError

Describes an error condition for the API.

Name Type Description
code

string

One of a server-defined set of error codes.

details

SearchError[]

An array of details about specific errors that led to this reported error.

message

string

A human-readable representation of the error.

SearchField

Represents a field in an index definition, which describes the name, data type, and search behavior of a field.

Name Type Description
analyzer

LexicalAnalyzerName

The name of the analyzer to use for the field. This option can be used only with searchable fields and it can't be set together with either searchAnalyzer or indexAnalyzer. Once the analyzer is chosen, it cannot be changed for the field. Must be null for complex fields.

dimensions

integer

The dimensionality of the vector field.

facetable

boolean

A value indicating whether to enable the field to be referenced in facet queries. Typically used in a presentation of search results that includes hit count by category (for example, search for digital cameras and see hits by brand, by megapixels, by price, and so on). This property must be null for complex fields. Fields of type Edm.GeographyPoint or Collection(Edm.GeographyPoint) cannot be facetable. Default is true for all other simple fields.

fields

SearchField[]

A list of sub-fields if this is a field of type Edm.ComplexType or Collection(Edm.ComplexType). Must be null or empty for simple fields.

filterable

boolean

A value indicating whether to enable the field to be referenced in $filter queries. filterable differs from searchable in how strings are handled. Fields of type Edm.String or Collection(Edm.String) that are filterable do not undergo word-breaking, so comparisons are for exact matches only. For example, if you set such a field f to "sunny day", $filter=f eq 'sunny' will find no matches, but $filter=f eq 'sunny day' will. This property must be null for complex fields. Default is true for simple fields and null for complex fields.

indexAnalyzer

LexicalAnalyzerName

The name of the analyzer used at indexing time for the field. This option can be used only with searchable fields. It must be set together with searchAnalyzer and it cannot be set together with the analyzer option. This property cannot be set to the name of a language analyzer; use the analyzer property instead if you need a language analyzer. Once the analyzer is chosen, it cannot be changed for the field. Must be null for complex fields.

key

boolean

A value indicating whether the field uniquely identifies documents in the index. Exactly one top-level field in each index must be chosen as the key field and it must be of type Edm.String. Key fields can be used to look up documents directly and update or delete specific documents. Default is false for simple fields and null for complex fields.

name

string

The name of the field, which must be unique within the fields collection of the index or parent field.

retrievable

boolean

A value indicating whether the field can be returned in a search result. You can disable this option if you want to use a field (for example, margin) as a filter, sorting, or scoring mechanism but do not want the field to be visible to the end user. This property must be true for key fields, and it must be null for complex fields. This property can be changed on existing fields. Enabling this property does not cause any increase in index storage requirements. Default is true for simple fields and null for complex fields.

searchAnalyzer

LexicalAnalyzerName

The name of the analyzer used at search time for the field. This option can be used only with searchable fields. It must be set together with indexAnalyzer and it cannot be set together with the analyzer option. This property cannot be set to the name of a language analyzer; use the analyzer property instead if you need a language analyzer. This analyzer can be updated on an existing field. Must be null for complex fields.

searchable

boolean

A value indicating whether the field is full-text searchable. This means it will undergo analysis such as word-breaking during indexing. If you set a searchable field to a value like "sunny day", internally it will be split into the individual tokens "sunny" and "day". This enables full-text searches for these terms. Fields of type Edm.String or Collection(Edm.String) are searchable by default. This property must be false for simple fields of other non-string data types, and it must be null for complex fields. Note: searchable fields consume extra space in your index to accommodate additional tokenized versions of the field value for full-text searches. If you want to save space in your index and you don't need a field to be included in searches, set searchable to false.

sortable

boolean

A value indicating whether to enable the field to be referenced in $orderby expressions. By default, the search engine sorts results by score, but in many experiences users will want to sort by fields in the documents. A simple field can be sortable only if it is single-valued (it has a single value in the scope of the parent document). Simple collection fields cannot be sortable, since they are multi-valued. Simple sub-fields of complex collections are also multi-valued, and therefore cannot be sortable. This is true whether the complex collection is the immediate parent field or a more distant ancestor. Complex fields cannot be sortable and the sortable property must be null for such fields. The default for sortable is true for single-valued simple fields, false for multi-valued simple fields, and null for complex fields.

synonymMaps

string[]

A list of the names of synonym maps to associate with this field. This option can be used only with searchable fields. Currently only one synonym map per field is supported. Assigning a synonym map to a field ensures that query terms targeting that field are expanded at query-time using the rules in the synonym map. This attribute can be changed on existing fields. Must be null or an empty collection for complex fields.

type

SearchFieldDataType

The data type of the field.

vectorSearchProfile

string

The name of the vector search profile that specifies the algorithm to use when searching the vector field.

SearchFieldDataType

Defines the data type of a field in a search index.

Name Type Description
Edm.Boolean

string

Indicates that a field contains a Boolean value (true or false).

Edm.ComplexType

string

Indicates that a field contains one or more complex objects that in turn have sub-fields of other types.

Edm.DateTimeOffset

string

Indicates that a field contains a date/time value, including timezone information.

Edm.Double

string

Indicates that a field contains an IEEE double-precision floating point number.

Edm.GeographyPoint

string

Indicates that a field contains a geo-location in terms of longitude and latitude.

Edm.Int32

string

Indicates that a field contains a 32-bit signed integer.

Edm.Int64

string

Indicates that a field contains a 64-bit signed integer.

Edm.Single

string

Indicates that a field contains a single-precision floating point number. This is only valid when used with Collection(Edm.Single).

Edm.String

string

Indicates that a field contains a string.

SearchIndex

Represents a search index definition, which describes the fields and search behavior of an index.

Name Type Description
@odata.etag

string

The ETag of the index.

analyzers LexicalAnalyzer[]:

The analyzers for the index.

charFilters CharFilter[]:

The character filters for the index.

corsOptions

CorsOptions

Options to control Cross-Origin Resource Sharing (CORS) for the index.

defaultScoringProfile

string

The name of the scoring profile to use if none is specified in the query. If this property is not set and no scoring profile is specified in the query, then default scoring (tf-idf) will be used.

encryptionKey

SearchResourceEncryptionKey

A description of an encryption key that you create in Azure Key Vault. This key is used to provide an additional level of encryption-at-rest for your data when you want full assurance that no one, not even Microsoft, can decrypt your data. Once you have encrypted your data, it will always remain encrypted. The search service will ignore attempts to set this property to null. You can change this property as needed if you want to rotate your encryption key; your data will be unaffected. Encryption with customer-managed keys is not available for free search services, and is only available for paid services created on or after January 1, 2019.

fields

SearchField[]

The fields of the index.

name

string

The name of the index.

scoringProfiles

ScoringProfile[]

The scoring profiles for the index.

semantic

SemanticSettings

Defines parameters for a search index that influence semantic capabilities.

similarity Similarity:

The type of similarity algorithm to be used when scoring and ranking the documents matching a search query. The similarity algorithm can only be defined at index creation time and cannot be modified on existing indexes. If null, the ClassicSimilarity algorithm is used.

suggesters

Suggester[]

The suggesters for the index.

tokenFilters TokenFilter[]:

The token filters for the index.

tokenizers LexicalTokenizer[]:

The tokenizers for the index.

vectorSearch

VectorSearch

Contains configuration options related to vector search.

SearchResourceEncryptionKey

A customer-managed encryption key in Azure Key Vault. Keys that you create and manage can be used to encrypt or decrypt data-at-rest on your search service, such as indexes and synonym maps.

Name Type Description
accessCredentials

AzureActiveDirectoryApplicationCredentials

Optional Azure Active Directory credentials used for accessing your Azure Key Vault. Not required if using managed identity instead.

keyVaultKeyName

string

The name of your Azure Key Vault key to be used to encrypt your data at rest.

keyVaultKeyVersion

string

The version of your Azure Key Vault key to be used to encrypt your data at rest.

keyVaultUri

string

The URI of your Azure Key Vault, also referred to as DNS name, that contains the key to be used to encrypt your data at rest. An example URI might be https://my-keyvault-name.vault.azure.net.

SemanticConfiguration

Defines a specific configuration to be used in the context of semantic capabilities.

Name Type Description
name

string

The name of the semantic configuration.

prioritizedFields

PrioritizedFields

Describes the title, content, and keyword fields to be used for semantic ranking, captions, highlights, and answers. At least one of the three sub-properties (titleField, prioritizedKeywordsFields and prioritizedContentFields) needs to be set.

SemanticField

A field that is used as part of the semantic configuration.

Name Type Description
fieldName

string

The name of the field.

SemanticSettings

Defines parameters for a search index that influence semantic capabilities.

Name Type Description
configurations

SemanticConfiguration[]

The semantic configurations for the index.

defaultConfiguration

string

Allows you to set the name of a default semantic configuration in your index, making it optional to pass it on as a query parameter every time.
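
Tying together SemanticSettings, SemanticConfiguration, PrioritizedFields, and SemanticField, the semantic section of an index body might look like the following sketch; the configuration name and the field names "hotelName" and "tags" are illustrative:

"semantic": {
  "defaultConfiguration": "my_semantic_config",
  "configurations": [
    {
      "name": "my_semantic_config",
      "prioritizedFields": {
        "titleField": { "fieldName": "hotelName" },
        "prioritizedContentFields": [ { "fieldName": "description" } ],
        "prioritizedKeywordsFields": [ { "fieldName": "tags" } ]
      }
    }
  ]
}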

ShingleTokenFilter

Creates combinations of tokens as a single token. This token filter is implemented using Apache Lucene.

Name Type Default Value Description
@odata.type string:

#Microsoft.Azure.Search.ShingleTokenFilter

A URI fragment specifying the type of token filter.

filterToken

string

_

The string to insert for each position at which there is no token. Default is an underscore ("_").

maxShingleSize

integer

2

The maximum shingle size. Default and minimum value is 2.

minShingleSize

integer

2

The minimum shingle size. Default and minimum value is 2. Must be less than the value of maxShingleSize.

name

string

The name of the token filter. It must only contain letters, digits, spaces, dashes or underscores, can only start and end with alphanumeric characters, and is limited to 128 characters.

outputUnigrams

boolean

True

A value indicating whether the output stream will contain the input tokens (unigrams) as well as shingles. Default is true.

outputUnigramsIfNoShingles

boolean

False

A value indicating whether to output unigrams for those times when no shingles are available. This property takes precedence when outputUnigrams is set to false. Default is false.

tokenSeparator

string

The string to use when joining adjacent tokens to form a shingle. Default is a single space (" ").
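
For example, a shingle filter emitting two-word and three-word phrases in addition to the original terms might be defined as follows; the name is illustrative:

"tokenFilters": [
  {
    "@odata.type": "#Microsoft.Azure.Search.ShingleTokenFilter",
    "name": "my_shingle",
    "minShingleSize": 2,
    "maxShingleSize": 3,
    "outputUnigrams": true
  }
]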

SnowballTokenFilter

A filter that stems words using a Snowball-generated stemmer. This token filter is implemented using Apache Lucene.

Name Type Description
@odata.type string:

#Microsoft.Azure.Search.SnowballTokenFilter

A URI fragment specifying the type of token filter.

language

SnowballTokenFilterLanguage

The language to use.

name

string

The name of the token filter. It must only contain letters, digits, spaces, dashes or underscores, can only start and end with alphanumeric characters, and is limited to 128 characters.

SnowballTokenFilterLanguage

The language to use for a Snowball token filter.

Name Type Description
armenian

string

Selects the Lucene Snowball stemming tokenizer for Armenian.

basque

string

Selects the Lucene Snowball stemming tokenizer for Basque.

catalan

string

Selects the Lucene Snowball stemming tokenizer for Catalan.

danish

string

Selects the Lucene Snowball stemming tokenizer for Danish.

dutch

string

Selects the Lucene Snowball stemming tokenizer for Dutch.

english

string

Selects the Lucene Snowball stemming tokenizer for English.

finnish

string

Selects the Lucene Snowball stemming tokenizer for Finnish.

french

string

Selects the Lucene Snowball stemming tokenizer for French.

german

string

Selects the Lucene Snowball stemming tokenizer for German.

german2

string

Selects the Lucene Snowball stemming tokenizer that uses the German variant algorithm.

hungarian

string

Selects the Lucene Snowball stemming tokenizer for Hungarian.

italian

string

Selects the Lucene Snowball stemming tokenizer for Italian.

kp

string

Selects the Lucene Snowball stemming tokenizer for Dutch that uses the Kraaij-Pohlmann stemming algorithm.

lovins

string

Selects the Lucene Snowball stemming tokenizer for English that uses the Lovins stemming algorithm.

norwegian

string

Selects the Lucene Snowball stemming tokenizer for Norwegian.

porter

string

Selects the Lucene Snowball stemming tokenizer for English that uses the Porter stemming algorithm.

portuguese

string

Selects the Lucene Snowball stemming tokenizer for Portuguese.

romanian

string

Selects the Lucene Snowball stemming tokenizer for Romanian.

russian

string

Selects the Lucene Snowball stemming tokenizer for Russian.

spanish

string

Selects the Lucene Snowball stemming tokenizer for Spanish.

swedish

string

Selects the Lucene Snowball stemming tokenizer for Swedish.

turkish

string

Selects the Lucene Snowball stemming tokenizer for Turkish.

StemmerOverrideTokenFilter

Provides the ability to override other stemming filters with custom dictionary-based stemming. Any dictionary-stemmed terms will be marked as keywords so that they will not be stemmed with stemmers down the chain. Must be placed before any stemming filters. This token filter is implemented using Apache Lucene.

Name Type Description
@odata.type string:

#Microsoft.Azure.Search.StemmerOverrideTokenFilter

A URI fragment specifying the type of token filter.

name

string

The name of the token filter. It must only contain letters, digits, spaces, dashes or underscores, can only start and end with alphanumeric characters, and is limited to 128 characters.

rules

string[]

A list of stemming rules in the following format: "word => stem", for example: "ran => run".
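
A sketch of an override filter paired with a stemmer filter (defined below); in a custom analyzer's tokenFilters list, "my_override" must appear before "my_stemmer", per the placement note above. The names and rules are illustrative:

"tokenFilters": [
  {
    "@odata.type": "#Microsoft.Azure.Search.StemmerOverrideTokenFilter",
    "name": "my_override",
    "rules": [ "ran => run", "mice => mouse" ]
  },
  {
    "@odata.type": "#Microsoft.Azure.Search.StemmerTokenFilter",
    "name": "my_stemmer",
    "language": "english"
  }
]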

StemmerTokenFilter

Language-specific stemming filter. This token filter is implemented using Apache Lucene.

Name Type Description
@odata.type string:

#Microsoft.Azure.Search.StemmerTokenFilter

A URI fragment specifying the type of token filter.

language

StemmerTokenFilterLanguage

The language to use.

name

string

The name of the token filter. It must only contain letters, digits, spaces, dashes or underscores, can only start and end with alphanumeric characters, and is limited to 128 characters.

StemmerTokenFilterLanguage

The language to use for a stemmer token filter.

Name Type Description
arabic

string

Selects the Lucene stemming tokenizer for Arabic.

armenian

string

Selects the Lucene stemming tokenizer for Armenian.

basque

string

Selects the Lucene stemming tokenizer for Basque.

brazilian

string

Selects the Lucene stemming tokenizer for Portuguese (Brazil).

bulgarian

string

Selects the Lucene stemming tokenizer for Bulgarian.

catalan

string

Selects the Lucene stemming tokenizer for Catalan.

czech

string

Selects the Lucene stemming tokenizer for Czech.

danish

string

Selects the Lucene stemming tokenizer for Danish.

dutch

string

Selects the Lucene stemming tokenizer for Dutch.

dutchKp

string

Selects the Lucene stemming tokenizer for Dutch that uses the Kraaij-Pohlmann stemming algorithm.

english

string

Selects the Lucene stemming tokenizer for English.

finnish

string

Selects the Lucene stemming tokenizer for Finnish.

french

string

Selects the Lucene stemming tokenizer for French.

galician

string

Selects the Lucene stemming tokenizer for Galician.

german

string

Selects the Lucene stemming tokenizer for German.

german2

string

Selects the Lucene stemming tokenizer that uses the German variant algorithm.

greek

string

Selects the Lucene stemming tokenizer for Greek.

hindi

string

Selects the Lucene stemming tokenizer for Hindi.

hungarian

string

Selects the Lucene stemming tokenizer for Hungarian.

indonesian

string

Selects the Lucene stemming tokenizer for Indonesian.

irish

string

Selects the Lucene stemming tokenizer for Irish.

italian

string

Selects the Lucene stemming tokenizer for Italian.

latvian

string

Selects the Lucene stemming tokenizer for Latvian.

lightEnglish

string

Selects the Lucene stemming tokenizer for English that does light stemming.

lightFinnish

string

Selects the Lucene stemming tokenizer for Finnish that does light stemming.

lightFrench

string

Selects the Lucene stemming tokenizer for French that does light stemming.

lightGerman

string

Selects the Lucene stemming tokenizer for German that does light stemming.

lightHungarian

string

Selects the Lucene stemming tokenizer for Hungarian that does light stemming.

lightItalian

string

Selects the Lucene stemming tokenizer for Italian that does light stemming.

lightNorwegian

string

Selects the Lucene stemming tokenizer for Norwegian (Bokmål) that does light stemming.

lightNynorsk

string

Selects the Lucene stemming tokenizer for Norwegian (Nynorsk) that does light stemming.

lightPortuguese

string

Selects the Lucene stemming tokenizer for Portuguese that does light stemming.

lightRussian

string

Selects the Lucene stemming tokenizer for Russian that does light stemming.

lightSpanish

string

Selects the Lucene stemming tokenizer for Spanish that does light stemming.

lightSwedish

string

Selects the Lucene stemming tokenizer for Swedish that does light stemming.

lovins

string

Selects the Lucene stemming tokenizer for English that uses the Lovins stemming algorithm.

minimalEnglish

string

Selects the Lucene stemming tokenizer for English that does minimal stemming.

minimalFrench

string

Selects the Lucene stemming tokenizer for French that does minimal stemming.

minimalGalician

string

Selects the Lucene stemming tokenizer for Galician that does minimal stemming.

minimalGerman

string

Selects the Lucene stemming tokenizer for German that does minimal stemming.

minimalNorwegian

string

Selects the Lucene stemming tokenizer for Norwegian (Bokmål) that does minimal stemming.

minimalNynorsk

string

Selects the Lucene stemming tokenizer for Norwegian (Nynorsk) that does minimal stemming.

minimalPortuguese

string

Selects the Lucene stemming tokenizer for Portuguese that does minimal stemming.

norwegian

string

Selects the Lucene stemming tokenizer for Norwegian (Bokmål).

porter2

string

Selects the Lucene stemming tokenizer for English that uses the Porter2 stemming algorithm.

portuguese

string

Selects the Lucene stemming tokenizer for Portuguese.

portugueseRslp

string

Selects the Lucene stemming tokenizer for Portuguese that uses the RSLP stemming algorithm.

possessiveEnglish

string

Selects the Lucene stemming tokenizer for English that removes trailing possessives from words.

romanian

string

Selects the Lucene stemming tokenizer for Romanian.

russian

string

Selects the Lucene stemming tokenizer for Russian.

sorani

string

Selects the Lucene stemming tokenizer for Sorani.

spanish

string

Selects the Lucene stemming tokenizer for Spanish.

swedish

string

Selects the Lucene stemming tokenizer for Swedish.

turkish

string

Selects the Lucene stemming tokenizer for Turkish.

StopAnalyzer

Divides text at non-letters; applies the lowercase and stopword token filters. This analyzer is implemented using Apache Lucene.

Name Type Description
@odata.type string:

#Microsoft.Azure.Search.StopAnalyzer

A URI fragment specifying the type of analyzer.

name

string

The name of the analyzer. It must only contain letters, digits, spaces, dashes or underscores, can only start and end with alphanumeric characters, and is limited to 128 characters.

stopwords

string[]

A list of stopwords.

StopwordsList

Identifies a predefined list of language-specific stopwords.

Name Type Description
arabic

string

Selects the stopword list for Arabic.

armenian

string

Selects the stopword list for Armenian.

basque

string

Selects the stopword list for Basque.

brazilian

string

Selects the stopword list for Portuguese (Brazil).

bulgarian

string

Selects the stopword list for Bulgarian.

catalan

string

Selects the stopword list for Catalan.

czech

string

Selects the stopword list for Czech.

danish

string

Selects the stopword list for Danish.

dutch

string

Selects the stopword list for Dutch.

english

string

Selects the stopword list for English.

finnish

string

Selects the stopword list for Finnish.

french

string

Selects the stopword list for French.

galician

string

Selects the stopword list for Galician.

german

string

Selects the stopword list for German.

greek

string

Selects the stopword list for Greek.

hindi

string

Selects the stopword list for Hindi.

hungarian

string

Selects the stopword list for Hungarian.

indonesian

string

Selects the stopword list for Indonesian.

irish

string

Selects the stopword list for Irish.

italian

string

Selects the stopword list for Italian.

latvian

string

Selects the stopword list for Latvian.

norwegian

string

Selects the stopword list for Norwegian.

persian

string

Selects the stopword list for Persian.

portuguese

string

Selects the stopword list for Portuguese.

romanian

string

Selects the stopword list for Romanian.

russian

string

Selects the stopword list for Russian.

sorani

string

Selects the stopword list for Sorani.

spanish

string

Selects the stopword list for Spanish.

swedish

string

Selects the stopword list for Swedish.

thai

string

Selects the stopword list for Thai.

turkish

string

Selects the stopword list for Turkish.

StopwordsTokenFilter

Removes stop words from a token stream. This token filter is implemented using Apache Lucene.

Name Type Default Value Description
@odata.type string:

#Microsoft.Azure.Search.StopwordsTokenFilter

A URI fragment specifying the type of token filter.

ignoreCase

boolean

False

A value indicating whether to ignore case. If true, all words are converted to lower case first. Default is false.

name

string

The name of the token filter. It must only contain letters, digits, spaces, dashes or underscores, can only start and end with alphanumeric characters, and is limited to 128 characters.

removeTrailing

boolean

True

A value indicating whether to ignore the last search term if it's a stop word. Default is true.

stopwords

string[]

The list of stopwords. This property and the stopwordsList property cannot both be set.

stopwordsList

StopwordsList

english

A predefined list of stopwords to use. This property and the stopwords property cannot both be set. Default is English.
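
Because stopwords and stopwordsList are mutually exclusive, a definition sets only one of them, as in this sketch; the name is illustrative:

"tokenFilters": [
  {
    "@odata.type": "#Microsoft.Azure.Search.StopwordsTokenFilter",
    "name": "my_stopwords",
    "stopwordsList": "english",
    "ignoreCase": true
  }
]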

Suggester

Defines how the Suggest API should apply to a group of fields in the index.

Name Type Description
name

string

The name of the suggester.

searchMode

SuggesterSearchMode

A value indicating the capabilities of the suggester.

sourceFields

string[]

The list of field names to which the suggester applies. Each field must be searchable.
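
For example, a suggester over two searchable string fields might be defined as follows; the suggester name and the field "hotelName" are illustrative:

"suggesters": [
  {
    "name": "sg",
    "searchMode": "analyzingInfixMatching",
    "sourceFields": [ "hotelName", "description" ]
  }
]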

SuggesterSearchMode

A value indicating the capabilities of the suggester.

Name Type Description
analyzingInfixMatching

string

Matches consecutive whole terms and prefixes in a field. For example, for the field 'The fastest brown fox', the queries 'fast' and 'fastest brow' would both match.

SynonymTokenFilter

Matches single or multi-word synonyms in a token stream. This token filter is implemented using Apache Lucene.

Name Type Default Value Description
@odata.type string:

#Microsoft.Azure.Search.SynonymTokenFilter

A URI fragment specifying the type of token filter.

expand

boolean

True

A value indicating whether all words in the list of synonyms (if => notation is not used) will map to one another. If true, the list: incredible, unbelievable, fabulous, amazing is equivalent to: incredible, unbelievable, fabulous, amazing => incredible, unbelievable, fabulous, amazing. If false, the same list is equivalent to: incredible, unbelievable, fabulous, amazing => incredible. Default is true.

ignoreCase

boolean

False

A value indicating whether to case-fold input for matching. Default is false.

name

string

The name of the token filter. It must only contain letters, digits, spaces, dashes or underscores, can only start and end with alphanumeric characters, and is limited to 128 characters.

synonyms

string[]

A list of synonyms, specified in one of two formats: 1. incredible, unbelievable, fabulous => amazing - all terms on the left side of the => symbol will be replaced with all terms on its right side; 2. incredible, unbelievable, fabulous, amazing - a comma-separated list of equivalent words. Set the expand option to change how this list is interpreted.
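
A sketch showing both synonym formats in one filter definition; the name is illustrative:

"tokenFilters": [
  {
    "@odata.type": "#Microsoft.Azure.Search.SynonymTokenFilter",
    "name": "my_synonyms",
    "synonyms": [
      "incredible, unbelievable, fabulous => amazing",
      "usa, united states"
    ],
    "ignoreCase": true,
    "expand": true
  }
]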

TagScoringFunction

Defines a function that boosts scores of documents with string values matching a given list of tags.

Name Type Description
boost

number

A multiplier for the raw score. Must be a positive number not equal to 1.0.

fieldName

string

The name of the field used as input to the scoring function.

interpolation

ScoringFunctionInterpolation

A value indicating how boosting will be interpolated across document scores; defaults to "Linear".

tag

TagScoringParameters

Parameter values for the tag scoring function.

type string:

tag

Indicates the type of function to use. Valid values include magnitude, freshness, distance, and tag. The function type must be lower case.

TagScoringParameters

Provides parameter values to a tag scoring function.

Name Type Description
tagsParameter

string

The name of the parameter passed in search queries to specify the list of tags to compare against the target field.
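
Putting the two definitions together, a tag scoring profile might look like the following sketch; "tags" is assumed to be a string collection field, and at query time the tag list would be supplied through the scoring parameter named here (for example, scoringParameter=mytags-luxury,pool):

"scoringProfiles": [
  {
    "name": "boostByTags",
    "functions": [
      {
        "type": "tag",
        "fieldName": "tags",
        "boost": 2.0,
        "tag": { "tagsParameter": "mytags" }
      }
    ]
  }
]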

TextWeights

Defines weights on index fields for which matches should boost scoring in search queries.

Name Type Description
weights

object

The dictionary of per-field weights to boost document scoring. The keys are field names and the values are the weights for each field.
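
For example, a scoring profile that weights matches in a title-like field more heavily might be sketched as follows; the profile name and the field "hotelName" are illustrative:

"scoringProfiles": [
  {
    "name": "weightedText",
    "text": {
      "weights": {
        "hotelName": 5,
        "description": 1.5
      }
    }
  }
]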

TokenCharacterKind

Represents classes of characters on which a token filter can operate.

Name Type Description
digit

string

Keeps digits in tokens.

letter

string

Keeps letters in tokens.

punctuation

string

Keeps punctuation in tokens.

symbol

string

Keeps symbols in tokens.

whitespace

string

Keeps whitespace in tokens.

TokenFilterName

Defines the names of all token filters supported by the search engine.

Name Type Description
apostrophe

string

Strips all characters after an apostrophe (including the apostrophe itself). See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/tr/ApostropheFilter.html

arabic_normalization

string

A token filter that applies the Arabic normalizer to normalize the orthography. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/ar/ArabicNormalizationFilter.html

asciifolding

string

Converts alphabetic, numeric, and symbolic Unicode characters which are not in the first 127 ASCII characters (the "Basic Latin" Unicode block) into their ASCII equivalents, if such equivalents exist. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/ASCIIFoldingFilter.html

cjk_bigram

string

Forms bigrams of CJK terms that are generated from the standard tokenizer. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/cjk/CJKBigramFilter.html

cjk_width

string

Normalizes CJK width differences. Folds fullwidth ASCII variants into the equivalent basic Latin, and half-width Katakana variants into the equivalent Kana. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/cjk/CJKWidthFilter.html

classic

string

Removes English possessives, and dots from acronyms. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/standard/ClassicFilter.html

common_grams

string

Constructs bigrams for frequently occurring terms while indexing. Single terms are still indexed too, with bigrams overlaid. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/commongrams/CommonGramsFilter.html

edgeNGram_v2

string

Generates n-grams of the given size(s) starting from the front or the back of an input token. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/ngram/EdgeNGramTokenFilter.html

elision

string

Removes elisions. For example, "l'avion" (the plane) will be converted to "avion" (plane). See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/util/ElisionFilter.html

german_normalization

string

Normalizes German characters according to the heuristics of the German2 snowball algorithm. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/de/GermanNormalizationFilter.html

hindi_normalization

string

Normalizes text in Hindi to remove some differences in spelling variations. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/hi/HindiNormalizationFilter.html

indic_normalization

string

Normalizes the Unicode representation of text in Indian languages. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/in/IndicNormalizationFilter.html

keyword_repeat

string

Emits each incoming token twice, once as keyword and once as non-keyword. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/KeywordRepeatFilter.html

kstem

string

A high-performance kstem filter for English. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/en/KStemFilter.html

length

string

Removes words that are too long or too short. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/LengthFilter.html

limit

string

Limits the number of tokens while indexing. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/LimitTokenCountFilter.html

lowercase

string

Normalizes token text to lower case. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/core/LowerCaseFilter.html

nGram_v2

string

Generates n-grams of the given size(s). See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/ngram/NGramTokenFilter.html

persian_normalization

string

Applies normalization for Persian. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/fa/PersianNormalizationFilter.html

phonetic

string

Create tokens for phonetic matches. See https://lucene.apache.org/core/4_10_3/analyzers-phonetic/org/apache/lucene/analysis/phonetic/package-tree.html

porter_stem

string

Uses the Porter stemming algorithm to transform the token stream. See http://tartarus.org/~martin/PorterStemmer

reverse

string

Reverses the token string. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/reverse/ReverseStringFilter.html

scandinavian_folding

string

Folds Scandinavian characters åÅäæÄÆ->a and öÖøØ->o. It also discriminates against use of double vowels aa, ae, ao, oe and oo, leaving just the first one. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/ScandinavianFoldingFilter.html

scandinavian_normalization

string

Normalizes use of the interchangeable Scandinavian characters. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/ScandinavianNormalizationFilter.html

shingle

string

Creates combinations of tokens as a single token. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/shingle/ShingleFilter.html

snowball

string

A filter that stems words using a Snowball-generated stemmer. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/snowball/SnowballFilter.html

sorani_normalization

string

Normalizes the Unicode representation of Sorani text. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/ckb/SoraniNormalizationFilter.html

stemmer

string

Language-specific stemming filter. See https://docs.microsoft.com/rest/api/searchservice/Custom-analyzers-in-Azure-Search#TokenFilters

stopwords

string

Removes stop words from a token stream. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/core/StopFilter.html

trim

string

Trims leading and trailing whitespace from tokens. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/TrimFilter.html

truncate

string

Truncates the terms to a specific length. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/TruncateTokenFilter.html

unique

string

Filters out tokens with the same text as the previous token. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/RemoveDuplicatesTokenFilter.html

uppercase

string

Normalizes token text to upper case. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/core/UpperCaseFilter.html

word_delimiter

string

Splits words into subwords and performs optional transformations on subword groups.
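
These predefined filters are referenced by name from an analyzer definition; no tokenFilters entry in the index body is required for them. A minimal sketch of a custom analyzer that chains several of the filters listed above (the analyzer name my_folding_analyzer is illustrative; standard_v2 is a predefined tokenizer):

{
  "analyzers": [
    {
      "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
      "name": "my_folding_analyzer",
      "tokenizer": "standard_v2",
      "tokenFilters": [ "lowercase", "trim", "porter_stem" ]
    }
  ]
}

Filters run in the order listed, so tokens here are lowercased and trimmed before stemming.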

TruncateTokenFilter

Truncates the terms to a specific length. This token filter is implemented using Apache Lucene.

Name Type Default Value Description
@odata.type string:

#Microsoft.Azure.Search.TruncateTokenFilter

A URI fragment specifying the type of token filter.

length

integer

300

The length at which terms will be truncated. Default and maximum is 300.

name

string

The name of the token filter. It must only contain letters, digits, spaces, dashes or underscores, can only start and end with alphanumeric characters, and is limited to 128 characters.
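
A minimal sketch of a custom instance of this filter in the index's tokenFilters array (the filter name truncate_50 is illustrative):

{
  "tokenFilters": [
    {
      "@odata.type": "#Microsoft.Azure.Search.TruncateTokenFilter",
      "name": "truncate_50",
      "length": 50
    }
  ]
}

The custom instance is then referenced by its name, truncate_50, from a custom analyzer's tokenFilters list, exactly like a predefined filter name.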

UaxUrlEmailTokenizer

Tokenizes URLs and emails as one token. This tokenizer is implemented using Apache Lucene.

Name Type Default Value Description
@odata.type string:

#Microsoft.Azure.Search.UaxUrlEmailTokenizer

A URI fragment specifying the type of tokenizer.

maxTokenLength

integer

255

The maximum token length. Default is 255. Tokens longer than the maximum length are split. The maximum token length that can be used is 300 characters.

name

string

The name of the tokenizer. It must only contain letters, digits, spaces, dashes or underscores, can only start and end with alphanumeric characters, and is limited to 128 characters.
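
A minimal sketch of a custom instance of this tokenizer (the name url_email is illustrative):

{
  "tokenizers": [
    {
      "@odata.type": "#Microsoft.Azure.Search.UaxUrlEmailTokenizer",
      "name": "url_email",
      "maxTokenLength": 255
    }
  ]
}

This is useful for fields that contain addresses such as "user@example.com", which would otherwise be split on punctuation by a standard tokenizer.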

UniqueTokenFilter

Filters out tokens with the same text as the previous token. This token filter is implemented using Apache Lucene.

Name Type Default Value Description
@odata.type string:

#Microsoft.Azure.Search.UniqueTokenFilter

A URI fragment specifying the type of token filter.

name

string

The name of the token filter. It must only contain letters, digits, spaces, dashes or underscores, can only start and end with alphanumeric characters, and is limited to 128 characters.

onlyOnSamePosition

boolean

False

A value indicating whether to remove duplicates only at the same position. Default is false.
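
A minimal sketch of a custom instance of this filter (the name dedupe_stems is illustrative):

{
  "tokenFilters": [
    {
      "@odata.type": "#Microsoft.Azure.Search.UniqueTokenFilter",
      "name": "dedupe_stems",
      "onlyOnSamePosition": true
    }
  ]
}

A common Lucene pattern (not required by the API) places this filter after keyword_repeat and a stemmer: keyword_repeat emits each token twice, the stemmer leaves the keyword copy unstemmed, and this filter with onlyOnSamePosition set to true removes the duplicate whenever stemming did not change the token.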

VectorSearch

Contains configuration options related to vector search.

Name Type Description
algorithms VectorSearchAlgorithmConfiguration[]:

Contains configuration options specific to the algorithm used during indexing or querying.

profiles

VectorSearchProfile[]

Defines combinations of configurations to use with vector search.

VectorSearchAlgorithmKind

The algorithm used for indexing and querying.

Name Type Description
exhaustiveKnn

string

Exhaustive KNN algorithm, which performs a brute-force search.

hnsw

string

HNSW (Hierarchical Navigable Small World), a type of approximate nearest neighbors algorithm.

VectorSearchAlgorithmMetric

The similarity metric to use for vector comparisons.

Name Type Description
cosine

string

Measures the cosine of the angle between two vectors; similarity is independent of vector magnitude.

dotProduct

string

Computes the dot (inner) product of two vectors.

euclidean

string

Computes the straight-line (L2) distance between two vectors.

VectorSearchProfile

Defines a combination of configurations to use with vector search.

Name Type Description
algorithm

string

The name of the vector search algorithm configuration that specifies the algorithm and optional parameters.

name

string

The name to associate with this particular vector search profile.
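
Tying the preceding definitions together: the vectorSearch section of the request body names one or more algorithm configurations (a kind plus parameters, including one of the metrics listed above) and exposes them to fields through profiles. A sketch consistent with the myHnswProfile referenced by the descriptionEmbedding field in the sample request (the algorithm names and parameter values are illustrative):

{
  "vectorSearch": {
    "algorithms": [
      {
        "name": "myHnsw",
        "kind": "hnsw",
        "hnswParameters": {
          "m": 4,
          "efConstruction": 400,
          "metric": "cosine"
        }
      },
      {
        "name": "myExhaustiveKnn",
        "kind": "exhaustiveKnn",
        "exhaustiveKnnParameters": {
          "metric": "euclidean"
        }
      }
    ],
    "profiles": [
      {
        "name": "myHnswProfile",
        "algorithm": "myHnsw"
      }
    ]
  }
}

A vector field then selects a profile through its vectorSearchProfile property, as descriptionEmbedding does in the sample request.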

WordDelimiterTokenFilter

Splits words into subwords and performs optional transformations on subword groups. This token filter is implemented using Apache Lucene.

Name Type Default Value Description
@odata.type string:

#Microsoft.Azure.Search.WordDelimiterTokenFilter

A URI fragment specifying the type of token filter.

catenateAll

boolean

False

A value indicating whether all subword parts will be catenated. For example, if this is set to true, "Azure-Search-1" becomes "AzureSearch1". Default is false.

catenateNumbers

boolean

False

A value indicating whether maximum runs of number parts will be catenated. For example, if this is set to true, "1-2" becomes "12". Default is false.

catenateWords

boolean

False

A value indicating whether maximum runs of word parts will be catenated. For example, if this is set to true, "Azure-Search" becomes "AzureSearch". Default is false.

generateNumberParts

boolean

True

A value indicating whether to generate number subwords. Default is true.

generateWordParts

boolean

True

A value indicating whether to generate word parts. If set to true, parts of words are generated; for example, "AzureSearch" becomes "Azure" "Search". Default is true.

name

string

The name of the token filter. It must only contain letters, digits, spaces, dashes or underscores, can only start and end with alphanumeric characters, and is limited to 128 characters.

preserveOriginal

boolean

False

A value indicating whether original words will be preserved and added to the subword list. Default is false.

protectedWords

string[]

A list of tokens to protect from being delimited.

splitOnCaseChange

boolean

True

A value indicating whether to split words on case changes. For example, if this is set to true, "AzureSearch" becomes "Azure" "Search". Default is true.

splitOnNumerics

boolean

True

A value indicating whether to split on numbers. For example, if this is set to true, "Azure1Search" becomes "Azure" "1" "Search". Default is true.

stemEnglishPossessive

boolean

True

A value indicating whether to remove trailing "'s" for each subword. Default is true.
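
A minimal sketch of a custom instance of this filter (the name sku_splitter and the option choices are illustrative):

{
  "tokenFilters": [
    {
      "@odata.type": "#Microsoft.Azure.Search.WordDelimiterTokenFilter",
      "name": "sku_splitter",
      "generateWordParts": true,
      "generateNumberParts": true,
      "catenateAll": true,
      "preserveOriginal": true,
      "protectedWords": [ "C++", ".NET" ]
    }
  ]
}

With these settings, "Azure-Search-1" would produce the word parts "Azure" and "Search", the number part "1", the catenated form "AzureSearch1", and the original token "Azure-Search-1", while the protected tokens pass through unchanged.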