Hello!
Thank you for posting on Microsoft Learn Q&A.
You need to do the extraction at indexing time with a skillset and store only the value you need in your index. Your query then returns that small field instead of the whole chunk of text.
The first solution, for when the text pattern is predictable, is a regex that runs inside a custom Web API skill. Start by adding the fields to your index:
{
  "name": "docs",
  "fields": [
    { "name": "id", "type": "Edm.String", "key": true, "filterable": true },
    { "name": "content", "type": "Edm.String", "searchable": true },
    { "name": "applesQuantity", "type": "Edm.Int32", "filterable": true, "sortable": true, "facetable": true, "retrievable": true },
    { "name": "applesText", "type": "Edm.String", "retrievable": true }
  ]
}
Then create a skillset whose custom Web API skill calls your Azure Function, which runs the regex. With a Blob Storage data source the indexer already cracks each file into /document/content, so no separate extraction skill is needed (the Document Extraction skill reads file_data and doesn’t accept a "document" input). Skill inputs are read from the enrichment tree through source paths, so keep the regex (for example Quantity\s*of\s*apples\s*[:=]\s*(\d+)) inside the function rather than passing it as an input:
{
  "name": "docs-skillset",
  "skills": [
    {
      "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
      "name": "extract-apples",
      "description": "Extract 'Quantity of apples' as a number and as 'N apples'",
      "context": "/document",
      "uri": "https://<your-function>.azurewebsites.net/api/extractApples",
      "httpMethod": "POST",
      "inputs": [
        { "name": "text", "source": "/document/content" }
      ],
      "outputs": [
        { "name": "applesQuantity", "targetName": "applesQuantity" },
        { "name": "applesText", "targetName": "applesText" }
      ]
    }
  ]
}
Your function only needs to read each record’s text, apply the regex, and return something like this (each recordId must echo the one sent in the request):
{ "values": [ { "recordId": "1", "data": { "applesQuantity": 200, "applesText": "200 apples" } } ] }
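A minimal sketch of the function’s core logic, assuming the regex is hard-coded in the function and the Web API skill sends its usual {"values": [{"recordId": ..., "data": {"text": ...}}]} payload. The Azure Functions HTTP-trigger wiring is left out so the logic runs on its own; handle_skill_request and APPLES_PATTERN are hypothetical names, and the case-insensitive match is an assumed design choice.

```python
import re
from typing import Any

# Hypothetical: the same pattern as above, kept inside the function.
# IGNORECASE is an assumption so "quantity of apples" also matches.
APPLES_PATTERN = re.compile(r"Quantity\s*of\s*apples\s*[:=]\s*(\d+)", re.IGNORECASE)


def handle_skill_request(body: dict[str, Any]) -> dict[str, Any]:
    """Turn a custom Web API skill request into a skill response."""
    results = []
    for record in body.get("values", []):
        text = (record.get("data") or {}).get("text") or ""
        match = APPLES_PATTERN.search(text)
        if match:
            quantity = int(match.group(1))
            data = {"applesQuantity": quantity, "applesText": f"{quantity} apples"}
        else:
            # No match: return nulls so the index fields stay empty.
            data = {"applesQuantity": None, "applesText": None}
        # recordId must echo the incoming one so the indexer can pair results.
        results.append({"recordId": record["recordId"], "data": data})
    return {"values": results}


if __name__ == "__main__":
    sample = {"values": [{"recordId": "1", "data": {"text": "Quantity of apples: 200"}}]}
    print(handle_skill_request(sample))
```

Inside an HTTP-triggered function you would pass it req.get_json() and return the result serialized with json.dumps as an application/json response.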
Then map the skill outputs to index fields in the indexer:
{
  "name": "docs-indexer",
  "dataSourceName": "docs-ds",
  "targetIndexName": "docs",
  "skillsetName": "docs-skillset",
  "outputFieldMappings": [
    { "sourceFieldName": "/document/applesQuantity", "targetFieldName": "applesQuantity" },
    { "sourceFieldName": "/document/applesText", "targetFieldName": "applesText" }
  ]
}
Finally, query only the extracted values:
POST https://<service>.search.windows.net/indexes/docs/docs/search?api-version=2025-09-01
Content-Type: application/json
api-key: <key>

{
  "search": "*",
  "select": "id,applesText,applesQuantity",
  "filter": "applesQuantity gt 0"
}
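If you call that endpoint from Python, a minimal standard-library sketch could look like this. build_search_request is a hypothetical helper, and the service name, key and api-version are placeholders to replace with your own values; it only builds the request, so you can inspect it before sending.

```python
import json
import urllib.request


def build_search_request(service: str, api_key: str,
                         api_version: str = "2025-09-01") -> urllib.request.Request:
    """Build (but do not send) the search POST that returns only the extracted fields."""
    url = (f"https://{service}.search.windows.net/indexes/docs/docs/search"
           f"?api-version={api_version}")
    # Same body as the HTTP example: no content field, just the small extracted values.
    body = {
        "search": "*",
        "select": "id,applesText,applesQuantity",
        "filter": "applesQuantity gt 0",
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )
```

To send it, use urllib.request.urlopen(req) and read the matching documents from the "value" array of the JSON response.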
The second solution is the Entity Recognition skill, for when you can’t rely on a fixed string but still want to pull numbers out of prose.
Add the Entity Recognition (V3) skill with categories set to ["Quantity"]:
{
  "@odata.type": "#Microsoft.Skills.Text.V3.EntityRecognitionSkill",
  "context": "/document",
  "categories": [ "Quantity" ],
  "inputs": [{ "name": "text", "source": "/document/content" }],
  "outputs": [{ "name": "namedEntities", "targetName": "entities" }]
}
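You’ll get entities with category "Quantity", each with its text, offset and length. Conditional and Shaper skills can only select or reshape values, so the step that keeps the quantity nearest the word "apples" and maps it into applesQuantity is easier to do in a second custom skill or in your app.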
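A minimal sketch of that post-processing, assuming each entity carries category, text, offset and length, and that the offsets index into the same string passed as text (check this against your skill output). nearest_apples_quantity is a hypothetical helper; it only reads digits, so spelled-out numbers such as "two hundred" would need extra parsing.

```python
import re
from typing import Any, Optional


def nearest_apples_quantity(text: str, entities: list[dict[str, Any]]) -> Optional[int]:
    """Return the integer value of the Quantity entity closest to the word 'apple(s)'."""
    apple_positions = [m.start() for m in re.finditer(r"\bapples?\b", text, re.IGNORECASE)]
    if not apple_positions:
        return None
    best: Optional[tuple[int, int]] = None  # (distance, value)
    for ent in entities:
        if ent.get("category") != "Quantity":
            continue
        digits = re.search(r"\d+", ent.get("text", ""))
        if not digits:
            continue  # e.g. "two hundred" is skipped by this sketch
        start = ent["offset"]
        end = start + ent["length"]
        # Distance from the entity span to the closest "apple" mention (0 if inside it).
        dist = min(
            0 if start <= pos < end else min(abs(pos - end), abs(start - pos))
            for pos in apple_positions
        )
        if best is None or dist < best[0]:
            best = (dist, int(digits.group()))
    return best[1] if best else None
```

Picking by character distance is a simple heuristic; it can pick the wrong number when several quantities sit equally close to "apples".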
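Azure AI Search does support Lucene regex queries (set queryType=full), but at query time a regex only decides which documents match: it doesn’t return capture groups, and it is evaluated against individual indexed terms rather than the whole phrase, so a pattern spanning "Quantity of apples: 200" won’t match as written. You’d still have to fetch the text and extract the number in your app. Index-time extraction is cleaner and cheaper to serve.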