The indexing process works by creating a document for each indexed entity. During indexing, an enrichment pipeline iteratively builds the documents, combining metadata from the data source with enriched fields extracted by cognitive skills. You can think of each indexed document as a JSON structure that initially consists of the index fields you have mapped to fields extracted directly from the source data.
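As a rough illustration, the initial document can be pictured as a small JSON object. The sketch below builds one in Python. The field names (metadata_storage_name, metadata_author, content) and all values are assumptions chosen for illustration, not output from a real indexer.

```python
import json

# Hypothetical initial document: only fields taken directly from the source data.
# Field names and values are illustrative assumptions.
document = {
    "metadata_storage_name": "brochure.pdf",
    "metadata_author": "Example Author",
    "content": "Visit the city museum this summer.",
}

# Serialize the document under a "document" root to show its JSON shape.
initial_json = json.dumps({"document": document}, indent=2)
print(initial_json)
```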
When the documents in the data source contain images, you can configure the indexer to extract the image data and place each image in a normalized_images collection.
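One way to picture the result is a document that gains a normalized_images list with one entry per extracted image. The following is a minimal sketch, assuming each image entry carries data, width, and height properties. The helper add_normalized_images and the placeholder image values are hypothetical.

```python
import json

def add_normalized_images(document, images):
    """Return a copy of the document with a normalized_images collection added."""
    enriched = dict(document)
    enriched["normalized_images"] = [
        {"data": img["data"], "width": img["width"], "height": img["height"]}
        for img in images
    ]
    return enriched

# Hypothetical document and extracted images (placeholder data, not real base64).
document = {
    "metadata_storage_name": "brochure.pdf",
    "content": "Visit the city museum this summer.",
}
extracted_images = [
    {"data": "<base64 image 0>", "width": 640, "height": 480},
    {"data": "<base64 image 1>", "width": 800, "height": 600},
]

with_images = add_normalized_images(document, extracted_images)
print(json.dumps(with_images, indent=2))
```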
Normalizing the image data in this way enables you to use the collection of images as an input for skills that extract information from image data.
Each skill adds fields to the document. For example, a skill that detects the language in which a document is written might store its output in a language field.
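The idea that a skill reads one field and writes another can be sketched as follows. Here run_skill and language_skill are hypothetical stand-ins: the stub always reports "en" and performs no real language detection.

```python
def language_skill(text):
    """Stand-in for a language detection skill; always reports English."""
    return "en"

def run_skill(document, input_field, output_field, skill):
    """Apply a skill to one input field and store the result as a new field."""
    document[output_field] = skill(document[input_field])
    return document

# Hypothetical document with text content only.
document = {
    "metadata_storage_name": "brochure.pdf",
    "content": "Visit the city museum this summer.",
}

run_skill(document, "content", "language", language_skill)
print(document["language"])
```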
The document is structured hierarchically, and each skill is applied to a specific context within the hierarchy, so you can run the skill once for each item at a particular level of the document. For example, you could run an optical character recognition (OCR) skill for each image in the normalized_images collection to extract any text the images contain.
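Running a skill at the level of each image can be sketched as a loop over the collection that writes the output next to each item. In this sketch, run_skill_per_item, fake_ocr, and the lookup table of OCR results are hypothetical. A real pipeline would express the same idea with a context path such as /document/normalized_images/*.

```python
# Hypothetical OCR results keyed by placeholder image data.
OCR_RESULTS = {
    "<base64 image 0>": "Open daily 9-5",
    "<base64 image 1>": "Free entry on Sundays",
}

def fake_ocr(image):
    """Stand-in for an OCR skill: look up canned text for an image."""
    return OCR_RESULTS.get(image["data"], "")

def run_skill_per_item(document, collection, output_field, skill):
    """Run the skill once per item in the collection, storing output on the item."""
    for item in document[collection]:
        item[output_field] = skill(item)
    return document

document = {
    "content": "Visit the city museum this summer.",
    "normalized_images": [
        {"data": "<base64 image 0>"},
        {"data": "<base64 image 1>"},
    ],
}

run_skill_per_item(document, "normalized_images", "text", fake_ocr)
for image in document["normalized_images"]:
    print(image["text"])
```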
The output fields from each skill can be used as inputs for other skills later in the pipeline, which in turn store their outputs in the document structure. For example, you could use a merge skill to combine the original text content with the text extracted from each image, creating a new merged_content field that contains all of the text in the document, including image text.
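Chaining one skill's output into another can be sketched as follows. The merge_text helper is a hypothetical simplification that appends each image's text to the end of the content; an actual merge skill may instead place the text where the image appeared in the source.

```python
def merge_text(document):
    """Combine the original content with text extracted from each image."""
    parts = [document["content"]]
    parts += [
        image["text"]
        for image in document.get("normalized_images", [])
        if image.get("text")
    ]
    document["merged_content"] = " ".join(parts)
    return document

# Hypothetical document after an OCR step has added text to each image.
document = {
    "content": "Visit the city museum this summer.",
    "normalized_images": [
        {"data": "<base64 image 0>", "text": "Open daily 9-5"},
        {"data": "<base64 image 1>", "text": "Free entry on Sundays"},
    ],
}

merge_text(document)
print(document["merged_content"])
```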
The fields in the final document structure at the end of the pipeline are mapped to index fields by the indexer in one of two ways: