Content Understanding terminology

Term Description
File Any type of data, including text, documents, images, videos, and audio.
File type The MIME type of a file, such as text/plain, application/pdf, image/jpeg, audio/wav, and video/mp4. Generic categories like document refer to all corresponding MIME types supported by the service.
Analyzer A component that processes and extracts content and structured fields from files. Content Understanding offers a few analyzer templates for common scenarios.
Analyzer template A predefined configuration and field schema for an analyzer. It simplifies analyzer creation by letting you modify a template instead of starting from scratch. This feature is available only in AI Foundry, not through the REST API or SDKs.
Analyzer result The output generated by an analyzer after processing input data. It typically includes extracted content in Markdown, extracted fields, and optional modality-specific details.
Add-ons Added features that enhance content extraction results, such as layout elements, barcodes, and figures in documents.
Fields A list of structured key-value pairs derived from the content, as defined by the field schema. Learn more about supported field value types.
Field schema A formal description of the fields to extract from the input. It specifies the name, description, value type, generation method, and more for each field.
Generation method The process of determining the extracted value of a specified field. Content Understanding supports:
Extract: Directly extract values from the input content, such as dates from receipts or item details from invoices.
Classify: Classify content into predefined categories, such as call sentiment or chart type.
Generate: Generate values from input data, such as summarizing an audio conversation or generating scene descriptions from videos.
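To make the relationship between a field schema and its generation methods concrete, here is a minimal sketch in Python. It is an illustration only: the class names (`GenerationMethod`, `FieldDefinition`), the attribute names, and the example fields are hypothetical and are not the service's actual schema format.

```python
from dataclasses import dataclass
from enum import Enum


class GenerationMethod(Enum):
    EXTRACT = "extract"    # value is taken directly from the input content
    CLASSIFY = "classify"  # value is one of a predefined set of categories
    GENERATE = "generate"  # value is produced from the content, e.g. a summary


@dataclass
class FieldDefinition:
    """One entry of a (hypothetical) field schema."""
    name: str
    description: str
    value_type: str
    method: GenerationMethod
    categories: tuple[str, ...] = ()  # only meaningful for CLASSIFY fields

    def __post_init__(self):
        # A classify field without categories has nothing to choose from.
        if self.method is GenerationMethod.CLASSIFY and not self.categories:
            raise ValueError(f"classify field {self.name!r} needs categories")


# Hypothetical schema for a recorded support call, one field per method.
call_schema = [
    FieldDefinition("CallDate", "Date mentioned by the agent", "date",
                    GenerationMethod.EXTRACT),
    FieldDefinition("Sentiment", "Overall call sentiment", "string",
                    GenerationMethod.CLASSIFY,
                    categories=("positive", "neutral", "negative")),
    FieldDefinition("Summary", "Short summary of the conversation", "string",
                    GenerationMethod.GENERATE),
]

for field in call_schema:
    print(f"{field.name}: {field.method.value}")
```

The sketch keeps the method next to each field definition because, as described above, the generation method is part of what the field schema specifies for every field.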
Span A reference indicating the location of an element (for example, a field or word) within the extracted Markdown content. A span is represented by a character offset and a length. Because programming languages use different character encodings, the exact offset and length values for Unicode text can differ between them. To avoid ambiguity, spans are returned only if the desired encoding is explicitly specified in the request. An element that isn't contiguous in the Markdown (for example, a page) can map to multiple spans.
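The effect of encoding on span values can be shown with a short Python sketch. It does not call the service; the sample string and the helper name `units_before` are made up for illustration. It computes the offset of the same word counted in Unicode code points, UTF-16 code units, and UTF-8 bytes.

```python
# Sample text containing a euro sign (U+20AC) and an emoji (U+1F642).
text = "Total: 12 \u20ac \U0001F642 paid"
target = "paid"

# Python strings are indexed by Unicode code point.
cp_offset = text.index(target)
cp_length = len(target)


def units_before(s: str, end: int, encoding: str) -> int:
    """Count the code units of `encoding` used by the first `end` code points of `s`."""
    unit_size = {"utf-8": 1, "utf-16-le": 2}[encoding]
    return len(s[:end].encode(encoding)) // unit_size


utf16_offset = units_before(text, cp_offset, "utf-16-le")
utf8_offset = units_before(text, cp_offset, "utf-8")

print(cp_offset, utf16_offset, utf8_offset)  # prints: 14 15 19
print(cp_length)                             # prints: 4
```

The word starts at offset 14 in code points, 15 in UTF-16 code units (the emoji takes a surrogate pair), and 19 in UTF-8 bytes (the euro sign takes three bytes, the emoji four). A client that slices the Markdown with an offset counted in a different unit than its own string type would land on the wrong characters.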
Grounding source The specific regions in the content from which a value was generated. Its representation depends on the file type:
Image - A polygon in the image, often an axis-aligned rectangle (bounding box).
PDF/TIFF - A polygon on a specific page, often a quadrilateral.
Audio - A start and end time range.
Video - A start and end time range with an optional polygon in each frame, often a bounding box.
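The four representations above can be modeled as a small set of types. The following Python sketch is an assumption-laden illustration: the class names, attribute names, and units (milliseconds, 1-based page numbers) are hypothetical and do not reflect the service's response format. The video case is simplified to a single optional polygon rather than one per frame.

```python
from dataclasses import dataclass
from typing import Optional

Point = tuple[float, float]


@dataclass
class Polygon:
    points: list[Point]

    def bounding_box(self) -> tuple[float, float, float, float]:
        """Return the axis-aligned box (min_x, min_y, max_x, max_y) enclosing the polygon."""
        xs = [x for x, _ in self.points]
        ys = [y for _, y in self.points]
        return (min(xs), min(ys), max(xs), max(ys))


@dataclass
class TimeRange:
    start_ms: int  # hypothetical unit: milliseconds
    end_ms: int


@dataclass
class ImageGrounding:
    polygon: Polygon


@dataclass
class PageGrounding:  # PDF/TIFF
    page_number: int  # hypothetical convention: 1-based
    polygon: Polygon


@dataclass
class AudioGrounding:
    time_range: TimeRange


@dataclass
class VideoGrounding:
    time_range: TimeRange
    polygon: Optional[Polygon] = None  # simplified: one polygon, not one per frame


# A slightly skewed quadrilateral on page 2 of a scanned document.
quad = Polygon([(1.0, 1.0), (4.0, 1.5), (4.2, 3.0), (0.8, 2.5)])
grounding = PageGrounding(page_number=2, polygon=quad)
print(grounding.polygon.bounding_box())  # prints: (0.8, 1.0, 4.2, 3.0)
```

Storing a general polygon and deriving the bounding box on demand covers both the axis-aligned rectangles common in images and the skewed quadrilaterals common in scanned pages.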
Confidence score The level of certainty that the extracted data is accurate.