LlmInputHelper Class

Definition

Namespace: Azure.AI.ContentUnderstanding

Assembly: Azure.AI.ContentUnderstanding.dll

Package: Azure.AI.ContentUnderstanding v1.2.0-beta.2

Source: LlmInputHelper.cs

Important

Some information relates to prerelease product that may be substantially modified before it’s released. Microsoft makes no warranties, express or implied, with respect to the information provided here.

Converts AnalysisResult objects into LLM-friendly text.

public static class LlmInputHelper

type LlmInputHelper = class

Public Module LlmInputHelper

Inheritance: Object → LlmInputHelper

Methods

Name	Description
ToLlmInput(AnalysisResult, IDictionary<String,Object>, LlmInputOptions)	Converts a Content Understanding analysis result into LLM-friendly text. Produces a YAML front matter block followed by markdown body, suitable for injecting into an LLM prompt, storing in a vector database, or passing as tool output. The YAML front matter (delimited by `---`) may include: `contentType` (document, image, audio, video), `pages` (page range), `timeRange` (media time span), `category` (classification label), `fields` (extracted structured fields as YAML), `rai_warnings` (content safety flags), and any caller-supplied `metadata` entries. The markdown body contains the extracted text with page-break markers (`<!-- InputPageNumber: N -->`) inserted at page boundaries so downstream consumers can locate content by page number. `N` is the original 1-based page number from the source document (i.e., the page index in the analyzed PDF), not a counter that restarts at 1 for each call. This matters when the analyze request specifies a ContentRange (e.g., `"2-3,5"`): the markers in the output will read `InputPageNumber: 2`, `3`, `5` — not `1`, `2`, `3`. Downstream consumers (RAG indexers, page-citation prompts) can rely on the marker value to cite the correct source page even when only a subset of pages was analyzed. If the service markdown already contains `<!-- InputPageNumber:` markers, the helper passes the markdown through unchanged to avoid duplicate markers. Internal telemetry messages such as `LLMStats: ...` are filtered from the rendered `rai_warnings` front matter.

ToLlmInput(AnalysisResult, IDictionary<String,Object>, LlmInputOptions)

Converts a Content Understanding analysis result into LLM-friendly text.

Produces a YAML front matter block followed by markdown body, suitable for injecting into an LLM prompt, storing in a vector database, or passing as tool output.

The YAML front matter (delimited by ---) may include: contentType (document, image, audio, video), pages (page range), timeRange (media time span), category (classification label), fields (extracted structured fields as YAML), rai_warnings (content safety flags), and any caller-supplied metadata entries.
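A consumer that receives this text usually needs to separate the front matter from the body before indexing or prompting. The following is a minimal Python sketch of that step, not part of the library. The function name `split_front_matter` is hypothetical, and the sample values (and the exact way the helper renders each key) are assumptions made for illustration; only the key names and the `---` delimiter come from the description above.

```python
def split_front_matter(text: str) -> tuple[str, str]:
    """Split '---'-delimited front matter from the markdown body.

    Returns (front_matter, body). front_matter is '' when the text does
    not start with a complete '---' ... '---' block.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            front = "\n".join(lines[1:i])
            body = "\n".join(lines[i + 1:]).lstrip("\n")
            return front, body
    return "", text  # no closing delimiter: treat everything as body


# Hypothetical sample: key names follow the description above,
# the values and their formatting are invented for illustration.
SAMPLE = """---
contentType: document
pages: 2-3
---
# Invoice

Total due: 100
"""

front, body = split_front_matter(SAMPLE)
print(front)  # the two front matter lines, without the delimiters
print(body)   # the markdown body, starting at "# Invoice"
```

The returned front matter string can then be handed to any YAML parser; the sketch stops short of parsing so that it depends only on the standard library.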

The markdown body contains the extracted text with page-break markers (`<!-- InputPageNumber: N -->`) inserted at page boundaries so downstream consumers can locate content by page number. N is the original 1-based page number from the source document (i.e., the page index in the analyzed PDF), not a counter that restarts at 1 for each call. This matters when the analyze request specifies a ContentRange (e.g., "2-3,5"): the markers in the output will read InputPageNumber: 2, 3, 5, not 1, 2, 3. Downstream consumers (RAG indexers, page-citation prompts) can rely on the marker value to cite the correct source page even when only a subset of pages was analyzed. If the service markdown already contains `<!-- InputPageNumber:` markers, the helper passes the markdown through unchanged to avoid duplicate markers.
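To make the page-number behavior concrete, here is a small Python sketch of how a downstream consumer might split the body on markers of the form `<!-- InputPageNumber: N -->`. It is an illustration, not the library's implementation. The function name `split_by_page` and the sample text are hypothetical, and the sketch assumes each marker precedes the content of the page it names, which the description above does not state explicitly.

```python
import re

# Matches markers of the form <!-- InputPageNumber: N --> and captures N.
PAGE_MARKER = re.compile(r"<!--\s*InputPageNumber:\s*(\d+)\s*-->")


def split_by_page(body: str) -> dict[int, str]:
    """Map each source page number to the markdown that follows its marker.

    Assumption: a marker precedes the content of its page. Text before
    the first marker is ignored by this sketch.
    """
    pages: dict[int, str] = {}
    matches = list(PAGE_MARKER.finditer(body))
    for idx, m in enumerate(matches):
        end = matches[idx + 1].start() if idx + 1 < len(matches) else len(body)
        pages[int(m.group(1))] = body[m.end():end].strip()
    return pages


# Hypothetical body for a request with ContentRange "2-3,5".
BODY = """<!-- InputPageNumber: 2 -->
Second page text.

<!-- InputPageNumber: 3 -->
Third page text.

<!-- InputPageNumber: 5 -->
Fifth page text.
"""

pages = split_by_page(BODY)
print(sorted(pages))  # [2, 3, 5]
```

Because the keys are source page numbers rather than positions in the output, a citation built from them points at the right page of the original document even when only a subset was analyzed.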

Internal telemetry messages such as LLMStats: ... are filtered from the rendered rai_warnings front matter.
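The effect of this filtering can be pictured with a short Python sketch. The helper's actual matching rule is not documented here, so the sketch assumes a simple prefix match on `LLMStats:`; the function name `drop_telemetry` and the sample messages are hypothetical.

```python
def drop_telemetry(warnings: list[str], prefix: str = "LLMStats:") -> list[str]:
    """Return the warnings that do not start with the telemetry prefix.

    Assumption: telemetry entries are identified by a leading prefix.
    """
    return [w for w in warnings if not w.lstrip().startswith(prefix)]


# Hypothetical messages, invented for illustration.
raw = ["LLMStats: example telemetry", "Example content safety warning"]
print(drop_telemetry(raw))  # ['Example content safety warning']
```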
