Note
Access to this page requires authorization. You can try signing in or changing directories.
Access to this page requires authorization. You can try changing directories.
Collections are the fundamental unit of data organization in Agentic Retrieval's knowledge layer. Each collection is a logical grouping that contains ingested documents stored as vector embeddings and metadata. This article explains how collections work, how they're organized internally, and when to use multiple collections in your solution.
Important
Agentic Retrieval in Foundry Local is currently in PREVIEW. See the Supplemental Terms of Use for Microsoft Azure Previews for legal terms that apply to Azure features that are in beta, preview, or otherwise not yet released into general availability.
What are collections?
A collection is a named container for vector data. When you ingest documents, they're parsed, chunked, embedded, and stored in a specific collection. When you query, you specify which collections to search.
Collections provide:
- Data isolation: Separate datasets for different use cases, departments, or tenants.
- Access control: Role-based access control (RBAC) policies can be applied per collection.
- Independent lifecycle: Create, ingest, query, and delete collections independently.
Each collection maps to up to four Milvus vector collections and associated Postgres tables. When image embedding is available (GPU mode), all four are created; in CPU-only mode, only the two text collections are provisioned:
Hyphens in collection names are replaced with underscores to form the internal storage_prefix. For example, collection my-docs uses storage prefix my_docs.
Collection lifecycle
The collection lifecycle follows these steps:
Create → Ingest → Query → Update → Delete
| Stage | API | Description |
|---|---|---|
| Create | POST /edgeai/collections |
Creates the collection and provisions four Milvus collections and Postgres tables. |
| Ingest | POST /edgeai/ingestion/jobs/{job_id} |
Ingests documents into the collection. Specify collectionName in the request body. |
| Query | POST /edgeai/chat/completions |
Queries the collection using RAG. Specify data_sources[0].parameters.index_name. |
| Update | PATCH /edgeai/collections/{name} |
Updates the collection description (name is immutable). |
| Delete | DELETE /edgeai/collections/{name} |
Deletes the collection and all associated Milvus collections, Postgres tables, and metadata. |
Default collection
Agentic Retrieval autocreates a default collection named edgeragapp on startup. This collection:
- Is used when no
collectionNameis specified during ingestion or querying. - Can't be deleted; returns
409 Conflict. - Requires an
edgeragappapp role for end-user access through Azure role-based access control (Azure RBAC).
Collection naming rules
- Lowercase letters, digits, and hyphens only
- Must start and end with an alphanumeric character
- 2–49 characters
- Names
default,system,edgeragenduser, andedgeragdeveloperare reserved - The default collection
edgeragappis autocreated and can't be deleted
Collections and RBAC
When users access collections through the external endpoint (ingress), JSON Web Token (JWT) roles control access:
| Role | Access level |
|---|---|
EdgeRAGDeveloper |
Full access to all collections. Required for management APIs (collections, ingestion). |
EdgeRAGEndUser |
Access only to collections where the user has a matching app role (for example, role finance-docs grants access to collection finance-docs). |
If a user has the EdgeRAGEndUser role but no collection-specific role assignments, they receive 403 Forbidden when querying any collection. Ensure users are assigned app roles matching the collection names they need to access.
For step-by-step instructions on creating collection-specific app roles, see Create app roles for collection access.
Azure RBAC is bypassed when using port-forwarding or internal Dapr calls (development or testing only).
Collections and knowledge sources
Collections connect to the agentic layer through knowledge sources:
- The built-in MCP server exposes search tools that query collections.
- A knowledge source with
indexed_source_refset to a collection name maps an MCP search tool to a specific collection. - Knowledge sources are grouped into knowledge bases, which are assigned to agents.
Agent → Knowledge base → Knowledge source (indexed_source_ref = "my-docs") → MCP server → Collection "my-docs"
The indexed_source_ref field on a knowledge source refers to a collection name when pointing to the built-in MCP server. This field is the bridge between the agentic layer and the knowledge layer.
When to use multiple collections
The following scenarios describe which collections to use:
| Scenario | Recommendation |
|---|---|
| Single dataset, single team | Use the default edgeragapp collection. |
| Department-scoped data | Create one collection per department (for example, engineering-docs, hr-policies). Apply RBAC per collection. |
| Multi-tenant | Create one collection per tenant. Use RBAC to enforce tenant isolation. |
| Versioned datasets | Create time-stamped collections (for example, catalog-2026-q1). Delete old collections when no longer needed. |
| Mixed confidentiality | Separate public and confidential data into different collections with different RBAC policies. |