Agent Safety

Building secure AI agents is a shared responsibility between Agent Framework and application developers. Agent Framework provides the building blocks — abstractions, providers, and orchestration — but developers are responsible for validating inputs, securing data flows, and configuring tools appropriately for their scenario.

This article outlines best practices for building safe and secure agents with Agent Framework.

Understand trust boundaries

Data flows through several components when an agent runs: user input, chat history providers, context providers, the LLM service, and function tools. Each boundary where data enters or exits your application represents a potential attack surface.

Key trust boundaries to consider:

  • AI service — Receives chat messages (which may include PII and system instructions) and returns LLM-generated output.
  • Chat history storage — Providers may load and persist conversation messages via external storage.
  • Context services — Context providers may retrieve or store data from external services (memories, user profiles, RAG results).
  • Tool-accessed services — Function tools execute developer-supplied code that may call external APIs or databases.

All external service communication is handled by developer-chosen client SDKs. Agent Framework does not manage authentication, encryption, or connection details for these services.

Best practices

Validate function inputs

The LLM decides which of your tool functions to call and with what arguments. Treat LLM-provided arguments as untrusted input, just as you would user input in a web API.

  • Use allow-listing — Validate inputs against known-good values rather than trying to filter known-bad patterns. For example, check that a file path is within an allowed directory rather than checking for .. traversal sequences.
  • Enforce type and range constraints — Verify that arguments are of the expected type and within acceptable ranges (numeric bounds, string length limits, date ranges).
  • Limit string lengths — Enforce maximum lengths on string arguments to prevent resource exhaustion or injection attacks.
  • Prevent path traversal — When functions accept file paths, resolve them to absolute paths and verify they fall within allowed directories.
  • Use parameterized queries — If arguments are used in SQL queries, shell commands, or other interpreted contexts, use parameterized queries or escaping — never string concatenation.
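The checks above can be sketched in a framework-agnostic way. The following is a minimal illustration, not Agent Framework API; the allowed directory and length limit are hypothetical values you would replace for your scenario:

```python
from pathlib import Path

ALLOWED_DIR = Path("/data/reports").resolve()  # hypothetical allow-listed directory
MAX_ARG_LEN = 256                              # hypothetical string-length cap

def validate_report_path(filename: str) -> Path:
    """Validate an LLM-supplied filename before a tool touches the filesystem."""
    # Enforce a maximum length to limit resource-exhaustion and injection risk.
    if len(filename) > MAX_ARG_LEN:
        raise ValueError("filename too long")
    # Resolve to an absolute path, then allow-list: the resolved path must
    # stay inside ALLOWED_DIR (this also defeats ".." traversal sequences).
    target = (ALLOWED_DIR / filename).resolve()
    if not target.is_relative_to(ALLOWED_DIR):
        raise ValueError("path escapes allowed directory")
    return target
```

The same pattern applies to other argument types: validate first, against known-good values, and only then hand the argument to the underlying API or database (via parameterized queries, never string concatenation).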

Require approval for high-risk tools

By default, all tools provided to an agent are invoked without user approval. Use the tool approval mechanism to gate high-risk operations behind human confirmation.

When deciding which tools require approval, consider:

  • Side effects — Tools that modify data, send communications, make purchases, or have other side effects should generally require approval.
  • Data sensitivity — Tools that access or return sensitive data (PII, financial data, credentials) warrant approval.
  • Reversibility — Irreversible operations (deletion, sending emails) are higher risk than read-only queries.
  • Scope of impact — Tools with broad impact (bulk operations) should require more scrutiny than narrowly scoped ones.
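The gating pattern can be illustrated in a few lines. This is a framework-agnostic sketch of the idea, not the framework's actual approval API; the `Tool` type and `approve` callback are hypothetical:

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Tool:
    func: Callable[..., Any]
    requires_approval: bool  # gate side-effecting / irreversible tools

def invoke_tool(tool: Tool, args: dict,
                approve: Callable[[str, dict], bool]) -> Any:
    """Run a tool, asking a human approver first when the tool is high-risk."""
    if tool.requires_approval and not approve(tool.func.__name__, args):
        return "Tool call rejected by user."
    return tool.func(**args)

# A read-only lookup runs freely; a bulk delete is gated behind confirmation.
lookup = Tool(func=lambda order_id: f"order {order_id}", requires_approval=False)
delete_all = Tool(func=lambda: "deleted", requires_approval=True)
```

In practice the approval callback would surface the tool name and arguments to the end user and block until they confirm or reject.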

Keep system messages developer-controlled

Chat messages carry a role (system, user, assistant, tool) that determines how the AI service interprets them. Understanding these roles is critical:

  • system — Highest trust. Directly shapes LLM behavior; must never contain untrusted input.
  • user — Untrusted. May contain prompt injection attempts or malicious content.
  • assistant — Untrusted. Generated by the LLM, which is an external system.
  • tool — Untrusted. May contain data from external systems or user-influenced content.

Do not place end-user input into system-role messages. Agent Framework defaults untyped text to user role, but be careful when constructing messages programmatically.
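The safe construction pattern looks like this. The `ChatMessage` type below is an illustrative stand-in, not the framework's actual message type:

```python
from dataclasses import dataclass

@dataclass
class ChatMessage:
    role: str   # "system", "user", "assistant", or "tool"
    text: str

# Developer-controlled constant: the only source of system-role content.
SYSTEM_PROMPT = "You are a support assistant. Answer only billing questions."

def build_messages(user_input: str) -> list[ChatMessage]:
    # End-user text always goes into a user-role message, never into the
    # system role, so injected "instructions" carry no elevated trust.
    return [
        ChatMessage(role="system", text=SYSTEM_PROMPT),
        ChatMessage(role="user", text=user_input),
    ]
```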

Vet extension providers

Context providers and history providers can inject messages with any role, including system. Only attach providers you trust.

Be aware of indirect prompt injection: if the underlying data store is compromised, adversarial content could influence LLM behavior. For example, a document retrieved via RAG could contain hidden instructions that cause the LLM to deviate from intended behavior or exfiltrate data through tool calls.

Validate and sanitize LLM output

LLM responses should be treated as untrusted output. The AI service is an external endpoint that Agent Framework does not control. Be aware of:

  • Hallucination — LLMs may generate plausible-sounding but factually incorrect information. Do not treat LLM output as authoritative without verification.
  • Indirect prompt injection — Data retrieved by tools, context providers, or chat history providers may contain adversarial content designed to influence the LLM.
  • Malicious payloads — LLM output may contain content that is harmful if rendered or executed without sanitization (HTML/JavaScript for XSS, SQL for injection, shell commands).

Always validate and sanitize LLM output before rendering it in HTML, executing it as code, using it in database queries, or passing it to any security-sensitive context.
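As a minimal example of the rendering case, standard HTML escaping neutralizes script content in LLM output before it reaches a browser (other sinks, such as SQL or shell, need their own context-appropriate escaping or parameterization):

```python
import html

def render_llm_reply(reply: str) -> str:
    """Escape LLM output before embedding it in an HTML page to prevent XSS."""
    return f"<p>{html.escape(reply)}</p>"
```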

Protect sensitive data in logs

Agent Framework supports logging and telemetry via OpenTelemetry. Sensitive data is only logged when explicitly enabled:

  • Logging — At log level Trace, the full ChatMessages collection is logged. This can include PII. Trace level should never be enabled in production.
  • Telemetry — When EnableSensitiveData is set, telemetry includes the full text of chat messages including function calls and results. Do not enable this in production.

Secure session data

Sessions (AgentSession) represent conversation context and can be serialized for persistence. Treat serialized sessions as sensitive data:

  • Sessions may reference conversation content or session identifiers.
  • Restoring a session from an untrusted source is equivalent to accepting untrusted input. A compromised storage backend could alter roles to escalate trust.
  • Store sessions in secure storage with appropriate access controls and encryption.
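One way to detect tampering when restoring serialized state is to sign it with an HMAC. This is a generic sketch of that idea (not an Agent Framework feature); the key is a placeholder that would come from your secret store, and HMAC provides integrity only, so confidential content still needs encryption at rest:

```python
import hashlib
import hmac
import json

SECRET = b"replace-with-a-key-from-your-secret-store"  # hypothetical key

def seal_session(session: dict) -> bytes:
    """Serialize session state and append an HMAC so tampering is detectable."""
    payload = json.dumps(session, sort_keys=True).encode()
    tag = hmac.new(SECRET, payload, hashlib.sha256).hexdigest().encode()
    return payload + b"." + tag

def open_session(blob: bytes) -> dict:
    """Verify the HMAC before trusting restored session state."""
    payload, _, tag = blob.rpartition(b".")
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest().encode()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("session data failed integrity check")
    return json.loads(payload)
```

A storage backend that flips a message's role from user to system would then fail verification instead of silently escalating trust.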

Implement resource limits

Agent Framework does not impose constraints on input/output length or request rates, because it doesn't know what is reasonable for your scenario. You are responsible for:

  • Input length limits — Constrain input length to prevent context overflow or DoS attacks.
  • Output length limits — Use service-provided limits (for example, MaxOutputTokens in chat options).
  • Rate limiting — Use rate limiting facilities to prevent cost overruns and abuse from concurrent requests.
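These limits are enforced in your application code, not by the framework. As an illustration, a simple token-bucket limiter combined with an input-length cap might look like this (the limit values are hypothetical and should be tuned to your model and cost budget):

```python
import time

class TokenBucket:
    """Per-client rate limiter: refills `rate` tokens/second, bursts to `capacity`."""
    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill tokens for the time elapsed, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

MAX_INPUT_CHARS = 8_000  # hypothetical cap; size to your model's context window

def check_request(bucket: TokenBucket, user_input: str) -> None:
    """Reject oversized or too-frequent requests before they reach the agent."""
    if len(user_input) > MAX_INPUT_CHARS:
        raise ValueError("input too long")
    if not bucket.allow():
        raise RuntimeError("rate limit exceeded")
```

Output length is capped separately via the AI service's own options (for example, MaxOutputTokens).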

Next steps