Episode

Azure AI Content Safety Prompt Shields

Prompt Shields is a unified API that analyzes large language model inputs and detects User Prompt attacks and Document attacks, which are two common types of adversarial inputs.

The Prompt Shields for User Prompts targets User Prompt injection attacks, where users deliberately exploit system vulnerabilities to elicit unauthorized behavior from the LLM. This could lead to inappropriate content generation or violations of system-imposed restrictions.

The Prompt Shields for Documents aims to safeguard against attacks that use information not directly supplied by the user or developer, such as external documents. Attackers might embed hidden instructions in these materials in order to gain unauthorized control over the LLM session.

In this demo, we’ll demonstrate the model’s ability to detect a jailbreak from either a prompt or document attack.

Disclosure: This demo contains an AI-generated voice.

Chapters

  • 00:00 - Introduction
  • 00:35 - Prompt attack
  • 01:06 - Document attack
  • 02:09 - Prompt and Document attack

Azure
Azure OpenAI Service