Episode

Azure AI Content Safety Prompt Shields

Prompt Shields is a unified API that analyzes large language model inputs and detects User Prompt attacks and Document attacks, which are two common types of adversarial inputs.

The Prompt Shields for User Prompts targets User Prompt injection attacks, where users deliberately exploit system vulnerabilities to elicit unauthorized behavior from the LLM. This could lead to inappropriate content generation or violations of system-imposed restrictions.

The Prompt Shields for Documents aims to safeguard against attacks that use information not directly supplied by the user or developer, such as external documents. Attackers might embed hidden instructions in these materials in order to gain unauthorized control over the LLM session.

In this demo, we’ll demonstrate the model’s ability to detect a jailbreak from either a prompt or document attack.

Disclosure: This demo contains an AI-generated voice.

Chapters

00:00 - Introduction
00:35 - Prompt attack
01:06 - Document attack
02:09 - Prompt and Document attack

Recommended resources

Azure AI Content Safety

Prompt Shields is a unified API that analyzes large language model inputs and detects User Prompt attacks and Document attacks, which are two common types of adversarial inputs.

In this demo, we’ll demonstrate the model’s ability to detect a jailbreak from either a prompt or document attack.

Disclosure: This demo contains an AI-generated voice.

Chapters

00:00 - Introduction
00:35 - Prompt attack
01:06 - Document attack
02:09 - Prompt and Document attack

Recommended resources

Azure AI Content Safety

Azure

Azure OpenAI Service

Azure AI Content Safety Prompt Shields

Chapters

Recommended resources

Related episodes

Chapters

Recommended resources

Related episodes