Module assessment

1.

A clinical agent processes patient-provided PDF documents. During a security review, a tester successfully embeds instructions in a PDF that cause the agent to reveal its system prompt. Which defense layer was bypassed, and what is the primary remediation?

The output scanning layer was bypassed; implement response filtering to detect and block responses that contain system prompt content.

The structural separation layer was bypassed; redesign the prompt to explicitly delimit document content using XML-style tags and instruct the agent that content inside document tags is data to analyze, not instructions to follow.

The input sanitization layer was bypassed; add regex patterns to strip any text that looks like instructions from PDF content before it reaches the agent.

2.

Your prompt engineering team changes the escalation trigger instructions in the clinical agent's system prompt. The evaluation score for clinical accuracy improves by three percent, but the safety refusal rate drops by eight percent. How should this change be handled?

Deploy the change; the three percent quality improvement outweighs the eight percent safety decline because quality is the primary metric.

Block deployment; investigate which escalation trigger change caused the refusal drop, restore the safe escalation behavior, and rerun the evaluation before considering deployment.

Deploy the change to a five percent canary first to observe live safety metrics before full rollout.

3.

A new clinical guideline is published while a patient consultation is in progress. At which point in the multi-turn reasoning chain should the updated guideline be injected?

Inject it immediately as a user message, regardless of which reasoning step is currently in progress.

Hold the updated guideline and inject it before the next reasoning turn begins, so the new information enters at a clean reasoning boundary.

Discard the updated guideline until the session ends, then apply it to the next consultation to avoid disrupting in-progress reasoning.
