Configure audit workflows for regulated decisions

Completed

Azure Cosmos DB and Application Insights provide the persistent audit infrastructure that multi-agent systems need to satisfy regulatory requirements and prove that human oversight is meaningful. Every auditable decision generates an immutable record—capturing context, reviewer identity, rationale, and execution outcome—that supports compliance reviews, incident investigations, and exception reporting.

Decision taxonomy for audit requirements

Not all agent decisions need the same level of audit scrutiny. Adventure Works categorizes decisions into four tiers with different logging, review, and retention requirements.

Routine decisions include standard product information queries, order status lookups, shipping quote calculations, and general policy explanations. These decisions don't require audit trails beyond standard application logging because they don't affect customer accounts, don't access sensitive data, and don't create financial obligations. Routine decisions are logged to Application Insights for debugging but not to the compliance audit database.

Notable decisions include return approvals within policy limits, loyalty points adjustments within tier boundaries, discount code applications, and password resets. These decisions affect customer accounts but within predefined boundaries that don't require human approval. Notable decisions are logged to the audit database with decision type, inputs summary (no PII), decision outcome, and timestamp. The logs support monthly reporting but don't trigger review workflows.

Auditable decisions include exception approvals, refunds above thresholds, bulk operations affecting multiple customers, account status changes, and policy deviation requests. These decisions require human review before execution and comprehensive audit records that capture who approved, why they approved, what context they reviewed, and what actions resulted. Auditable decisions link to OpenTelemetry traces for full context reconstruction.

Regulated decisions include data access requests (individual data access rights), fraud investigations, regulatory holds, account closures for compliance reasons, and any decision that could expose the company to legal liability. Regulated decisions require enhanced documentation with legal review, extended retention (7+ years), immutable audit trails, and quarterly compliance review.

Decision category Human review Audit logging Retention period Compliance review
Routine None Standard logs 30 days None
Notable None (within bounds) Audit database 1 year Annual reporting
Auditable Required Full audit trail 3 years Quarterly sampling
Regulated Required + legal Immutable record 7+ years Every instance

The taxonomy drives both system design and operational processes—routine decisions optimize for performance, auditable decisions optimize for traceability, and regulated decisions optimize for compliance defensibility.

Audit trail data model and implementation

Each auditable decision creates a comprehensive audit record in Azure Cosmos DB that captures everything needed to reconstruct the decision, understand the rationale, and verify human oversight. Adventure Works' audit record schema includes:

Decision identification: Unique decision ID, decision type (return exception, refund approval, bulk credit), agent ID that made the recommendation, session ID for context, timestamp (UTC), correlation ID linking to OpenTelemetry trace

Input context: Request summary (structured, no raw PII), customer tier and account status, relevant policy version numbers, data sources accessed during decision (product catalog v23, return policy v5), any tool calls made and their results

Agent recommendation: What action the agent proposed, confidence score (raw and calibrated), reasoning provided by agent, business impact assessment, escalation triggers that caused human review

Human review: Approver identity (hashed for privacy but traceable), approval or rejection decision, review timestamp, comment or rationale provided, time from escalation to decision (SLA tracking)

Execution outcome: What actions were actually performed, execution timestamp, success or failure status, any errors encountered, customer notification sent, financial impact (if applicable)

The data model uses JSON documents with structured fields for queryability and free-text fields for human rationale. Example audit record:

{
  "decision_id": "dec_2026-04-07_89234",
  "decision_type": "return_exception_approval",
  "agent_id": "return_policy_agent_v3",
  "session_id": "sess_892374",
  "timestamp": "2026-04-07T14:32:18Z",
  "trace_id": "trace-abc123def456",
  
  "input_context": {
    "request_summary": "Customer requests return for order #45123, 42 days after delivery",
    "customer_tier": "gold",
    "account_tenure_months": 84,
    "order_value": 249.99,
    "policy_version": "return_policy_v5",
    "data_sources": ["order_database", "customer_profile", "policy_database"]
  },
  
  "agent_recommendation": {
    "proposed_action": "approve_exception",
    "confidence_raw": 0.72,
    "confidence_calibrated": 0.68,
    "reasoning": "Customer is gold tier with 7-year history, product arrived damaged per initial support ticket, reasonable exception",
    "business_impact": "moderate",
    "escalation_trigger": "outside_policy_window"
  },
  
  "human_review": {
    "approver_id_hash": "approver_hash_xyz789",
    "decision": "approved",
    "review_timestamp": "2026-04-07T16:15:42Z",
    "comment": "Justified exception given damage report and customer history",
    "review_duration_minutes": 103
  },
  
  "execution_outcome": {
    "actions_performed": ["issued_return_label", "scheduled_refund"],
    "execution_timestamp": "2026-04-07T16:16:05Z",
    "status": "success",
    "customer_notified": true,
    "financial_impact": -249.99
  }
}

This record structure enables compliance queries: show all decisions reviewed by approver X, show all return exceptions in Q1 2026, show all approvals where agent confidence was below 0.70, and calculate average review time by decision type.

Exception reporting and analysis workflows

Adventure Works runs monthly exception reports that analyze auditable decision patterns to identify optimization opportunities and compliance risks. The exception report queries the audit database and generates:

Approval metrics by agent: How many decisions each agent escalated, approval rate (% approved vs. rejected), average confidence score of escalated decisions, average review duration, financial impact of approved exceptions

Approval metrics by approver: How many reviews each approver completed, their approval rate (are they rubber-stamping or applying genuine scrutiny?), average review time (are they reviewing quickly or thoroughly?), reversal rate (how often are their approvals later determined to be incorrect?)

Exception pattern analysis: Which request categories generate most exceptions, which agents have highest exception rates, which policies trigger most escalations, time-of-day patterns for escalations (are night-shift requests more problematic?)

SLA compliance: What percentage of reviews completed within 4-hour SLA, what percentage required escalation due to timeout, longest pending review time, approval workflow health

The monthly report includes a KQL query that runs against Application Insights linked to the Cosmos DB audit trail:

// Exception approval summary for previous month
let startDate = startofmonth(ago(30d));
let endDate = endofmonth(ago(30d));

auditTrail
| where timestamp between (startDate .. endDate)
| where decision_type in ("return_exception_approval", "refund_approval", "bulk_credit_approval")
| extend approval_decision = tostring(human_review.decision)
| extend agent_id = tostring(agent_id)
| extend approver = tostring(human_review.approver_id_hash)
| extend review_duration = toint(human_review.review_duration_minutes)
| extend confidence = todouble(agent_recommendation.confidence_calibrated)
| summarize 
    total_decisions = count(),
    approved = countif(approval_decision == "approved"),
    rejected = countif(approval_decision == "rejected"),
    avg_confidence = avg(confidence),
    avg_review_time_minutes = avg(review_duration),
    total_financial_impact = sum(todouble(execution_outcome.financial_impact))
    by agent_id
| extend approval_rate = round(todouble(approved) / todouble(total_decisions), 2)
| project agent_id, total_decisions, approval_rate, avg_confidence, avg_review_time_minutes, total_financial_impact
| order by total_decisions desc

High rejection rates for a specific agent indicate prompt improvements are needed. Very high approval rates (> 95%) across all agents suggest either agents are too conservative in their recommendations or reviewers aren't applying sufficient scrutiny.

Compliance posture for automated decision systems

Data protection regulations broadly require that decisions based solely on automated processing, which significantly affect individuals, include meaningful human oversight. For Adventure Works' multi-agent system, decisions like return approvals, refund processing, and exception grants are exactly the kind of consequential decisions that data protection frameworks target. The human-in-the-loop workflows satisfy this oversight requirement, but compliance requires more than just having a human click approve—the oversight must be meaningful.

Meaningful human oversight requires more than a reviewer clicking approve—they must genuinely assess each decision, have real authority to override the agent, and receive enough context for an informed judgment. Rejection rates of 12–15% across Adventure Works' reviewers confirm that oversight is genuine, not nominal, backed by comprehensive approval cards, compliance team spot-checks, and reviewer training on their data protection responsibilities.

Right to explanation requires that individuals can obtain meaningful information about the logic involved in automated decisions. When a customer asks "Why was my return request approved/denied?", Adventure Works can retrieve the audit record and provide: what policy the decision was based on, what factors the agent considered, who reviewed the decision, and what rationale the reviewer provided. The transparency demonstrates compliance and builds customer trust.

Proof of oversight for auditors goes beyond documentation—it requires demonstrating that reviewers can and do override agent decisions. During SOC 2 audits or regulatory compliance reviews, the evidence package includes sample audit records, monthly exception reports, reviewer training records, examples of overridden decisions, and incident response logs confirming reversals. Together these materials show that human oversight is substantive, not theoretical.

The documented architecture—including the escalation thresholds, durable approval workflows, audit trails, and active feedback loops—becomes the compliance evidence package that external auditors review to certify the system meets regulatory requirements.

Decision reversal and remediation processes

Even with human oversight, mistakes happen. An approver reviews a return exception at 11 PM after a long day and approves without noticing the customer already received a refund through another channel, resulting in double payment. An agent misinterprets inventory data and approves a bulk order that can't be fulfilled. The audit system must support decision reversal and customer remediation when errors are discovered.

Adventure Works implements a decision reversal workflow: any auditable decision can be reversed within 30 days by flagging the decision ID, providing reversal justification, and executing compensating actions (reverse refund, cancel credit, notify affected customers). The reversal creates a new audit record linked to the original decision, maintaining full traceability.

Reversal rate tracking is a key compliance metric. If reversal rate exceeds 2% for any decision category, a root cause analysis begins—either agent quality has degraded, reviewer oversight isn't effective, or the decision process has systemic flaws. High reversal rates signal compliance risk that requires executive attention.

With comprehensive audit trails, exception reporting, regulatory compliance documentation, and reversal capabilities, Adventure Works has built a human-in-the-loop system that not only functions operationally but satisfies external auditors and regulatory requirements.

Key takeaways

  • Decision taxonomy categorizes agent decisions into four tiers (routine, notable, auditable, regulated) with escalating logging, review, and retention requirements.
  • Comprehensive audit records capture decision identification, input context, agent recommendation, human review details, and execution outcomes in structured Cosmos DB documents.
  • Exception reporting runs monthly KQL queries analyzing approval patterns by agent, approver, request category, and SLA compliance to identify optimization opportunities and compliance risks.
  • Regulatory compliance requires demonstrating meaningful human oversight through genuine review scrutiny, override capability, sufficient reviewer context, and adequate review time.
  • Decision reversal enables corrections within 30 days through compensating actions and linked audit records, with reversal rate tracking serving as a key compliance health metric.