Agent Governance & Human-in-the-Loop

Every production agent needs four governance layers wrapped around it: human-in-the-loop approval, operating rules, evaluation, and observability. None of them is optional once the agent can modify data or send messages on someone's behalf.

This page covers the governance layer. For the agent instructions themselves, see Role, Goal, Backstory.

1. Human-in-the-Loop (HITL)

Rule of thumb: any agent that can modify data, spend money, or send a message on someone's behalf must have a HITL layer.

When HITL is mandatory

Action class	HITL required?	Notes
Read-only retrieval, summarization	No	Treat like a chatbot.
Internal-only drafting (no send)	No	The human chooses to send.
Sending external messages	Yes	Email, SMS, social post, customer notification.
Mutating data in a system of record	Yes	CRM updates, ticket changes, ledger entries.
Spending money	Yes	Even small amounts. Cap and approve.
Granting or changing access	Yes	Permissions, role assignments, secrets.
High-value or irreversible actions	Yes	Anything you cannot easily undo.
Deviation from a predefined path	Yes	If the agent took an unplanned route, surface it.
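The table above can be enforced as a lookup rather than left to the prompt. The sketch below is one possible encoding, not a prescribed implementation: the `ActionClass` names and the `requires_hitl` helper are hypothetical, and treating an unclassified action as requiring approval is an added assumption (fail closed).

```python
from enum import Enum


class ActionClass(Enum):
    """Hypothetical labels, one per row of the table above."""
    READ_ONLY = "read_only"
    INTERNAL_DRAFT = "internal_draft"
    EXTERNAL_MESSAGE = "external_message"
    DATA_MUTATION = "data_mutation"
    SPEND = "spend"
    ACCESS_CHANGE = "access_change"
    IRREVERSIBLE = "irreversible"
    PATH_DEVIATION = "path_deviation"


# Only the two rows marked "No" in the table may skip approval.
NO_HITL = frozenset({ActionClass.READ_ONLY, ActionClass.INTERNAL_DRAFT})


def requires_hitl(action_classes):
    """Return True if any class attached to the proposed action needs approval.

    Assumption: an action with no classes was never classified,
    so it is treated as requiring approval.
    """
    classes = set(action_classes)
    if not classes:
        return True
    return any(c not in NO_HITL for c in classes)
```

An action that is both a read and a spend, for example, gets `requires_hitl([ActionClass.READ_ONLY, ActionClass.SPEND])` evaluated to `True`, because the strictest class wins.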

Approval mechanics

Every HITL approval needs to be traceable — who approved, when, and on what evidence.

Slack / email approval. The agent posts the proposed action with full context (inputs, planned tool call, expected effect). An authorized human clicks Approve / Reject. The decision and approver are logged.
Traceable button push. In an internal dashboard, the action is staged with a diff view, and an Approve button confirms it. The decision and approver are logged the same way.
CLI command. For engineering-owned agents, the agent prints the proposed change and waits for an explicit confirm command. The confirmation is recorded in shell history and in the agent's own audit log.
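As an illustration of the CLI variant, the sketch below prints a proposal, waits for an explicit confirmation, and returns an audit record with approver, timestamp, and decision. The function name `confirm_cli`, the literal word `confirm`, and the record fields are assumptions made for this example; the injectable `read_line`, `write`, and `now` parameters exist only so the gate can be tested without a terminal.

```python
import time


def confirm_cli(proposal, approver, read_line=input, write=print, now=time.time):
    """Show the proposed change and require an explicit 'confirm' to proceed.

    Returns (approved, record). Anything other than the exact word
    'confirm' counts as a rejection, so an empty Enter never approves.
    """
    write("Proposed change:")
    write(proposal)
    answer = read_line('Type "confirm" to apply, anything else to reject: ')
    approved = answer.strip().lower() == "confirm"
    record = {
        "approver": approver,
        "timestamp": now(),
        "decision": "approved" if approved else "rejected",
        "proposal": proposal,
    }
    return approved, record
```

The caller is expected to append `record` to the agent's audit log before executing anything, so that a rejected proposal is as traceable as an approved one.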

What the approval payload should always contain:

The user request that triggered the action.
The tool, the exact arguments, and the system that will be touched.
A plain-English explanation of the effect.
The agent's confidence (HIGH / MEDIUM / LOW).
A clear cancel path.
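One way to keep the payload complete is to make it a typed record that refuses to be built with a missing field. This is a minimal sketch: the class name `ApprovalPayload` and its field names are hypothetical, chosen to mirror the five items listed above.

```python
from dataclasses import dataclass, asdict
from typing import Any, Dict

CONFIDENCE_LEVELS = ("HIGH", "MEDIUM", "LOW")


@dataclass(frozen=True)
class ApprovalPayload:
    user_request: str            # the request that triggered the action
    tool: str                    # tool the agent intends to call
    arguments: Dict[str, Any]    # exact arguments for that call
    target_system: str           # system that will be touched
    effect_summary: str          # plain-English explanation of the effect
    confidence: str              # HIGH / MEDIUM / LOW
    cancel_instructions: str     # how the approver cancels the action

    def __post_init__(self):
        if self.confidence not in CONFIDENCE_LEVELS:
            raise ValueError(f"confidence must be one of {CONFIDENCE_LEVELS}")
        required_text = ("user_request", "tool", "target_system",
                         "effect_summary", "cancel_instructions")
        for name in required_text:
            if not getattr(self, name).strip():
                raise ValueError(f"{name} must not be empty")
```

`asdict(payload)` gives a plain dictionary that can be rendered into a Slack message, a dashboard row, or a log entry without losing any field.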

Auto-approval, with limits

For high-volume agents, pure HITL does not scale. Use policy-based auto-approval:

Below threshold X (dollar amount, blast radius, low-risk action class) → auto-approve, log everything, sample-audit weekly.
Above threshold X → require human.
Hard ceiling Y → no automation, ever — only humans.

Document the thresholds. Review them quarterly.
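The three tiers can be sketched as a small routing function. The names `ApprovalPolicy` and `route` are hypothetical, the threshold is expressed as a dollar amount for simplicity, and two boundary choices are assumptions the text above does not settle: an amount exactly at X goes to a human, and an amount exactly at Y is treated as human-only.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class ApprovalPolicy:
    auto_approve_below: Decimal  # threshold X
    hard_ceiling: Decimal        # ceiling Y

    def __post_init__(self):
        if self.auto_approve_below > self.hard_ceiling:
            raise ValueError("threshold X must not exceed ceiling Y")


def route(amount, policy, low_risk):
    """Return 'auto_approve', 'require_human', or 'human_only'."""
    if amount >= policy.hard_ceiling:
        # No automation at all: a human performs the action.
        return "human_only"
    if low_risk and amount < policy.auto_approve_below:
        # Still logged and sample-audited by the caller.
        return "auto_approve"
    return "require_human"
```

With `ApprovalPolicy(Decimal("50"), Decimal("1000"))`, a low-risk 10-dollar action is auto-approved, the same amount on a non-low-risk action class goes to a human, and anything at or above 1000 is never automated.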

2. Operating rules

Every agent — and every tool — must publish its operating rules. The pattern is "always do / always avoid":

Always do	Always avoid
Use only approved tools listed in the registry.	Invent data, IDs, query results, or tool outputs.
Cite the source for any factual claim.	Mix multiple jobs into one role.
Ask clarifying questions when required input is missing.	Take an irreversible action without HITL approval.
Return a clear "INSUFFICIENT DATA" response when uncertain.	Use credentials beyond least-privilege scope.
Log every tool call with arguments and result.	Bypass logging or audit hooks.

These rules belong in the agent's Operating Rules block (see the Role/Goal/Backstory template) and should be mirrored as enforced controls in code where possible. Belt and suspenders.
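Two of the rules in the table lend themselves to enforcement in code: "use only approved tools" and "log every tool call with arguments and result". The sketch below shows one way to do both in a single choke point. `ToolRegistry`, `ToolNotApproved`, and the logger name `agent.audit` are hypothetical names for this example, not part of any specific framework.

```python
import json
import logging
from typing import Any, Callable, Dict

audit_log = logging.getLogger("agent.audit")


class ToolNotApproved(Exception):
    """Raised when the agent asks for a tool that is not in the registry."""


class ToolRegistry:
    def __init__(self):
        self._tools: Dict[str, Callable[..., Any]] = {}

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        self._tools[name] = fn

    def call(self, name: str, arguments: Dict[str, Any]) -> Any:
        """Run an approved tool; log the call whether it succeeds or not."""
        if name not in self._tools:
            self._log(name, arguments, "blocked")
            raise ToolNotApproved(f"{name!r} is not in the registry")
        try:
            result = self._tools[name](**arguments)
        except Exception as exc:
            self._log(name, arguments, "error", repr(exc))
            raise
        self._log(name, arguments, "ok", result)
        return result

    @staticmethod
    def _log(name, arguments, status, result=None):
        entry = {"tool": name, "arguments": arguments,
                 "status": status, "result": result}
        # default=str keeps logging from failing on non-JSON values.
        audit_log.info(json.dumps(entry, default=str))
```

Because the agent can reach tools only through `call`, a prompt that ignores the written rule still cannot run an unregistered tool or skip the audit entry.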

3. Evaluation process

Governance without evaluation is theater. Wire evaluation in before launch and gate every change on it.

The minimum:

Golden set. 50–100 representative inputs with known-good outputs. (See Hallucination Prevention Protocol for the original concept and Agent Evaluation Framework for the full methodology.)
Regression run. Re-run the full set on every prompt change, model change, or tool change.
Scoring rubric. Pass / fail per item with a documented reason for failure.
Ship gate. Below the documented threshold (commonly 95% on a high-stakes set), the change does not ship.
Drift watch. Run the eval on a schedule against production traffic samples to catch silent regressions.
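The regression run, scoring rubric, and ship gate can be sketched together in a few lines. This is a simplified illustration: `GoldenItem` and `run_golden_set` are hypothetical names, the agent is assumed to be a plain function from input string to output string, and exact string match stands in for whatever rubric the team actually documents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GoldenItem:
    input: str
    expected: str


def run_golden_set(agent, items, threshold=0.95):
    """Score the agent on the golden set and apply the ship gate.

    Returns (ships, pass_rate, failures), where each failure is an
    (input, reason) pair so every fail has a documented reason.
    """
    if not items:
        raise ValueError("golden set is empty")
    failures = []
    for item in items:
        actual = agent(item.input)
        if actual != item.expected:  # assumption: exact-match rubric
            reason = f"expected {item.expected!r}, got {actual!r}"
            failures.append((item.input, reason))
    pass_rate = (len(items) - len(failures)) / len(items)
    # Below the threshold the change does not ship.
    return pass_rate >= threshold, pass_rate, failures
```

In CI, a `False` in the first return value would fail the job. The same function can be pointed at sampled production inputs on a schedule for the drift watch, provided known-good outputs exist for those samples.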

4. Observability

You cannot govern what you cannot see. Every agent emits, at minimum:

Signal	What it captures	Why
Conversation log	The full prompt, the full response, the tool calls.	Root-cause investigation.
Decision log	Which tool was chosen and why, when known.	Detects bad routing patterns.
HITL log	Each approval / rejection with approver, timestamp, payload.	Audit and compliance.
Cost log	Tokens in / out, dollars per request.	Budget enforcement.
Latency log	Per request and per tool call.	UX and SLOs.
Eval log	Rolling score on the golden set.	Regression detection.

Retain logs long enough to satisfy your own compliance posture (often 90 days minimum, longer for regulated industries). Strip PII at write time wherever possible — see Agent Security.
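A shared record builder is one way to keep the six signals consistent and to strip PII at write time. The sketch below is illustrative only: `build_record`, `redact`, and the field names are hypothetical, and the email pattern is a deliberately simple stand-in for a real PII scrubber, which would need to cover far more than email addresses.

```python
import json
import re
import time

SIGNALS = {"conversation", "decision", "hitl", "cost", "latency", "eval"}

# Assumption: a simple email pattern as a placeholder for real PII detection.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact(value):
    """Recursively replace email addresses in strings, dicts, and lists."""
    if isinstance(value, str):
        return EMAIL_RE.sub("[REDACTED_EMAIL]", value)
    if isinstance(value, dict):
        return {key: redact(item) for key, item in value.items()}
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value


def build_record(signal, agent_id, payload, now=None):
    """Return one JSON log line for the given signal, redacted before write."""
    if signal not in SIGNALS:
        raise ValueError(f"unknown signal: {signal!r}")
    record = {
        "ts": time.time() if now is None else now,
        "signal": signal,
        "agent_id": agent_id,
        "payload": redact(payload),
    }
    return json.dumps(record, sort_keys=True)
```

Redacting inside the builder means no caller can write an unredacted line by forgetting a step. Fields that must stay identifiable for audit, such as the approver in the HITL log, would need an explicit exemption that this sketch does not implement.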

5. Governance launch checklist

Before promoting any agent to production:

HITL is wired in for every irreversible or external action.
Auto-approval thresholds are documented and approved.
Operating rules are written into the agent prompt and enforced in code where possible.
A golden eval set exists, the score is above threshold, and the suite runs in CI.
Conversation, decision, HITL, cost, latency, and eval logs are emitting.
Alert thresholds are set (cost spike, rejection-rate spike, eval drop, latency p99).
An owner and an on-call exist for the agent.
An organization-wide AI Policy covers this agent's data class.

