Agent Governance & Human-in-the-Loop
Every production agent needs four governance layers wrapped around it: human-in-the-loop approval, operating rules, evaluation, and observability. None of them are optional once the agent can modify data or send messages on someone's behalf.
This page covers those governance layers. For the agent instructions themselves, see Role, Goal, Backstory.
1. Human-in-the-Loop (HITL)
Rule of thumb: any agent that can modify data, spend money, or send a message on someone's behalf must have a HITL layer.
When HITL is mandatory
| Action class | HITL required? | Notes |
|---|---|---|
| Read-only retrieval, summarization | No | Treat like a chatbot. |
| Internal-only drafting (no send) | No | The human chooses to send. |
| Sending external messages | Yes | Email, SMS, social post, customer notification. |
| Mutating data in a system of record | Yes | CRM updates, ticket changes, ledger entries. |
| Spending money | Yes | Even small amounts. Cap and approve. |
| Granting or changing access | Yes | Permissions, role assignments, secrets. |
| High-value or irreversible actions | Yes | Anything you cannot easily undo. |
| Deviation from a predefined path | Yes | If the agent took an unplanned route, surface it. |
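The table can be encoded as a small policy check that runs before any tool call. The sketch below is illustrative only: the `ActionClass` member names are hypothetical labels for the table rows. It uses a default-deny rule, so any action class added later requires HITL unless someone explicitly exempts it.

```python
from enum import Enum


class ActionClass(Enum):
    """Action classes from the HITL table; the member names are hypothetical."""
    READ_ONLY = "read_only"              # retrieval, summarization
    INTERNAL_DRAFT = "internal_draft"    # drafting with no send
    EXTERNAL_MESSAGE = "external_message"
    MUTATE_RECORD = "mutate_record"      # CRM, tickets, ledger
    SPEND_MONEY = "spend_money"
    CHANGE_ACCESS = "change_access"      # permissions, roles, secrets
    IRREVERSIBLE = "irreversible"
    OFF_PATH = "off_path"                # agent deviated from its planned route


# Explicit allowlist of the only classes that may run without a human.
NO_HITL_REQUIRED = frozenset({ActionClass.READ_ONLY, ActionClass.INTERNAL_DRAFT})


def requires_hitl(action: ActionClass) -> bool:
    """Default-deny: anything not explicitly exempted needs human approval."""
    return action not in NO_HITL_REQUIRED
```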
Approval mechanics
Every HITL approval needs to be traceable — who approved, when, and on what evidence.
- Slack / email approval. The agent posts the proposed action with full context (inputs, planned tool call, expected effect). An authorized human clicks Approve / Reject. The decision and approver are logged.
- Traceable button push. In an internal dashboard, the action is staged with a diff view, and clicking Approve confirms it. The decision is logged the same way.
- CLI command. For engineering-owned agents, the agent prints the proposed change and waits for an explicit `confirm` command. Logged in shell history and in the agent's own audit log.
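The CLI variant can be sketched as follows, assuming a Python agent. `cli_confirm` and `ApprovalRecord` are hypothetical names. The audit log here is an in-memory list of JSON lines standing in for a real append-only store. In practice, the approver identity should come from an authenticated session and not from a caller-supplied string. Reading input through an injectable `read_line` function keeps the gate testable.

```python
import json
import time
from dataclasses import asdict, dataclass
from typing import Callable, List


@dataclass
class ApprovalRecord:
    """One traceable HITL decision: who approved, when, and on what evidence."""
    action_id: str
    approver: str
    approved: bool
    timestamp: float   # Unix epoch seconds
    evidence: str      # the proposed change exactly as shown to the approver


def cli_confirm(action_id: str, proposed_change: str, approver: str,
                audit_log: List[str],
                read_line: Callable[[str], str] = input) -> bool:
    """Show the proposed change and proceed only if the human types 'confirm'.

    Any other answer, including an empty line, is treated as a rejection.
    The decision is appended to audit_log as one JSON line either way.
    """
    print(proposed_change)
    approved = read_line("Type 'confirm' to proceed: ").strip() == "confirm"
    record = ApprovalRecord(action_id, approver, approved, time.time(),
                            proposed_change)
    audit_log.append(json.dumps(asdict(record)))
    return approved
```

Treating everything except an exact `confirm` as a rejection makes the safe outcome the default when someone hits Enter by accident.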
What the approval payload should always contain:
- The user request that triggered the action.
- The tool, the exact arguments, and the system that will be touched.
- A plain-English explanation of the effect.
- The agent's confidence (HIGH / MEDIUM / LOW).
- A clear cancel path.
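One way to guarantee that every approval request carries the five items above is to build it from a typed object that refuses to exist with blank fields. The sketch below uses hypothetical field names. `render()` produces a plain-text message that could be posted to Slack, sent by email, or printed in a CLI.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict


class Confidence(Enum):
    HIGH = "HIGH"
    MEDIUM = "MEDIUM"
    LOW = "LOW"


@dataclass(frozen=True)
class ApprovalPayload:
    """Everything an approver needs; field names are illustrative."""
    user_request: str            # the request that triggered the action
    tool: str                    # tool the agent wants to call
    arguments: Dict[str, Any]    # exact arguments, not a paraphrase
    target_system: str           # system that will be touched
    effect: str                  # plain-English explanation of the effect
    confidence: Confidence
    cancel_instructions: str     # how the approver cancels or rejects

    def __post_init__(self) -> None:
        # Refuse to build a payload with any blank text field.
        for name in ("user_request", "tool", "target_system",
                     "effect", "cancel_instructions"):
            if not getattr(self, name).strip():
                raise ValueError(f"approval payload missing {name}")

    def render(self) -> str:
        """Format the payload as a plain-text approval message."""
        args = ", ".join(f"{k}={v!r}" for k, v in sorted(self.arguments.items()))
        return (
            f"Request: {self.user_request}\n"
            f"Action: {self.tool}({args}) on {self.target_system}\n"
            f"Effect: {self.effect}\n"
            f"Confidence: {self.confidence.value}\n"
            f"To cancel: {self.cancel_instructions}"
        )
```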
Auto-approval, with limits
For high-volume agents, pure HITL does not scale. Use policy-based auto-approval:
- Below threshold X (a dollar amount or blast radius), and only for low-risk action classes → auto-approve, log everything, sample-audit weekly.
- Above threshold X → require a human approver.
- Above hard ceiling Y → no automation, ever; only a human may perform the action.
Document the thresholds. Review them quarterly.
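The three tiers can be sketched as a single routing function. This is a sketch under stated assumptions: it uses one numeric `amount` (for example, dollars) as the risk measure, and blast radius would need its own measure. It also fixes the boundary conventions: strictly below X auto-approves, and anything at or above Y is human-only. `AutoApprovalPolicy` and `decide` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class Decision(Enum):
    AUTO_APPROVE = "auto_approve"     # run, log everything, sample-audit weekly
    REQUIRE_HUMAN = "require_human"   # stage the action for a human approver
    HUMAN_ONLY = "human_only"         # the agent may not perform it at all


@dataclass(frozen=True)
class AutoApprovalPolicy:
    threshold_x: float                # auto-approve strictly below this
    ceiling_y: float                  # at or above this, no automation
    low_risk_classes: FrozenSet[str]  # only these classes can auto-approve

    def __post_init__(self) -> None:
        if not 0 <= self.threshold_x <= self.ceiling_y:
            raise ValueError("expected 0 <= threshold_x <= ceiling_y")


def decide(policy: AutoApprovalPolicy, action_class: str,
           amount: float) -> Decision:
    """Route one proposed action; `amount` is a single risk measure."""
    if amount >= policy.ceiling_y:
        return Decision.HUMAN_ONLY
    if action_class in policy.low_risk_classes and amount < policy.threshold_x:
        return Decision.AUTO_APPROVE
    return Decision.REQUIRE_HUMAN
```

The ceiling is checked first so that no action-class exemption can ever let an above-Y action through.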
2. Operating rules
Every agent — and every tool — must publish its operating rules. The pattern is "always do / always avoid":
| Always do | Always avoid |
|---|---|
| Use only approved tools listed in the registry. | Invent data, IDs, query results, or tool outputs. |
| Cite the source for any factual claim. | Mix multiple jobs into one role. |
| Ask clarifying questions when required input is missing. | Take an irreversible action without HITL approval. |
| Return a clear "INSUFFICIENT DATA" response when uncertain. | Use credentials beyond least-privilege scope. |
| Log every tool call with arguments and result. | Bypass logging or audit hooks. |
These rules belong in the agent's Operating Rules block (see the Role/Goal/Backstory template) and should also be enforced as controls in code wherever possible. Belt and suspenders.
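As an example of a code-level control, the sketch below enforces "use only approved tools" and "log every tool call with arguments and result" in one place. It assumes the agent can reach tools only through this registry. If it can call tool functions directly, the control is bypassed. `ToolRegistry` and `ToolNotApproved` are hypothetical names, and logging goes through Python's standard `logging` module.

```python
import json
import logging
from typing import Any, Callable, Dict

logger = logging.getLogger("agent.tools")


class ToolNotApproved(Exception):
    """Raised when the agent requests a tool that is not in the registry."""


class ToolRegistry:
    """Allowlist of approved tools; every call is logged with args and result."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., Any]] = {}

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        self._tools[name] = fn

    def call(self, name: str, /, **kwargs: Any) -> Any:
        if name not in self._tools:
            # "Use only approved tools": refuse rather than improvise.
            logger.warning(json.dumps({"tool": name, "status": "rejected"}))
            raise ToolNotApproved(name)
        try:
            result = self._tools[name](**kwargs)
        except Exception as exc:
            logger.error(json.dumps({"tool": name, "args": kwargs,
                                     "status": "error", "error": repr(exc)},
                                    default=repr))
            raise
        # default=repr keeps logging from failing on non-JSON values.
        logger.info(json.dumps({"tool": name, "args": kwargs, "status": "ok",
                                "result": result}, default=repr))
        return result
```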
3. Evaluation process
Governance without evaluation is theater. Wire evaluation in before launch and gate every change on it.
The minimum:
- Golden set. 50–100 representative inputs with known-good outputs. (See Hallucination Prevention Protocol for the original concept and Agent Evaluation Framework for the full methodology.)
- Regression run. Re-run the full set on every prompt change, model change, or tool change.
- Scoring rubric. Pass / fail per item with a documented reason for failure.
- Ship gate. Below the documented threshold (commonly 95% on a high-stakes set), the change does not ship.
- Drift watch. Run the eval on a schedule against production traffic samples to catch silent regressions.
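A minimal golden-set runner with a ship gate might look like the sketch below. It assumes the agent can be called as a plain `str -> str` function. Exact match is used only as the simplest possible rubric, and most real rubrics are task-specific. `GoldenItem`, `run_eval`, and `exact_match` are hypothetical names. The 0.95 default mirrors the commonly used threshold mentioned above.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Score = Tuple[bool, Optional[str]]   # (passed, documented failure reason)


@dataclass
class GoldenItem:
    prompt: str
    expected: str


@dataclass
class ItemResult:
    prompt: str
    passed: bool
    reason: Optional[str]   # None when the item passed


def exact_match(output: str, expected: str) -> Score:
    """Simplest possible rubric; real rubrics are usually task-specific."""
    if output.strip() == expected.strip():
        return True, None
    return False, f"expected {expected!r}, got {output!r}"


def run_eval(agent: Callable[[str], str], golden_set: List[GoldenItem],
             score: Callable[[str, str], Score] = exact_match,
             threshold: float = 0.95) -> Tuple[bool, float, List[ItemResult]]:
    """Score every golden item; return (ship_ok, pass_rate, per-item results)."""
    if not golden_set:
        raise ValueError("golden set is empty")
    results = []
    for item in golden_set:
        passed, reason = score(agent(item.prompt), item.expected)
        results.append(ItemResult(item.prompt, passed, reason))
    pass_rate = sum(r.passed for r in results) / len(results)
    return pass_rate >= threshold, pass_rate, results
```

In CI, the job can exit non-zero whenever `ship_ok` is false, so a prompt, model, or tool change that regresses the set blocks the merge.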
4. Observability
You cannot govern what you cannot see. Every agent emits, at minimum:
| Signal | What it captures | Why |
|---|---|---|
| Conversation log | The full prompt, the full response, the tool calls. | Root-cause investigation. |
| Decision log | Which tool was chosen and why, when known. | Detects bad routing patterns. |
| HITL log | Each approval / rejection with approver, timestamp, payload. | Audit and compliance. |
| Cost log | Tokens in / out, dollars per request. | Budget enforcement. |
| Latency log | Per request and per tool call. | UX and SLOs. |
| Eval log | Rolling score on the golden set. | Regression detection. |
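The cost and latency signals can be captured with a small context manager that writes one JSON line per request or tool call. This is a sketch. The prices in `PRICE_PER_MTOK` are hypothetical placeholders and should be replaced with your provider's actual rates. The `sink` list stands in for whatever log pipeline you use.

```python
import json
import time
from contextlib import contextmanager
from typing import Any, Dict, Iterator, List

# Hypothetical prices in USD per million tokens; use your provider's rates.
PRICE_PER_MTOK = {"input": 3.00, "output": 15.00}


def request_cost(tokens_in: int, tokens_out: int) -> float:
    """Dollar cost of one model call under the prices above."""
    return (tokens_in * PRICE_PER_MTOK["input"]
            + tokens_out * PRICE_PER_MTOK["output"]) / 1_000_000


@contextmanager
def log_span(span: str, sink: List[str], **fields: Any) -> Iterator[Dict[str, Any]]:
    """Emit one JSON line for the wrapped block, including latency in ms.

    The caller may add fields (tokens, cost, tool name) to the yielded dict;
    the line is written even if the block raises.
    """
    record: Dict[str, Any] = {"span": span, **fields}
    start = time.perf_counter()
    try:
        yield record
    finally:
        record["latency_ms"] = round((time.perf_counter() - start) * 1000, 2)
        sink.append(json.dumps(record, default=repr))
```

Because the record is written in `finally`, failed requests still show up in the cost and latency logs. Those are often the ones you most need for a postmortem.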
Retain logs long enough to satisfy your own compliance posture (often 90 days minimum, longer for regulated industries). Strip PII at write time wherever possible — see Agent Security.
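Stripping PII at write time means redacting before the log line is persisted, not afterwards. The sketch below is a deliberately naive illustration using Python's `re` module, not a compliance control. It catches common email and phone shapes only. It will miss names, addresses, and national IDs, and the phone pattern will also over-match things like ISO dates and long numeric IDs.

```python
import re

# Deliberately simple patterns: they miss many PII formats and may over-match
# long digit runs. Emails are redacted first so the phone pattern never sees them.
_PATTERNS = [
    (re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "[EMAIL]"),
    (re.compile(r"\+?\d[\d\s().-]{7,}\d"), "[PHONE]"),
]


def redact(text: str) -> str:
    """Replace matches with placeholders before the text is written to a log."""
    for pattern, placeholder in _PATTERNS:
        text = pattern.sub(placeholder, text)
    return text
```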
5. Governance launch checklist
Before promoting any agent to production:
- HITL is wired in for every irreversible or external action.
- Auto-approval thresholds are documented and approved.
- Operating rules are written into the agent prompt and enforced in code where possible.
- A golden eval set exists, the score is above threshold, and the suite runs in CI.
- Conversation, decision, HITL, cost, latency, and eval logs are emitting.
- Alert thresholds are set (cost spike, rejection-rate spike, eval drop, latency p99).
- An owner and an on-call exist for the agent.
- An organization-wide AI Policy covers this agent's data class.
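If each agent ships with a machine-readable launch record, the checklist can also be enforced as a pre-deploy step. The sketch below assumes such a record exists. `LaunchManifest`, its fields, and the log and alert names are hypothetical. Items that need human judgment, such as "enforced in code where possible", still need a reviewer.

```python
from dataclasses import dataclass, field
from typing import List, Set

REQUIRED_LOGS = {"conversation", "decision", "hitl", "cost", "latency", "eval"}
REQUIRED_ALERTS = {"cost_spike", "rejection_rate_spike", "eval_drop", "latency_p99"}


@dataclass
class LaunchManifest:
    """Hypothetical per-agent record a launch review could fill in."""
    hitl_on_irreversible_and_external: bool
    auto_approval_thresholds_approved: bool
    operating_rules_in_prompt: bool
    eval_score: float
    eval_threshold: float
    eval_runs_in_ci: bool
    emitting_logs: Set[str] = field(default_factory=set)
    alerts: Set[str] = field(default_factory=set)
    owner: str = ""
    on_call: str = ""
    ai_policy_covers_data_class: bool = False


def launch_blockers(m: LaunchManifest) -> List[str]:
    """Return every unmet checklist item; an empty list means clear to launch."""
    blockers = []
    if not m.hitl_on_irreversible_and_external:
        blockers.append("HITL not wired for irreversible/external actions")
    if not m.auto_approval_thresholds_approved:
        blockers.append("auto-approval thresholds not documented and approved")
    if not m.operating_rules_in_prompt:
        blockers.append("operating rules missing from agent prompt")
    if m.eval_score < m.eval_threshold or not m.eval_runs_in_ci:
        blockers.append("golden eval below threshold or not running in CI")
    missing_logs = REQUIRED_LOGS - m.emitting_logs
    if missing_logs:
        blockers.append("missing logs: " + ", ".join(sorted(missing_logs)))
    missing_alerts = REQUIRED_ALERTS - m.alerts
    if missing_alerts:
        blockers.append("missing alerts: " + ", ".join(sorted(missing_alerts)))
    if not (m.owner and m.on_call):
        blockers.append("no owner or on-call assigned")
    if not m.ai_policy_covers_data_class:
        blockers.append("AI Policy does not cover this data class")
    return blockers
```

Returning every blocker at once, instead of failing on the first, gives the launch reviewer a complete punch list in one pass.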