
Agent Lifecycle

A practical map of the six stages every production agent moves through. Each stage has an entry condition, a deliverable, and a checkpoint before you advance.

If a stage's checkpoint fails, go back; do not push forward. Most agent failures in production come from stages that were skipped, not stages that were done badly.


Stage 1: Define

Goal: decide whether an agent is the right tool, and if so, what it should do.

Item | What it looks like
Problem statement | One sentence: the user, the friction, the desired outcome.
Success metric | Quantitative: minutes saved, errors avoided, tickets deflected.
Tool inventory | Every system the agent will read from or write to.
Risk assessment | What is the worst thing this agent could do, and how do we contain it?
Decision: agent vs workflow | See the Agents Overview decision criteria. If a Zapier workflow fits, use a workflow.

Checkpoint to advance: a one-page brief that names the user, the metric, the tools, and the worst-case action.
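
The brief can also be kept as a small structured record so the checkpoint is checkable rather than a matter of opinion. The following is a minimal sketch, assuming Python 3.9+; the class name `AgentBrief` and all field names are hypothetical and chosen only to mirror the items listed above.

```python
from dataclasses import dataclass, field


@dataclass
class AgentBrief:
    """Hypothetical one-page brief; field names are illustrative only."""
    user: str                 # who experiences the friction
    problem_statement: str    # one sentence: user, friction, desired outcome
    success_metric: str       # quantitative, e.g. "tickets deflected per week"
    tools: list[str] = field(default_factory=list)  # systems read from or written to
    worst_case_action: str = ""                     # worst thing the agent could do

    def missing_items(self) -> list[str]:
        """Return the checkpoint items that are still empty."""
        required = {
            "user": self.user,
            "success_metric": self.success_metric,
            "tools": self.tools,
            "worst_case_action": self.worst_case_action,
        }
        return [name for name, value in required.items() if not value]

    def ready_to_advance(self) -> bool:
        """The Define checkpoint passes only when nothing is missing."""
        return not self.missing_items()


# Example: a brief that has not yet named the worst-case action.
brief = AgentBrief(
    user="Tier-1 support analysts",
    problem_statement="Analysts retype order details into every refund ticket.",
    success_metric="Minutes saved per ticket",
    tools=["ticketing system"],
)
print(brief.missing_items())
print(brief.ready_to_advance())
```

A record like this does not replace the written brief; it only makes it harder to advance with one of the four named items left blank.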

Stage 2: Build

Goal: stand up the agent and its surrounding system in a sandbox.

Item | Reference
Write the instructions | Role, Goal, Backstory for production; AI Agent Configuration for chat-tool agents.
Wire the tools | Through the registry; typed schemas; idempotency keys for mutations. See Architecture Standards.
Connect the data | Vector store, database, or document loader. See RAG Preparation and Data Hygiene.
Add HITL hooks | Even in dev. Easier to design in than retrofit. See Governance & HITL.
Stand up logging | Conversation, decision, cost, latency from day one.

Checkpoint to advance: the agent completes a happy-path scenario end-to-end on real (non-PII) data.
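
To make "typed schemas" and "idempotency keys for mutations" concrete, here is a minimal sketch, assuming Python 3.9+. The tool (`RefundTool`), its schema (`RefundRequest`), and the in-memory result store are all hypothetical stand-ins; this is not the registry or the Architecture Standards themselves.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RefundRequest:
    """Typed input schema for a hypothetical mutating tool."""
    order_id: str
    amount_cents: int

    def __post_init__(self) -> None:
        # Reject malformed input before it reaches the backend.
        if not self.order_id:
            raise ValueError("order_id is required")
        if self.amount_cents <= 0:
            raise ValueError("amount_cents must be positive")


def idempotency_key(tool_name: str, payload: RefundRequest) -> str:
    """Derive a stable key so retries of the same call share one key."""
    canonical = json.dumps(asdict(payload), sort_keys=True)
    return hashlib.sha256(f"{tool_name}:{canonical}".encode()).hexdigest()


class RefundTool:
    """In-memory stand-in for a real backend; stores results by key."""

    def __init__(self) -> None:
        self._results: dict[str, dict] = {}
        self.executions = 0  # counts real side effects, for demonstration

    def call(self, payload: RefundRequest) -> dict:
        key = idempotency_key("issue_refund", payload)
        if key in self._results:
            # Retry of a call already performed: return the stored result.
            return self._results[key]
        self.executions += 1  # the real mutation would happen here
        result = {"status": "refunded", **asdict(payload)}
        self._results[key] = result
        return result


tool = RefundTool()
tool.call(RefundRequest("A-1", 500))
tool.call(RefundRequest("A-1", 500))  # agent retried after a timeout
print(tool.executions)
```

Deriving the key from the payload is one design choice among several: it also collapses two intentional, identical refunds into one. Many systems instead have the caller supply a key per logical operation.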

Stage 3: Evaluate

Goal: prove the agent is good enough to put in front of users.

  • Build the golden set (50–100 representative inputs with known-good outputs).
  • Run the eval. Score per item. Document failures.
  • Ship-gate: meet the documented threshold (commonly 95% on high-stakes work).
  • Run a security review — see Agent Security — for prompt injection, PII handling, and tool-scope creep.

Checkpoint to advance: the eval passes, the security review passes, and any failure modes have a documented mitigation.

Full methodology: Agent Evaluation Framework.
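
The eval loop and ship-gate described in this stage can be sketched as follows. This is an illustration under stated assumptions, not the Agent Evaluation Framework: scoring is exact string match, the agent is any callable from input to output, and the names `GoldenItem`, `run_eval`, and `ship_gate` are hypothetical. The 0.95 default mirrors the "commonly 95%" figure above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GoldenItem:
    input: str
    expected: str  # known-good output


def run_eval(agent: Callable[[str], str], golden_set: list[GoldenItem]):
    """Score each item and document every failure."""
    if not golden_set:
        raise ValueError("golden set is empty")
    failures = []
    for item in golden_set:
        actual = agent(item.input)
        # Exact match is an assumption; real evals often use rubric scoring.
        if actual.strip() != item.expected.strip():
            failures.append(
                {"input": item.input, "expected": item.expected, "actual": actual}
            )
    passed = len(golden_set) - len(failures)
    return passed / len(golden_set), failures


def ship_gate(score: float, threshold: float = 0.95) -> bool:
    """Pass only if the score meets the documented threshold."""
    return score >= threshold


# Example with a stub agent that upper-cases its input.
golden = [
    GoldenItem("a", "A"),
    GoldenItem("b", "B"),
    GoldenItem("c", "C"),
    GoldenItem("d", "x"),  # the stub gets this one wrong
]
score, failures = run_eval(str.upper, golden)
print(score, ship_gate(score))
for failure in failures:
    print(failure)
```

A real golden set would hold the 50 to 100 representative inputs mentioned above; four items are used here only to keep the example short.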

Stage 4: Deploy

Goal: put the agent in front of real users in a controlled way.

Tactic | Why
Pilot with a named user group | Real feedback, contained blast radius.
HITL on every mutating action | Trust is earned, not assumed.
Per-agent budget caps | Cost runaways die at the cap, not on the bill.
Feature flag the agent | One toggle to disable if something breaks.
Document a rollback | The previous prompt and previous model are one toggle away.

Checkpoint to advance: two consecutive review periods (commonly two weeks) with the success metric trending up and HITL approval rate at or above target.
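
Three of the deploy tactics (feature flag, per-agent budget cap, HITL on mutating actions) can be combined in one wrapper around every action the agent takes. The following is a minimal sketch; `DeployGuard`, its exception classes, and the `approve` callback are hypothetical names, and the cost figures are made up for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable


class AgentDisabled(Exception):
    """The feature flag is off."""


class BudgetExceeded(Exception):
    """The call would push spend past the per-agent cap."""


class ApprovalDenied(Exception):
    """A human reviewer rejected a mutating action."""


@dataclass
class DeployGuard:
    enabled: bool                   # feature flag: one toggle to disable
    budget_cap_usd: float           # per-agent budget cap
    approve: Callable[[str], bool]  # HITL hook: a human approves or rejects
    spent_usd: float = 0.0

    def run(self, action: Callable[[], Any], description: str,
            est_cost_usd: float, mutating: bool) -> Any:
        if not self.enabled:
            raise AgentDisabled("agent is switched off")
        if self.spent_usd + est_cost_usd > self.budget_cap_usd:
            raise BudgetExceeded(f"cap of {self.budget_cap_usd} USD reached")
        # Read-only actions skip approval; every mutating action needs it.
        if mutating and not self.approve(description):
            raise ApprovalDenied(description)
        self.spent_usd += est_cost_usd
        return action()


# Example: a reviewer who rejects everything.
guard = DeployGuard(enabled=True, budget_cap_usd=1.00, approve=lambda desc: False)
print(guard.run(lambda: "ticket text", "read ticket", 0.25, mutating=False))
try:
    guard.run(lambda: "closed", "close ticket", 0.25, mutating=True)
except ApprovalDenied as exc:
    print("blocked:", exc)
```

Checking the flag first means a disabled agent spends nothing and asks no one for approval. The rollback tactic is not covered here; it depends on how prompts and model versions are stored.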

Stage 5: Monitor

Goal: keep the agent honest in the wild.

What you watch — daily on a dashboard, weekly in a review:

  • Eval score on the golden set (rolling).
  • HITL approval / rejection rate.
  • Tool failure rate and latency p99.
  • Cost per request and total daily spend.
  • User-reported issues.

What you alert on:

  • Eval score drops by more than X points.
  • Rejection rate spikes beyond Y%.
  • Cost per day above the cap.
  • Tool failure rate above Z%.
  • p99 latency above the SLO.
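
The alert conditions above can be sketched as a single check over a daily snapshot. This is an illustration only: X, Y, Z, the cost cap, and the SLO stay as parameters because the document leaves them open, and the names `Thresholds`, `Snapshot`, and `check_alerts` are hypothetical. The p99 uses the nearest-rank method, which is one of several common percentile definitions.

```python
from dataclasses import dataclass


def p99(latencies_ms: list[float]) -> float:
    """Nearest-rank 99th percentile of a non-empty sample."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    ordered = sorted(latencies_ms)
    # Integer form of ceil(0.99 * n), avoiding float rounding.
    rank = (99 * len(ordered) + 99) // 100
    return ordered[rank - 1]


@dataclass
class Thresholds:
    max_eval_drop_points: float   # "X"
    max_rejection_rate: float     # "Y", as a fraction (0.10 means 10%)
    daily_cost_cap_usd: float
    max_tool_failure_rate: float  # "Z", as a fraction
    p99_slo_ms: float


@dataclass
class Snapshot:
    baseline_eval_score: float    # points, 0 to 100
    eval_score: float             # rolling score on the golden set
    rejection_rate: float
    daily_cost_usd: float
    tool_failure_rate: float
    latencies_ms: list[float]


def check_alerts(s: Snapshot, t: Thresholds) -> list[str]:
    """Return the names of every alert condition that fired."""
    fired = []
    if s.baseline_eval_score - s.eval_score > t.max_eval_drop_points:
        fired.append("eval_score_drop")
    if s.rejection_rate > t.max_rejection_rate:
        fired.append("rejection_rate_spike")
    if s.daily_cost_usd > t.daily_cost_cap_usd:
        fired.append("daily_cost_over_cap")
    if s.tool_failure_rate > t.max_tool_failure_rate:
        fired.append("tool_failure_rate")
    if p99(s.latencies_ms) > t.p99_slo_ms:
        fired.append("p99_latency_over_slo")
    return fired


limits = Thresholds(3.0, 0.10, 10.0, 0.02, 2000.0)
today = Snapshot(96.0, 90.0, 0.05, 12.0, 0.01, [100.0] * 99 + [5000.0])
print(check_alerts(today, limits))
```

In the example, one slow request out of 100 does not trip the latency alert, because the nearest-rank p99 of 100 samples is the 99th value, not the maximum.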

Stage 6: Iterate

Goal: improve without regressing.

Trigger | Response
Eval score drift | Investigate, patch instructions or tools, re-run eval before re-deploy.
New use case requested | Treat as a new agent (or a new role on an existing one) and start from Define.
Model upgrade available | Run the eval on the new model in staging. Promote only if score holds.
Tool change upstream | Re-run eval; update schemas; alert if the agent's behavior changes materially.
User complaint | Add the failing input to the golden set so the regression cannot return silently.

Rule: every prompt change, model change, and tool change runs through the eval before it ships. No exceptions.
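
A regression gate for this rule can be sketched in two small functions: one that grows the golden set from user complaints, and one that decides whether a changed prompt, model, or tool may ship. This is a minimal sketch; the function names, the dictionary layout of a golden-set item, and the reading of "score holds" as "not below the current baseline" are all assumptions.

```python
def add_regression_case(golden_set: list[dict], failing_input: str,
                        expected: str) -> bool:
    """Add a user-reported failure to the golden set; skip duplicates.

    Returns True if a new case was added.
    """
    if any(item["input"] == failing_input for item in golden_set):
        return False
    golden_set.append(
        {"input": failing_input, "expected": expected, "source": "user_complaint"}
    )
    return True


def may_ship(candidate_score: float, baseline_score: float,
             threshold: float = 0.95) -> bool:
    """Allow a release only if the score clears the threshold and holds.

    "Holds" is interpreted here as "not below the current baseline".
    """
    return candidate_score >= threshold and candidate_score >= baseline_score


golden_set = [{"input": "refund order A-1", "expected": "refund issued"}]
add_regression_case(golden_set, "refund order with no id", "ask for the order id")
print(len(golden_set))

# A model upgrade that clears the threshold but scores below the baseline.
print(may_ship(candidate_score=0.96, baseline_score=0.98))
```

Both scores are expected to come from the same, current golden set, so a case added after a complaint is part of every later comparison.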


Stage-by-stage cheat sheet

Stage | Entry condition | Exit deliverable
Define | Real problem, named user. | One-page brief.
Build | Brief approved. | Happy-path demo on real data.
Evaluate | Demo working. | Eval passing at threshold + security review.
Deploy | Eval and security pass. | Pilot with HITL and rollback in place.
Monitor | Pilot stable. | Live dashboard + alerts wired.
Iterate | Production traffic flowing. | Regression-gated release process.
