Agent Lifecycle

A practical map of the six stages every production agent moves through. Each stage has an entry condition, a deliverable, and a checkpoint before you advance.

If a stage's checkpoint fails, go back; do not push forward. Most agent failures in production trace to stages that were skipped, not stages that were done badly.

Stage 1: Define

Goal: decide whether an agent is the right tool, and if so, what it should do.

Item	What it looks like
Problem statement	One sentence: the user, the friction, the desired outcome.
Success metric	Quantitative — minutes saved, errors avoided, tickets deflected.
Tool inventory	Every system the agent will read from or write to.
Risk assessment	What is the worst thing this agent could do, and how do we contain it?
Decision: agent vs workflow	See the Agents Overview decision criteria. If a Zapier workflow fits, use a workflow.

Checkpoint to advance: a one-page brief that names the user, the metric, the tools, and the worst-case action.
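One way to make this checkpoint mechanical is to capture the brief as a structured record and refuse to advance while any field is empty. The sketch below is only an illustration: the `AgentBrief` class, its field names, and the sample values are hypothetical, chosen to mirror the table above.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class AgentBrief:
    """Hypothetical one-page brief; field names mirror the Define table."""
    user: str
    problem_statement: str
    success_metric: str
    tools: tuple
    worst_case_action: str
    containment: str


def missing_fields(brief: AgentBrief) -> list:
    """Return the names of empty fields so the checkpoint can fail loudly."""
    return [f.name for f in fields(brief) if not getattr(brief, f.name)]


# Made-up sample values; the worst-case action has been left blank on purpose.
brief = AgentBrief(
    user="Tier-1 support staff",
    problem_statement="Staff lose time searching for refund policy answers.",
    success_metric="Minutes saved per ticket",
    tools=("helpdesk (read)", "policy wiki (read)"),
    worst_case_action="",
    containment="Read-only scopes; no write tools registered",
)

print(missing_fields(brief))  # ['worst_case_action']
```

A brief that returns an empty list here names the user, the metric, the tools, and the worst-case action, which is what the checkpoint asks for.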

Stage 2: Build

Goal: stand up the agent and its surrounding system in a sandbox.

Item	Reference
Write the instructions	Role, Goal, Backstory for production; AI Agent Configuration for chat-tool agents.
Wire the tools	Through the registry; typed schemas; idempotency keys for mutations. See Architecture Standards.
Connect the data	Vector store, database, or document loader. See RAG Preparation and Data Hygiene.
Add HITL hooks	Even in dev. Easier to design in than retrofit. See Governance & HITL.
Stand up logging	Conversation, decision, cost, latency from day one.

Checkpoint to advance: the agent completes a happy-path scenario end-to-end on real (non-PII) data.
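The "Wire the tools" row can be sketched as a small in-memory registry. This is a minimal illustration under stated assumptions, not the registry described in Architecture Standards: `ToolRegistry`, `RefundArgs`, and `issue_refund` are hypothetical names, and the dataclass schema only checks field names, not value types.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RefundArgs:
    """Typed schema for a hypothetical mutating tool."""
    order_id: str
    amount_cents: int


class ToolRegistry:
    """In-memory registry that replays the stored result for a repeated idempotency key."""

    def __init__(self) -> None:
        self._tools = {}    # tool name -> (schema class, function)
        self._results = {}  # (tool name, idempotency key) -> stored result

    def register(self, name: str, schema: type, fn: Callable) -> None:
        self._tools[name] = (schema, fn)

    def call(self, name: str, raw_args: dict, idempotency_key: str):
        schema, fn = self._tools[name]
        args = schema(**raw_args)  # TypeError on missing or unknown fields
        cache_key = (name, idempotency_key)
        if cache_key not in self._results:
            self._results[cache_key] = fn(args)  # the mutation runs once per key
        return self._results[cache_key]


issued = []  # stands in for the downstream system


def issue_refund(args: RefundArgs) -> str:
    issued.append(args)
    return f"refunded {args.amount_cents} on {args.order_id}"


registry = ToolRegistry()
registry.register("issue_refund", RefundArgs, issue_refund)

payload = {"order_id": "A1", "amount_cents": 500}
first = registry.call("issue_refund", payload, idempotency_key="key-1")
retry = registry.call("issue_refund", payload, idempotency_key="key-1")

print(len(issued))  # 1
```

The retry returns the stored result instead of issuing a second refund, which is the property idempotency keys are meant to give mutating tools. A production version would need persistent storage for the keys and real value validation.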

Stage 3: Evaluate

Goal: prove the agent is good enough to put in front of users.

Build the golden set (50–100 representative inputs with known-good outputs).
Run the eval. Score per item. Document failures.
Ship-gate: meet the documented threshold (commonly 95% on high-stakes work).
Run a security review — see Agent Security — for prompt injection, PII handling, and tool-scope creep.

Checkpoint to advance: the eval passes, the security review passes, and any failure modes have a documented mitigation.

Full methodology: Agent Evaluation Framework.
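As a rough illustration of the run-score-gate loop, the sketch below scores a golden set and applies a ship gate. It assumes exact-match scoring, which is a simplification (real evals often use rubric or model-based grading), and `run_eval` and `toy_agent` are hypothetical names. The 0.95 default echoes the "commonly 95%" figure above; use your own documented threshold.

```python
from typing import Callable


def run_eval(agent: Callable[[str], str], golden_set: list, threshold: float = 0.95) -> dict:
    """Score each (input, expected) pair by exact match and apply the ship gate."""
    if not golden_set:
        raise ValueError("golden set is empty")
    failures = []
    for prompt, expected in golden_set:
        actual = agent(prompt)
        if actual.strip() != expected.strip():
            # Keep enough detail to document the failure mode.
            failures.append({"input": prompt, "expected": expected, "actual": actual})
    score = 1 - len(failures) / len(golden_set)
    return {"score": score, "passed": score >= threshold, "failures": failures}


def toy_agent(prompt: str) -> str:
    """Stand-in for the real agent: upper-cases its input."""
    return prompt.upper()


# Four made-up items; the last one is deliberately wrong for this toy agent.
golden = [("a", "A"), ("b", "B"), ("c", "C"), ("d", "x")]
report = run_eval(toy_agent, golden)

print(report["score"], report["passed"])  # 0.75 False
```

Each entry in `report["failures"]` is a candidate for the documented mitigation the checkpoint requires.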

Stage 4: Deploy

Goal: put the agent in front of real users in a controlled way.

Tactic	Why
Pilot with a named user group	Real feedback, contained blast radius.
HITL on every mutating action	Trust is earned, not assumed.
Per-agent budget caps	Cost runaways die at the cap, not on the bill.
Feature flag the agent	One toggle to disable if something breaks.
Document a rollback	The previous prompt and previous model are one toggle away.

Checkpoint to advance: two consecutive review periods (commonly two weeks) with the success metric trending up and HITL approval rate at or above target.
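Three of the tactics above (feature flag, budget cap, HITL on mutating actions) can sit in a single guard that every action passes through. The sketch below is an assumption-laden illustration: `DeployGuard`, the exception names, and the `approve` callback are hypothetical, and spend is tracked in memory rather than in a shared store.

```python
from dataclasses import dataclass
from typing import Callable


class AgentDisabled(Exception):
    """The feature flag is off."""


class BudgetExceeded(Exception):
    """The action would push spend past the cap."""


class ApprovalDenied(Exception):
    """A human reviewer rejected a mutating action."""


@dataclass
class DeployGuard:
    enabled: bool                   # feature flag: one toggle to disable the agent
    daily_budget_usd: float         # per-agent budget cap
    approve: Callable[[str], bool]  # HITL callback; True means approved
    spent_usd: float = 0.0

    def run(self, action: str, mutating: bool, est_cost_usd: float,
            execute: Callable[[], str]) -> str:
        if not self.enabled:
            raise AgentDisabled(action)
        if self.spent_usd + est_cost_usd > self.daily_budget_usd:
            raise BudgetExceeded(action)
        if mutating and not self.approve(action):
            raise ApprovalDenied(action)
        self.spent_usd += est_cost_usd
        return execute()


# A reviewer who rejects everything, to show that reads still pass.
guard = DeployGuard(enabled=True, daily_budget_usd=1.0, approve=lambda action: False)

result = guard.run("lookup_order", mutating=False, est_cost_usd=0.5,
                   execute=lambda: "order found")
print(result)  # order found
```

With this reviewer, a call such as `guard.run("issue_refund", mutating=True, ...)` raises `ApprovalDenied` before `execute` is invoked, and setting `enabled=False` stops every action at the first check.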

Stage 5: Monitor

Goal: keep the agent honest in the wild.

What you watch — daily on a dashboard, weekly in a review:

Eval score on the golden set (rolling).
HITL approval / rejection rate.
Tool failure rate and latency p99.
Cost per request and total daily spend.
User-reported issues.

What you alert on:

Eval score drop of more than X points.
Rejection rate spike beyond Y%.
Cost per day above the cap.
Tool failure rate above Z%.
p99 latency above the SLO.
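The alert list above maps naturally onto a function that compares a metrics snapshot against configured thresholds. In this sketch the `Thresholds` and `Snapshot` classes are hypothetical, and every number in the example is illustrative only; X, Y, Z, the cost cap, and the SLO remain values you set yourself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    max_eval_drop_points: float   # "X" in the list above
    max_rejection_rate: float     # "Y", expressed as a fraction
    daily_cost_cap_usd: float
    max_tool_failure_rate: float  # "Z", expressed as a fraction
    p99_latency_slo_ms: float


@dataclass(frozen=True)
class Snapshot:
    baseline_eval_score: float
    eval_score: float
    rejection_rate: float
    daily_cost_usd: float
    tool_failure_rate: float
    p99_latency_ms: float


def fired_alerts(s: Snapshot, t: Thresholds) -> list:
    """Return the name of every alert condition the snapshot violates."""
    fired = []
    if s.baseline_eval_score - s.eval_score > t.max_eval_drop_points:
        fired.append("eval_score_drop")
    if s.rejection_rate > t.max_rejection_rate:
        fired.append("rejection_rate")
    if s.daily_cost_usd > t.daily_cost_cap_usd:
        fired.append("daily_cost")
    if s.tool_failure_rate > t.max_tool_failure_rate:
        fired.append("tool_failure_rate")
    if s.p99_latency_ms > t.p99_latency_slo_ms:
        fired.append("p99_latency")
    return fired


# Illustrative numbers only, not recommended thresholds.
limits = Thresholds(3.0, 0.10, 50.0, 0.02, 2000.0)
today = Snapshot(baseline_eval_score=96.0, eval_score=90.0, rejection_rate=0.05,
                 daily_cost_usd=40.0, tool_failure_rate=0.08, p99_latency_ms=1800.0)

print(fired_alerts(today, limits))  # ['eval_score_drop', 'tool_failure_rate']
```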

Stage 6: Iterate

Goal: improve without regressing.

Trigger	Response
Eval score drift	Investigate, patch instructions or tools, re-run eval before re-deploy.
New use case requested	Treat as a new agent (or a new role on an existing one) — start from Define.
Model upgrade available	Run the eval on the new model in staging. Promote only if score holds.
Tool change upstream	Re-run eval; update schemas; alert if the agent's behavior changes materially.
User complaint	Add the failing input to the golden set so the regression cannot return silently.

Rule: every prompt change, model change, and tool change runs through the eval before it ships. No exceptions.
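Two of the responses above lend themselves to small helpers: adding a reported failure to the golden set, and promoting a change only if the eval score holds. The sketch below is a hedged illustration; `add_regression_case`, `may_promote`, and the `tolerance` parameter are hypothetical, and "score holds" is interpreted here as "not lower than the baseline".

```python
def add_regression_case(golden_set: list, prompt: str, expected: str) -> bool:
    """Append a reported failure to the golden set unless that input is already covered."""
    if any(existing == prompt for existing, _ in golden_set):
        return False
    golden_set.append((prompt, expected))
    return True


def may_promote(baseline_score: float, candidate_score: float,
                tolerance: float = 0.0) -> bool:
    """Gate a prompt, model, or tool change: promote only if the eval score holds."""
    return candidate_score >= baseline_score - tolerance


# Made-up data: a user complaint becomes a permanent regression test.
golden = [("reset my password", "Send the reset link.")]
added = add_regression_case(golden, "cancel my order", "Confirm the order ID first.")
duplicate = add_regression_case(golden, "cancel my order", "Confirm the order ID first.")

print(added, duplicate, len(golden))  # True False 2
print(may_promote(0.96, 0.94))        # False
print(may_promote(0.96, 0.97))        # True
```

Because the complaint now lives in the golden set, any later change that reintroduces the failure lowers the eval score and is caught by the gate.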

Stage-by-stage cheat sheet

Stage	Entry condition	Exit deliverable
Define	Real problem, named user.	One-page brief.
Build	Brief approved.	Happy-path demo on real data.
Evaluate	Demo working.	Eval passing at threshold + security review.
Deploy	Eval and security pass.	Pilot with HITL and rollback in place.
Monitor	Pilot stable.	Live dashboard + alerts wired.
Iterate	Production traffic flowing.	Regression-gated release process.

