Agent Lifecycle
A practical map of the six stages every production agent moves through. Each stage has an entry condition, a deliverable, and a checkpoint before you advance.
If a stage's checkpoint fails, go back rather than pushing forward. Most agent failures in production trace to stages that were skipped, not stages that were done badly.
Stage 1: Define
Goal: decide whether an agent is the right tool, and if so, what it should do.
| Item | What it looks like |
|---|---|
| Problem statement | One sentence: the user, the friction, the desired outcome. |
| Success metric | Quantitative — minutes saved, errors avoided, tickets deflected. |
| Tool inventory | Every system the agent will read from or write to. |
| Risk assessment | What is the worst thing this agent could do, and how do we contain it? |
| Decision: agent vs workflow | See the Agents Overview decision criteria. If a Zapier workflow fits, use a workflow. |
Checkpoint to advance: a one-page brief that names the user, the metric, the tools, and the worst-case action.
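One way to make this checkpoint mechanical is to capture the brief as a small record and refuse to advance while any required item is empty. The sketch below is illustrative only: the `AgentBrief` class and its field names are assumptions, not part of any standard template.

```python
from dataclasses import dataclass, field


@dataclass
class AgentBrief:
    """Hypothetical one-page brief; field names are illustrative."""
    user: str
    friction: str
    desired_outcome: str
    success_metric: str
    tools: list[str] = field(default_factory=list)
    worst_case_action: str = ""

    def missing_items(self) -> list[str]:
        """Return the checkpoint items that are still empty."""
        required = {
            "user": self.user,
            "success_metric": self.success_metric,
            "tools": self.tools,
            "worst_case_action": self.worst_case_action,
        }
        return [name for name, value in required.items() if not value]

    def ready_to_advance(self) -> bool:
        """The checkpoint passes only when nothing required is missing."""
        return not self.missing_items()


brief = AgentBrief(
    user="support agent",
    friction="manual order lookup",
    desired_outcome="faster first reply",
    success_metric="minutes saved per ticket",
    tools=["ticketing API"],
)
print(brief.missing_items())  # the worst-case action has not been named yet
```

A brief that cannot name its worst-case action is a signal to stay in Define.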
Stage 2: Build
Goal: stand up the agent and its surrounding system in a sandbox.
| Item | Reference |
|---|---|
| Write the instructions | Role, Goal, Backstory for production; AI Agent Configuration for chat-tool agents. |
| Wire the tools | Through the registry; typed schemas; idempotency keys for mutations. See Architecture Standards. |
| Connect the data | Vector store, database, or document loader. See RAG Preparation and Data Hygiene. |
| Add HITL hooks | Even in dev. Easier to design in than retrofit. See Governance & HITL. |
| Stand up logging | Conversation, decision, cost, latency from day one. |
Checkpoint to advance: the agent completes a happy-path scenario end-to-end on real (non-PII) data.
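The "wire the tools" row can be sketched as a tiny in-memory registry that checks argument types before a call and replays the stored result when a mutating call repeats an idempotency key. `ToolSpec`, `ToolRegistry`, and the `create_ticket` tool are hypothetical names for illustration; a production registry would likely persist keys and use richer schemas.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class ToolSpec:
    name: str
    handler: Callable[..., Any]
    params: dict[str, type]  # parameter name -> expected type
    mutating: bool = False


class ToolRegistry:
    def __init__(self) -> None:
        self._tools: dict[str, ToolSpec] = {}
        self._results: dict[tuple[str, str], Any] = {}  # (tool, key) -> result

    def register(self, spec: ToolSpec) -> None:
        self._tools[spec.name] = spec

    def call(self, name: str, args: dict[str, Any],
             idempotency_key: Optional[str] = None) -> Any:
        spec = self._tools[name]
        # Typed schema check: exact parameter names, expected types.
        if set(args) != set(spec.params):
            raise TypeError(f"{name}: expected parameters {sorted(spec.params)}")
        for param, expected in spec.params.items():
            if not isinstance(args[param], expected):
                raise TypeError(f"{name}: {param} must be {expected.__name__}")
        if not spec.mutating:
            return spec.handler(**args)
        # Mutations require a key; a repeated key replays the first result.
        if idempotency_key is None:
            raise ValueError(f"{name}: mutating tools need an idempotency key")
        cache_key = (name, idempotency_key)
        if cache_key not in self._results:
            self._results[cache_key] = spec.handler(**args)
        return self._results[cache_key]


created: list[str] = []


def create_ticket(title: str, priority: int) -> str:
    created.append(title)
    return f"ticket-{len(created)}"


registry = ToolRegistry()
registry.register(ToolSpec("create_ticket", create_ticket,
                           {"title": str, "priority": int}, mutating=True))
```

With this shape, a retried call that reuses its key does not create a second ticket.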
Stage 3: Evaluate
Goal: prove the agent is good enough to put in front of users.
- Build the golden set (50–100 representative inputs with known-good outputs).
- Run the eval. Score per item. Document failures.
- Ship-gate: meet the documented threshold (commonly 95% on high-stakes work).
- Run a security review — see Agent Security — for prompt injection, PII handling, and tool-scope creep.
Checkpoint to advance: the eval passes, the security review passes, and any failure modes have a documented mitigation.
Full methodology: Agent Evaluation Framework.
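The eval loop and ship-gate above can be sketched in a few lines. This is a minimal illustration that assumes exact-match scoring and treats the agent as a plain function from input text to output text; both are simplifications, and `GoldenItem`, `EvalReport`, and `run_eval` are hypothetical names rather than anything from the Agent Evaluation Framework.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class GoldenItem:
    input: str
    expected: str


@dataclass
class EvalReport:
    score: float                           # fraction of items passed, 0.0 to 1.0
    failures: list[tuple[str, str, str]]   # (input, expected, actual)

    def passes(self, threshold: float) -> bool:
        """Ship-gate: the score must meet the documented threshold."""
        return self.score >= threshold


def run_eval(agent: Callable[[str], str], golden_set: list[GoldenItem]) -> EvalReport:
    """Score every item and keep each failure for documentation."""
    if not golden_set:
        raise ValueError("golden set is empty")
    failures = []
    for item in golden_set:
        actual = agent(item.input)
        # Assumption: exact match after trimming whitespace counts as a pass.
        if actual.strip() != item.expected.strip():
            failures.append((item.input, item.expected, actual))
    passed = len(golden_set) - len(failures)
    return EvalReport(score=passed / len(golden_set), failures=failures)
```

Calling `run_eval(agent, golden_set).passes(0.95)` then gives a single yes/no answer for the gate, while `failures` feeds the documented mitigations.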
Stage 4: Deploy
Goal: put the agent in front of real users in a controlled way.
| Tactic | Why |
|---|---|
| Pilot with a named user group | Real feedback, contained blast radius. |
| HITL on every mutating action | Trust is earned, not assumed. |
| Per-agent budget caps | Cost runaways die at the cap, not on the bill. |
| Feature flag the agent | One toggle to disable if something breaks. |
| Document a rollback | The previous prompt and previous model are one toggle away. |
Checkpoint to advance: two consecutive review periods (commonly two weeks) with the success metric trending up and HITL approval rate at or above target.
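Three of the tactics above (feature flag, budget cap, HITL on mutating actions) can sit in one wrapper around every agent action. The sketch below is an assumption-laden illustration: `DeployGuard`, its exception classes, and the `approve` callback are hypothetical, and a real deployment would read the flag and the spend from shared state rather than from an in-process object.

```python
from dataclasses import dataclass
from typing import Callable


class AgentDisabled(Exception):
    """The feature flag is off."""


class BudgetExceeded(Exception):
    """The action would push spend past the cap."""


class ApprovalDenied(Exception):
    """A human reviewer rejected a mutating action."""


@dataclass
class DeployGuard:
    enabled: bool                    # feature flag: one toggle to disable
    daily_budget_usd: float          # per-agent budget cap
    approve: Callable[[str], bool]   # HITL hook: True means approved
    spent_usd: float = 0.0

    def run(self, action: Callable[[], str], *, description: str,
            est_cost_usd: float, mutating: bool) -> str:
        if not self.enabled:
            raise AgentDisabled(description)
        if self.spent_usd + est_cost_usd > self.daily_budget_usd:
            raise BudgetExceeded(description)
        # Only mutating actions wait on a human decision.
        if mutating and not self.approve(description):
            raise ApprovalDenied(description)
        result = action()
        self.spent_usd += est_cost_usd
        return result
```

The checks run before the action, so a disabled flag or an exhausted budget stops the call without side effects.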
Stage 5: Monitor
Goal: keep the agent honest in the wild.
What you watch — daily on a dashboard, weekly in a review:
- Eval score on the golden set (rolling).
- HITL approval / rejection rate.
- Tool failure rate and latency p99.
- Cost per request and total daily spend.
- User-reported issues.
What you alert on:
- Eval score drop of more than X points.
- Rejection rate spike beyond Y%.
- Cost per day above the cap.
- Tool failure rate above Z%.
- p99 latency above the SLO.
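The alert list above maps directly onto a set of threshold checks. In this sketch the X, Y, and Z placeholders stay as parameters for each team to set; the `Thresholds` and `Snapshot` records and the alert names are hypothetical, and rates are assumed to be expressed as percentages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    max_eval_drop_points: float       # X
    max_rejection_rate_pct: float     # Y
    daily_cost_cap_usd: float
    max_tool_failure_rate_pct: float  # Z
    p99_latency_slo_ms: float


@dataclass(frozen=True)
class Snapshot:
    baseline_eval_score: float  # score at the last approved release
    eval_score: float           # current rolling score on the golden set
    rejection_rate_pct: float
    daily_cost_usd: float
    tool_failure_rate_pct: float
    p99_latency_ms: float


def check_alerts(s: Snapshot, t: Thresholds) -> list[str]:
    """Return the name of every alert condition the snapshot trips."""
    alerts = []
    if s.baseline_eval_score - s.eval_score > t.max_eval_drop_points:
        alerts.append("eval_score_drop")
    if s.rejection_rate_pct > t.max_rejection_rate_pct:
        alerts.append("rejection_rate_spike")
    if s.daily_cost_usd > t.daily_cost_cap_usd:
        alerts.append("daily_cost_over_cap")
    if s.tool_failure_rate_pct > t.max_tool_failure_rate_pct:
        alerts.append("tool_failure_rate")
    if s.p99_latency_ms > t.p99_latency_slo_ms:
        alerts.append("p99_latency_over_slo")
    return alerts
```

Keeping the thresholds in one record makes them reviewable alongside the dashboard in the weekly review.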
Stage 6: Iterate
Goal: improve without regressing.
| Trigger | Response |
|---|---|
| Eval score drift | Investigate, patch instructions or tools, re-run eval before re-deploy. |
| New use case requested | Treat as a new agent (or a new role on an existing one) — start from Define. |
| Model upgrade available | Run the eval on the new model in staging. Promote only if score holds. |
| Tool change upstream | Re-run eval; update schemas; alert if the agent's behavior changes materially. |
| User complaint | Add the failing input to the golden set so the regression cannot return silently. |
Rule: every prompt change, model change, and tool change runs through the eval before it ships. No exceptions.
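A minimal sketch of that rule as a release gate: user-reported failures are appended to the golden set, and a candidate (new prompt, model, or tool version) is promoted only if it meets the threshold and does not score below the current version. The function names are hypothetical, exact-match scoring is an assumption, and agents are modeled as plain functions for brevity.

```python
from typing import Callable

Agent = Callable[[str], str]
GoldenSet = list[tuple[str, str]]  # (input, expected output)


def add_regression_case(golden_set: GoldenSet, failing_input: str,
                        expected_output: str) -> bool:
    """Add a user-reported failure unless that input is already covered."""
    if any(existing == failing_input for existing, _ in golden_set):
        return False
    golden_set.append((failing_input, expected_output))
    return True


def score(agent: Agent, golden_set: GoldenSet) -> float:
    """Fraction of golden items the agent answers exactly."""
    passed = sum(1 for text, expected in golden_set if agent(text) == expected)
    return passed / len(golden_set)


def may_promote(candidate: Agent, current: Agent, golden_set: GoldenSet,
                threshold: float) -> bool:
    """Promote only if the candidate meets the threshold and the score holds."""
    candidate_score = score(candidate, golden_set)
    return candidate_score >= threshold and candidate_score >= score(current, golden_set)
```

Because the complaint becomes a permanent golden item, any later change that reintroduces the failure lowers the score and is caught at the gate.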
Stage-by-stage cheat sheet
| Stage | Entry condition | Exit deliverable |
|---|---|---|
| Define | Real problem, named user. | One-page brief. |
| Build | Brief approved. | Happy-path demo on real (non-PII) data. |
| Evaluate | Demo working. | Eval passing at threshold + security review. |
| Deploy | Eval and security pass. | Pilot with HITL and rollback in place. |
| Monitor | Pilot stable. | Live dashboard + alerts wired. |
| Iterate | Production traffic flowing. | Regression-gated release process. |