AI Tool Selection

The AI tooling market is loud, fast-moving, and full of demos that look better than the production reality. This page is the framework we use to cut through it.

The build-vs-buy decision tree

Before evaluating any vendor, decide whether you should be evaluating one at all.

Three honest options:

Buy. Mature SaaS, bounded scope, you customize the surface but not the engine. Fastest path. Best when the workflow is not your moat.
Assemble. Low-code (Zapier / Make) or a thin custom layer over commodity primitives (LLM APIs, MCP servers, vector DBs). The right answer most of the time.
Build. Full custom. Reserve for capabilities that are core to your differentiation or where no tool fits.

When in doubt: buy or assemble first; rebuild only if and when the seams hurt enough.
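As a rough illustration, the three options can be written down as a small decision function. This is a minimal sketch, not a formal procedure: the `Capability` fields and the order of the checks are our assumptions about how to read the options above, and you should adjust them to your own context.

```python
from dataclasses import dataclass


@dataclass
class Capability:
    """Hypothetical inputs; the field names are illustrative only."""
    is_core_differentiator: bool  # is this workflow part of your moat?
    mature_saas_exists: bool      # does a bounded-scope product already cover it?
    primitives_cover_it: bool     # can low-code tools or commodity primitives be composed?


def recommend(cap: Capability) -> str:
    """Return 'buy', 'assemble', or 'build' for one capability."""
    # Build is reserved for capabilities core to your differentiation.
    if cap.is_core_differentiator:
        return "build"
    # Not your moat and a mature product exists: fastest path is to buy.
    if cap.mature_saas_exists:
        return "buy"
    # Otherwise compose low-code tools or a thin layer over primitives.
    if cap.primitives_cover_it:
        return "assemble"
    # No tool fits at all: build is the remaining option.
    return "build"


if __name__ == "__main__":
    print(recommend(Capability(False, True, True)))
    print(recommend(Capability(False, False, True)))
    print(recommend(Capability(True, True, True)))
```

The function checks differentiation first because that is the one condition under which the options above say building is justified even when a tool exists.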

The vendor evaluation rubric

Score every candidate vendor across these eight dimensions. Use a simple 1–5 scale and weight the dimensions for your context.

Dimension	What you are checking	Disqualifiers
Fit	Does the product solve your problem, not a parallel one?	Demo only ever shows their canonical use case, not yours.
Data residency & use	Where does your data go? Is it used to train?	"We may use your data to improve our models" with no opt-out.
Security posture	SOC 2, ISO, encryption, SSO, audit logs, vulnerability disclosure.	No SOC 2 and no roadmap for one.
Model agnosticism	Can the product use a model you choose, or are you locked to theirs?	A single proprietary model with no swap path.
Exit cost	Can you get your data out? Your prompts? Your eval set?	Closed formats. No export.
Support & maturity	Funding stage, customer count, response times, named contacts.	Pre-revenue with no support SLA.
Integration surface	API, webhooks, MCP, native connectors. Documented?	"Coming soon" on every integration you need.
Pricing model	Per-seat? Per-token? Per-action? Predictable?	Pricing only revealed at the end of a sales cycle.

For an evaluation to be honest, at least one team member must run a real workflow on real data in a sandbox before signing. Vendor-led demos do not count.
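One way to turn the rubric into a number is a weighted average over the eight dimensions, with any disqualifier treated as a hard fail. The sketch below assumes exactly that; the dimension keys, the default weight of 1.0, and the choice to return `None` for a disqualified vendor are illustrative assumptions, not a prescribed method.

```python
# Keys mirror the eight rubric dimensions; the spellings are our own.
DIMENSIONS = [
    "fit",
    "data_residency_and_use",
    "security_posture",
    "model_agnosticism",
    "exit_cost",
    "support_and_maturity",
    "integration_surface",
    "pricing_model",
]


def score_vendor(scores, weights=None, disqualifiers=()):
    """Return the weighted 1-5 score, or None if any disqualifier was observed.

    scores: dict mapping each dimension to an integer or float from 1 to 5.
    weights: optional dict of per-dimension weights; missing entries count as 1.0.
    disqualifiers: free-text notes on disqualifying findings, if any.
    """
    if disqualifiers:
        return None  # assumption: a disqualifier overrides any score
    weights = weights or {}
    for dim in DIMENSIONS:
        if dim not in scores:
            raise ValueError(f"missing score for {dim}")
        if not 1 <= scores[dim] <= 5:
            raise ValueError(f"score for {dim} must be between 1 and 5")
    total_weight = sum(weights.get(dim, 1.0) for dim in DIMENSIONS)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted = sum(scores[dim] * weights.get(dim, 1.0) for dim in DIMENSIONS)
    return weighted / total_weight


if __name__ == "__main__":
    scores = {dim: 4 for dim in DIMENSIONS}
    scores["fit"] = 2
    # Weight fit three times as heavily as the other dimensions.
    print(score_vendor(scores, {"fit": 3.0}))
    print(score_vendor(scores, {"fit": 3.0}, disqualifiers=["no data export"]))
```

The score is only for comparing candidates against each other under the same weights. It does not replace the sandbox run on real data.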

The build-or-buy trap to avoid

"We'll just build it ourselves; the API is right there."

Building looks cheap because the visible cost is the API call. The hidden costs:

Auth, rate limiting, retries, idempotency.
Logging, monitoring, alerting.
HITL approval flow.
Eval suite and regression CI.
Security review, threat model, incident response.
The maintenance tail — every model upgrade, every API change, every new browser quirk.

If a tool covers 80% of what you need and the missing 20% is something you can do without, buy and live without the 20%. If a tool covers 80% and the missing 20% is exactly your differentiator, build the 20% and integrate.
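To make the first hidden cost in the list above concrete, here is roughly what "just calling the API" grows into once retries and idempotency are handled. This is a minimal sketch under stated assumptions: `send` is a hypothetical callable standing in for your API client, `TransientError` is a made-up exception for retryable failures, and the idempotency key only helps if the service you call actually honours one.

```python
import time
import uuid


class TransientError(Exception):
    """Hypothetical marker for retryable failures such as rate limits or timeouts."""


def call_with_retries(send, payload, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call send(payload, idempotency_key=...) with exponential backoff.

    The same key is reused on every attempt so a service that supports
    idempotency keys can deduplicate a request that succeeded but timed out.
    """
    key = str(uuid.uuid4())
    for attempt in range(1, max_attempts + 1):
        try:
            return send(payload, idempotency_key=key)
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the caller
            # Delays of base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))


if __name__ == "__main__":
    calls = []

    def flaky_send(payload, idempotency_key):
        # Fake client that fails twice, then succeeds.
        calls.append(idempotency_key)
        if len(calls) < 3:
            raise TransientError("simulated rate limit")
        return {"ok": True, "echo": payload}

    print(call_with_retries(flaky_send, {"prompt": "hello"}, sleep=lambda s: None))
```

This covers one bullet of six, and still leaves out jitter, rate-limit headers, and auth refresh. That is the point of the list.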

RFP question bank

The questions we ask before signing. Adapt to your situation.

Product fit

Walk us through a customer who looked like us. What did they buy and what did they extend?
What does your product not do that customers frequently ask for?
What is the smallest meaningful deployment? Who succeeds at that scale?
Who is the worst-fit customer for you and why?

Data and security

Where is data stored geographically? Can we pin a region?
Is our data used to train your models or any third-party model? Is there an opt-out?
Provide your most recent SOC 2 / ISO report.
How are credentials and secrets handled across your platform?
What is your incident response and breach-notification process?
Do you support SSO (which protocols), SCIM, and audit log export?

Models and lock-in

Which models can your product use? Can we bring our own?
If you change the underlying model, how are we notified and what is our right to test?
Do you support self-hosted or VPC deployment for regulated workloads?

Operations

What is your status page URL, what incidents have you had in the last 90 days, and what is your published SLA?
Who is our named technical contact during onboarding? After?
What is the typical response time for a P1 issue?
Show us your roadmap for the next two quarters.

Pricing and exit

What is the all-in monthly cost for our projected usage? Include overages.
What does pricing look like at 5× our current scale?
How do we export our data, prompts, and configuration if we leave?
What is the off-boarding process and timeline?
What is the minimum contract length and the renewal mechanic?
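When the vendor answers the cost questions, it helps to rerun their numbers yourself at current and 5× scale. The sketch below assumes a made-up pricing shape (flat platform fee, per-seat price, an included action allowance, and a per-action overage). Every price in it is hypothetical, so substitute the vendor's actual terms.

```python
def monthly_cost(seats, actions, *, per_seat, included_actions,
                 overage_per_action, platform_fee=0.0):
    """All-in monthly cost under a hypothetical seat-plus-overage pricing model."""
    # Only actions beyond the included allowance are billed as overage.
    overage_actions = max(0, actions - included_actions)
    return platform_fee + seats * per_seat + overage_actions * overage_per_action


if __name__ == "__main__":
    # Illustrative terms only; none of these figures come from a real vendor.
    terms = dict(per_seat=30.0, included_actions=50_000,
                 overage_per_action=0.01, platform_fee=500.0)
    current = monthly_cost(seats=10, actions=20_000, **terms)
    scaled = monthly_cost(seats=50, actions=100_000, **terms)
    print(f"current: {current:.2f}  at 5x: {scaled:.2f}  ratio: {scaled / current:.2f}")
```

Under these invented terms, 5× the usage does not mean 5× the bill, because the flat fee and the included allowance do not scale with usage. With other terms the ratio can just as easily exceed 5×, which is why the question is worth asking.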

Evaluation

Provide a sandbox or trial against our real data for at least two weeks.
Provide reference customers we can call directly.
What evaluation evidence do you have that your product works? (Your own eval suite, win rates against benchmarks, anything beyond marketing.)

Red flags

Pricing only revealed at the end of a long sales cycle.
Demos that won't run on your data.
Vague answers on data use ("we take privacy seriously").
A roadmap that perfectly matches your asks. (They are telling you what you want to hear.)
No SOC 2, no compensating controls, and no plan.
"Our model is proprietary." (Translation: opaque, untestable, lock-in.)
A reference list you cannot actually call.

Green flags

Specific, measured answers — including limits.
Willingness to lose the deal if the fit is bad.
Public changelog and incident history.
A free or low-cost tier you can prove value on first.
Documented data export and exit process.
Engineers, not only sales, in the room when you ask hard questions.
