Decision rule
Which family fits?
The decision rule, in one page. Five questions to answer before your team starts building. Bookmark this page.
The five questions
- Is the input structured (rows + columns) or unstructured (free text, images, audio)?
- Do I need a number / class as output, or a sentence?
- What is my latency budget? Sub-100ms or seconds?
- Do I need to explain the answer to a regulator?
- How much labeled training data do I have? Hundreds, thousands, or millions?
Answer those five and the family usually picks itself.
The map
| Shape of problem | Family | Why |
|---|---|---|
| Structured + number/class out + sub-100ms + auditable | Family 1: Classical ML | Tabular data’s home. Gradient boosting wins on accuracy, latency, cost, interpretability. |
| Image / video / audio / time-series + real-time | Family 2: Specialist DL | Pre-LLM specialists are faster and cheaper on their modality. |
| Free-form text in or out + latency in seconds OK | Family 3: Foundation Models | Open shape, generative output, handles novel queries. |
| Sequential decisions + clear reward + cheap to try | Family 4: RL | The only family that can act, not just predict. |
| None of the above. Rules and SQL would do | Family 0: Don’t use AI | Half of “AI projects” should be a SQL query and a dashboard. |
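
The map above can be read as a small lookup. The sketch below is one hypothetical encoding of it in Python. The `Problem` fields, the `pick_family` name, the 1,000 ms cut-off for "real-time", and the order of the checks are all assumptions made for illustration; the map itself gives no numbers for Families 2 and 3. The training-data question is left out because the map has no row for it.

```python
from dataclasses import dataclass

# Assumed cut-off: the map says "real-time" and "seconds OK" without a number.
REALTIME_MS = 1000


@dataclass
class Problem:
    modality: str                 # "tabular", "text", "image", "video", "audio", "time-series"
    output: str                   # "number", "class", or "text"
    latency_budget_ms: int        # worst acceptable response time
    sequential_with_reward: bool = False  # sequential decisions, clear reward, cheap to try
    rules_suffice: bool = False           # rules and SQL would do the job


def pick_family(p: Problem) -> str:
    """Return the family suggested by the map (illustrative, not exhaustive)."""
    if p.sequential_with_reward:
        return "Family 4: RL"
    if p.modality == "tabular" and p.output in ("number", "class"):
        # A sub-100ms budget or an audit requirement only strengthens this choice.
        return "Family 1: Classical ML"
    if (p.modality in ("image", "video", "audio", "time-series")
            and p.latency_budget_ms < REALTIME_MS):
        return "Family 2: Specialist DL"
    if "text" in (p.modality, p.output) and p.latency_budget_ms >= REALTIME_MS:
        return "Family 3: Foundation Models"
    if p.rules_suffice:
        return "Family 0: Don't use AI"
    return "No clean fit: revisit the five questions"


print(pick_family(Problem("tabular", "class", 50)))  # Family 1: Classical ML
```

A real triage will be messier than five `if` statements, but writing the rule down this way makes it easy to see which answer tipped the decision.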
Three audit questions for any AI proposal
When your team brings you an AI roadmap item, ask these three before you greenlight:
- Which family is this? If they don’t use family vocabulary, that’s a signal they haven’t thought about it.
- Show me your evals. No evals, no project. Just a demo.
- Show me one production trace. If they can’t, they don’t have observability. They can’t debug. They can’t ship.
Three audit questions for any AI vendor
When a vendor pitches you AI, ask these three before you sign:
- Which family powers your product? Many wrap an LLM and call it AI. That’s fine if Family 3 fits the problem shape, expensive if it doesn’t.
- What’s your latency at the 95th percentile? Demos show p50. P95 is what your customers feel.
- What happens when the model is wrong? A vendor without a hallucination-aware UX has a brittle product.
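
To see why the p50 in a demo can hide what customers feel, here is a minimal sketch using Python's standard `statistics` module. The latency samples are invented for illustration (18 fast requests and 2 slow ones), and `latency_summary` is a hypothetical helper name, not something a vendor will hand you.

```python
import statistics


def latency_summary(samples_ms):
    """Return the 50th and 95th percentile of a list of latencies in ms."""
    # quantiles(n=100) returns the 99 cut points p1..p99.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94]}


# Hypothetical samples: most requests are fast, a couple are very slow.
samples_ms = [120] * 18 + [900, 2400]

summary = latency_summary(samples_ms)
print(summary)
```

With these made-up numbers the median stays at 120 ms while the p95 lands close to one second, which is the gap the second vendor question is meant to expose.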
Common misclassifications
- Audit your current AI roadmap by family. Find the misfits.
- Ask the three vendor questions of the next AI vendor that pitches you.
- Subscribe to one technical newsletter you will actually read. latent.space, simonwillison.net, and deeplearning.ai/the-batch are the strongest in 2026.
Frequently asked questions
Should I use an LLM to predict customer churn?
No. Customer churn data is tabular (rows of customers with columns of features), and Family 1 classical ML wins on every dimension that matters: accuracy, latency, cost, interpretability. Gradient boosting (XGBoost, LightGBM) is the default choice for tabular data and is typically orders of magnitude cheaper than calling an LLM per prediction.
When should I use RAG instead of fine-tuning an LLM?
Use RAG when the model lacks knowledge about your specific data (documents, help center, product catalog). Use fine-tuning when the model has the knowledge but does not behave the way you need (tone, format, refusal patterns). Test prompting first, RAG second, and fine-tune only when both fail. Most teams reach for fine-tuning when they actually need RAG.
Which AI family fits real-time computer vision on a production line?
Family 2, specialist deep learning. Models like YOLO can run at 30 frames per second on a line camera, fully on-device, with sub-100ms latency. Foundation model vision APIs (Claude Vision, GPT-4o) are too slow and too expensive per inference for real-time industrial use cases.
Is reinforcement learning ever the right choice for a startup?
Rarely. Family 4 reinforcement learning fits sequential decisions with a clear reward signal and cheap-to-simulate environments: trading, robotics, recommendation ranking, post-training of LLMs. If your problem is one-shot prediction, classification, or generation, you do not need RL. The infrastructure cost is high and the talent is scarce.
How do I audit an AI vendor's product?
Ask three questions. First, which family powers the product? Many vendors wrap an LLM and call it AI, which is fine if Family 3 fits the problem shape and expensive if it does not. Second, what is the p95 latency under load? Demos show p50; customers feel p95. Third, what happens when the model is wrong? A vendor without a hallucination-aware UX has a brittle product.
What is the production triad every AI team needs?
Eval, trace, loop. Evals define what good looks like before you ship. Traces let you reconstruct what happened when a user reports a bad output. The loop is build, eval, fix, ship, repeat with a tighter eval each time. A team that cannot show you all three is building a demo, not a product.
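
As a rough illustration of the triad, the sketch below runs a tiny eval set against a stand-in model and records one trace per call. Everything in it is hypothetical: `toy_model` replaces a real model call, the two eval cases are invented, the substring check is the simplest possible grader, and the 0.9 ship threshold is an arbitrary assumption.

```python
import time
import uuid
from typing import Callable

SHIP_THRESHOLD = 0.9  # assumed bar; pick your own before you build


def toy_model(prompt: str) -> str:
    """Stand-in for a real model call."""
    if "refund" in prompt.lower():
        return "Refunds are accepted within 30 days."
    return "I don't know."


def run_evals(model: Callable[[str], str],
              cases: list[tuple[str, str]],
              traces: list[dict]) -> float:
    """Run every eval case, append one trace per call, return the pass rate."""
    passed = 0
    for prompt, expected in cases:
        start = time.perf_counter()
        output = model(prompt)
        ok = expected.lower() in output.lower()  # naive substring grader
        traces.append({
            "trace_id": str(uuid.uuid4()),
            "input": prompt,
            "output": output,
            "passed": ok,
            "latency_ms": (time.perf_counter() - start) * 1000,
        })
        passed += ok
    return passed / len(cases)


# Eval: define what good looks like before shipping.
cases = [
    ("What is the refund window?", "30 days"),
    ("Do you ship to Canada?", "yes"),
]
traces: list[dict] = []  # Trace: enough detail to reconstruct each call later.

pass_rate = run_evals(toy_model, cases, traces)
ready = pass_rate >= SHIP_THRESHOLD  # Loop: fix, rerun, tighten the cases.
print(pass_rate, ready)
```

In this toy run one of the two cases fails, so the gate stays closed, and the failing trace shows exactly which input produced the bad output. A production setup would persist traces and use a stronger grader, but the shape of the loop is the same.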
