Decision rule

Which family fits?

The decision rule, in one page. Five questions to answer before your team starts building. Bookmark this page.

The five questions

  1. Is the input structured (rows + columns) or unstructured (free text, images, audio)?
  2. Do I need a number / class as output, or a sentence?
  3. What is my latency budget? Sub-100ms or seconds?
  4. Do I need to explain the answer to a regulator?
  5. How much labeled training data do I have? Hundreds, thousands, or millions?

Answer those five and the family usually picks itself.

The map

| Shape of problem | Family | Why |
|---|---|---|
| Structured + number/class out + sub-100ms + auditable | Family 1: Classical ML | Tabular data’s home. Gradient boosting wins on accuracy, latency, cost, interpretability. |
| Image / video / audio / time-series + real-time | Family 2: Specialist DL | Pre-LLM specialists are faster and cheaper on their modality. |
| Free-form text in or out + latency in seconds OK | Family 3: Foundation Models | Open shape, generative output, handles novel queries. |
| Sequential decisions + clear reward + cheap to try | Family 4: RL | The only family that can act, not just predict. |
| None of the above. Rules and SQL would do | Family 0: Don’t use AI | Half of “AI projects” should be a SQL query and a dashboard. |
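
For teams that prefer to see the map as logic, here is a minimal Python sketch of it. The `Problem` fields, the string labels, and the 1000 ms cutoff for "latency in seconds OK" are assumptions made for illustration; the audit and data-volume questions are left out to keep it short. Treat it as a conversation starter, not a substitute for judgment.

```python
from dataclasses import dataclass


@dataclass
class Problem:
    """Answers to the decision questions (field names are hypothetical)."""
    input_kind: str                        # "tabular", "image", "video", "audio", "time_series", "text"
    output_kind: str                       # "number", "class", "sentence", "action"
    latency_budget_ms: int                 # how long a caller can wait for one answer
    rules_suffice: bool = False            # would fixed rules or a SQL query do?
    sequential_with_reward: bool = False   # sequential decisions, clear reward, cheap to try?


NO_FIT = "No clean fit: revisit the five questions"


def pick_family(p: Problem) -> str:
    """Walk the map from the cheapest option to the most open-ended one."""
    if p.rules_suffice:
        return "Family 0: Don't use AI"
    if p.sequential_with_reward:
        return "Family 4: RL"
    if p.input_kind == "tabular" and p.output_kind in ("number", "class"):
        return "Family 1: Classical ML"
    if p.input_kind in ("image", "video", "audio", "time_series"):
        return "Family 2: Specialist DL"
    if p.input_kind == "text" or p.output_kind == "sentence":
        # Assumed threshold: Family 3 only when the caller can wait about a second or more.
        if p.latency_budget_ms >= 1000:
            return "Family 3: Foundation Models"
        return NO_FIT
    return NO_FIT


churn = Problem(input_kind="tabular", output_kind="class", latency_budget_ms=100)
defects = Problem(input_kind="image", output_kind="class", latency_budget_ms=50)
support = Problem(input_kind="text", output_kind="sentence", latency_budget_ms=3000)

for name, problem in [("churn", churn), ("defects", defects), ("support", support)]:
    print(name, "->", pick_family(problem))
```

The order of the checks is the design choice: rules and SQL are tested first so the cheapest answer wins whenever it is good enough.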

Three audit questions for any AI proposal

When your team brings you an AI roadmap item, ask these three before you greenlight:

  1. Which family is this? If they don’t use family vocabulary, that’s a signal they haven’t thought about it.
  2. Show me your evals. No evals, no project. Just a demo.
  3. Show me one production trace. If they can’t, they don’t have observability. They can’t debug. They can’t ship.

Three audit questions for any AI vendor

When a vendor pitches you AI, ask these three before you sign:

  1. Which family powers your product? Many wrap an LLM and call it AI. That’s fine if Family 3 fits the problem shape, expensive if it doesn’t.
  2. What’s your latency at the 95th percentile? Demos show p50. P95 is what your customers feel.
  3. What happens when the model is wrong? A vendor without a hallucination-aware UX has a brittle product.
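
To see why the p50 in a demo can hide what customers feel, here is a small sketch that computes both percentiles from a list of request latencies. The sample values are made up for illustration, and the nearest-rank method is one common percentile definition among several; a vendor's monitoring tool may use a different one.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical latencies in milliseconds: nine fast requests and one slow one.
latencies_ms = [120, 130, 125, 140, 135, 128, 132, 127, 138, 2400]

p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
print(f"p50 = {p50} ms, p95 = {p95} ms")
```

In this made-up sample the median looks healthy while the tail is more than an order of magnitude slower, which is the gap the second vendor question is meant to expose.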

Common misclassifications

Predict customer churn from customer data using GPT-5.
  Tabular data → Family 1. XGBoost is orders of magnitude cheaper and usually more accurate.

Detect manufacturing defects on the assembly line using Claude Vision.
  Real-time vision → Family 2. YOLO runs at 30 FPS on the line camera, fully on-device.

Forecast quarterly revenue using GPT-5 with prompt engineering.
  Time series → Family 2. Prophet baseline plus TFT or Chronos for lift.

Build a customer-support agent using a fine-tuned LLM.
  Probably Family 3 with RAG over your help center, not fine-tuning. Test prompting first, RAG second, fine-tune only when both fail.

Monday morning

  1. Audit your current AI roadmap by family. Find the misfits.
  2. Ask the three vendor questions of the next AI vendor that pitches you.
  3. Subscribe to one technical newsletter you will actually read. latent.space, simonwillison.net, and deeplearning.ai/the-batch are the strongest in 2026.

Frequently asked questions

Should I use an LLM to predict customer churn?

No. Customer churn data is tabular (rows of customers with columns of features), and Family 1 classical ML wins on every dimension that matters: accuracy, latency, cost, interpretability. Gradient boosting (XGBoost, LightGBM) is the usual default on tabular data and is orders of magnitude cheaper than calling an LLM per prediction.

When should I use RAG instead of fine-tuning an LLM?

Use RAG when the model lacks knowledge about your specific data (documents, help center, product catalog). Use fine-tuning when the model has the knowledge but does not behave the way you need (tone, format, refusal patterns). Test prompting first, RAG second, and fine-tune only when both fail. Most teams reach for fine-tuning when they actually need RAG.
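
To make the RAG step concrete, here is a minimal sketch of "retrieve, then prompt". The `HELP_CENTER` articles, the function names, and the prompt wording are all hypothetical, and plain word overlap stands in for the embedding search a real system would more likely use. No model is called; the sketch stops at the prompt you would send.

```python
import re

# Hypothetical help-center articles, keyed by an id.
HELP_CENTER = {
    "reset-password": "To reset your password, open Settings and choose Reset password.",
    "billing-cycle": "Invoices are issued on the first day of each billing cycle.",
    "export-data": "You can export your data as CSV from the Reports page.",
}


def tokens(text):
    """Lowercase word set; a crude stand-in for an embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, docs, k=1):
    """Return the ids of the k articles sharing the most words with the question."""
    q = tokens(question)
    ranked = sorted(docs.items(), key=lambda item: len(q & tokens(item[1])), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]


def build_prompt(question, docs, k=1):
    """Put the retrieved text in front of the question so the model answers from it."""
    context = "\n".join(docs[doc_id] for doc_id in retrieve(question, docs, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


print(build_prompt("How do I reset my password?", HELP_CENTER))
```

The point of the sketch is that the knowledge lives in the documents rather than in the model's weights, so updating the help center updates the answers without any retraining.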

Which AI family fits real-time computer vision on a production line?

Family 2, specialist deep learning. Models like YOLO run at 30 frames per second on a line camera, fully on-device, with sub-100ms latency. Foundation model vision APIs (Claude Vision, GPT-4o) are too slow and too expensive per inference for real-time industrial use cases.
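
The arithmetic behind "real-time" is worth a quick sketch: a camera at 30 frames per second leaves roughly 33 ms per frame. The code below assumes frames are processed one at a time, with no pipelining or batching, and the two inference times used in the checks are hypothetical, not measurements of any product.

```python
def frame_budget_ms(fps):
    """Time available per frame, in milliseconds, if every frame must be processed."""
    return 1000.0 / fps


def keeps_up(inference_ms, fps):
    """True when one inference fits inside the per-frame budget."""
    return inference_ms <= frame_budget_ms(fps)


print(f"Budget at 30 FPS: {frame_budget_ms(30):.1f} ms per frame")
print("On-device model at an assumed 25 ms:", keeps_up(25, 30))
print("Remote API at an assumed 800 ms round trip:", keeps_up(800, 30))
```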

Is reinforcement learning ever the right choice for a startup?

Rarely. Family 4 reinforcement learning fits sequential decisions with a clear reward signal and cheap-to-simulate environments: trading, robotics, recommendation ranking, post-training of LLMs. If your problem is one-shot prediction, classification, or generation, you do not need RL. The infrastructure cost is high and the talent is scarce.
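
For a feel of what "clear reward, cheap to try" means in practice, here is a sketch of an epsilon-greedy multi-armed bandit, arguably the simplest relative of reinforcement learning. It is not a full sequential-decision setup, and the success rates, step count, and function name are assumptions chosen for illustration.

```python
import random


def run_bandit(true_rates, steps=5000, epsilon=0.1, seed=0):
    """Learn which option pays best by trying options and observing 0/1 rewards."""
    rng = random.Random(seed)
    pulls = [0] * len(true_rates)
    values = [0.0] * len(true_rates)   # running average reward per option
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(len(true_rates))   # explore: pick any option
        else:
            arm = max(range(len(values)), key=values.__getitem__)   # exploit the current best
        reward = 1.0 if rng.random() < true_rates[arm] else 0.0
        pulls[arm] += 1
        values[arm] += (reward - values[arm]) / pulls[arm]   # incremental mean update
    return values, pulls


# Hypothetical success rates for three options; the agent never sees these directly.
values, pulls = run_bandit([0.2, 0.5, 0.8])
best = max(range(len(values)), key=values.__getitem__)
print("estimated values:", [round(v, 2) for v in values])
print("pulls per option:", pulls, "best option:", best)
```

Every step here costs nothing because the environment is simulated. When each trial costs real money or real customers, the same loop becomes expensive quickly, which is part of why this family is rarely the right first choice.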

How do I audit an AI vendor's product?

Ask three questions. First, which family powers the product? Many vendors wrap an LLM and call it AI, which is fine if Family 3 fits the problem shape and expensive if it does not. Second, what is the p95 latency under load? Demos show p50; customers feel p95. Third, what happens when the model is wrong? A vendor without a hallucination-aware UX has a brittle product.

What is the production triad every AI team needs?

Eval, trace, loop. Evals define what good looks like before you ship. Traces let you reconstruct what happened when a user reports a bad output. The loop is build, eval, fix, ship, repeat with a tighter eval each time. A team that cannot show you all three is building a demo, not a product.
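
As a rough illustration of the first two parts of the triad, here is a sketch of an eval set run through a traced model call. `toy_model`, the eval cases, and the trace fields are hypothetical stand-ins; a real team would swap in its own model call and send traces to its observability stack.

```python
import time
import uuid


def toy_model(question):
    """Stand-in for a real model call."""
    answers = {"What is 2 + 2?": "4", "Capital of France?": "Paris"}
    return answers.get(question, "I don't know")


def traced_call(model, question, trace_log):
    """Call the model and record enough to reconstruct the call later."""
    start = time.perf_counter()
    answer = model(question)
    trace_log.append({
        "trace_id": str(uuid.uuid4()),
        "input": question,
        "output": answer,
        "latency_ms": (time.perf_counter() - start) * 1000,
    })
    return answer


def run_evals(model, cases, trace_log):
    """Return the fraction of eval cases where the output matches the expected answer."""
    passed = sum(traced_call(model, q, trace_log) == expected for q, expected in cases)
    return passed / len(cases)


# The eval set defines what good looks like before anything ships.
EVAL_CASES = [
    ("What is 2 + 2?", "4"),
    ("Capital of France?", "Paris"),
    ("Capital of Peru?", "Lima"),
]

traces = []
score = run_evals(toy_model, EVAL_CASES, traces)
print(f"pass rate: {score:.2f}")
for t in traces:
    print(t["trace_id"][:8], repr(t["input"]), "->", repr(t["output"]))
```

The loop is then a matter of process: fix the failing case, add a harder case to `EVAL_CASES`, and run again before shipping.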

Mario Deubler

If this matches what your team is hitting

Series A founders and Heads of Product working through these symptoms (teams shipping fast, numbers flat), talk to me. I run as Fractional Head of Product, embedded with your team. Lead and build, not PowerPoint.