Family 3 of 4

Foundation models.

The branch everyone talks about. LLMs, VLMs, diffusion models. Built once at massive scale, adapted to your domain afterwards. The most overused family in 2026.

Thesis

Foundation models are general-purpose pretrained models that exhibit emergent capabilities. Trained only to predict the next token, they learned grammar, world knowledge, reasoning patterns, and code. You don’t train them. You adapt them. On the right problem, irreplaceable.

The mental model

The anchor sentenceAn LLM is a next-token predictor. That is the entire model. Chat is an illusion built on top.

Agents, RAG, reasoning models, MCP. They are all systems built around this one mechanism. Hallucinations are not bugs. They are the cost of how the model works. The model is not retrieving facts. It is generating plausible continuations.

The adaptation triangle

Three ways to make a foundation model useful for your domain:

Method	Solves	Cost shape
Prompting	Communication problem	Cheapest, fastest, weakest
RAG	Knowledge problem (model doesn’t know your data)	Medium; ongoing retrieval cost
Fine-tuning	Behavior problem (model doesn’t act as you need)	High one-time, lower marginal
None of the above	You don’t need an LLM (Family 1, 2, or 4 fits)	Best by far

Most teams reach for fine-tuning when they need RAG. Test prompting first. RAG second. Fine-tune only when both fail.

Agents, in one sentence

The anchor sentenceAn agent is a loop. LLM call → tool call → result → repeat, until it decides it’s done.

Anthropic’s December 2024 Building Effective Agents post lists five workflow patterns that recur in production: chaining, routing, parallelization, orchestrator-workers, and evaluator- optimizer. New framework names appear quarterly. The patterns underneath are stable.

Reasoning models

Reasoning models (o-series, DeepSeek-R1, Claude extended thinking, Gemini Thinking) burn tokens internally to reach better answers. Often an order of magnitude more expensive per call than a non-reasoning equivalent. Use them when a smart human would pause and think for several minutes. Skip them when a smart human would answer instantly.

Diffusion (the image and video sub-family)

Generative for images, video, audio, 3D. Frontier 2025-2026: FLUX 2, SD 3.5, Sora, Veo, Kling. ControlNet and IP-Adapter for conditioning. C2PA / SynthID for provenance. Different math family from LLMs (denoising rather than next-token), but same “train once at scale, adapt afterwards” pattern.

MCP

Model Context Protocol. The USB-C of AI tools: the protocol that lets agents touch your CRM, codebase, and database. Released by Anthropic in November 2024, donated to the Linux Foundation in December 2025. By April 2026: ~97 million monthly SDK downloads, ~9,400 public servers, 78% of enterprise teams using it in production. New attack surface. Your CISO needs the OWASP LLM Top 10 by next quarter.

The decision rule

If your problem has...	Family 3?
Unstructured text or messy free-form input	Yes
Output should be human-readable text	Yes
Need to handle questions you didn’t predict	Yes
Latency budget allows seconds	Yes
Image/video generation	Yes (diffusion)
Tabular structured-prediction problem	No (Family 1)
Real-time / sub-100ms latency	No (Family 2)

When NOT to use it

Tabular prediction. Family 1 wins on every dimension.
Real-time inference. Even the fastest LLM call is 200ms+.
Strictly auditable individual decisions. EU AI Act high-risk obligations and GDPR Article 22 push regulated systems toward explainable models; mech-interp on LLMs is still a research project.
High-volume, low-margin tasks. €0.001 per call × 10M calls per day = €3.65M/year for what could have been a Family 1 model.

Named exemplars

Consumer chat. ChatGPT, Claude, Gemini.
Code copilots. GitHub Copilot, Cursor.
Deep research products. Perplexity, Anthropic Research, OpenAI Deep Research.
Image generation. Midjourney, FLUX, Stable Diffusion 3.5.
Voice agents. Vapi, Retell. Family 2 + Family 3 pipeline.

The common trapFamily 3 is the most expensive family per inference. It is also the most marketed. Audit any “let’s use AI” project for whether the input is actually unstructured text. If it’s tabular, you’re burning money on the wrong family.