Family 3 of 4

Foundation models.

The branch everyone talks about. LLMs, VLMs, diffusion models. Built once at massive scale, adapted to your domain afterwards. The most overused family in 2026.

Thesis

Foundation models are general-purpose pretrained models that exhibit emergent capabilities. Trained only to predict the next token, they learned grammar, world knowledge, reasoning patterns, and code. You don’t train them. You adapt them. On the right problem, irreplaceable.

The mental model

The anchor sentence: An LLM is a next-token predictor. That is the entire model. Chat is an illusion built on top.

Agents, RAG, reasoning models, MCP. They are all systems built around this one mechanism. Hallucinations are not bugs. They are the cost of how the model works. The model is not retrieving facts. It is generating plausible continuations.
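To make "plausible continuations" concrete, here is a toy sketch. It is a word-level bigram counter, nowhere near a real LLM, and the corpus and function names are invented for illustration. The mechanism is the same in spirit: pick a likely next token, append it, repeat.

```python
from collections import Counter, defaultdict

def train_bigram(corpus: str) -> dict[str, Counter]:
    """Count which token follows which in the training text."""
    tokens = corpus.split()
    following: dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        following[current][nxt] += 1
    return following

def generate(model: dict[str, Counter], prompt: str, max_new_tokens: int = 5) -> str:
    """Greedy decoding: repeatedly append the most frequent next token."""
    tokens = prompt.split()
    for _ in range(max_new_tokens):
        candidates = model.get(tokens[-1])
        if not candidates:
            break  # nothing ever followed this token in training
        tokens.append(candidates.most_common(1)[0][0])
    return " ".join(tokens)

# Invented toy corpus (assumption, for illustration only).
corpus = (
    "the capital of france is paris . "
    "the capital of italy is rome . "
    "the capital of spain is madrid ."
)
model = train_bigram(corpus)

# "atlantis" never appears in the corpus, yet the model still answers.
completion = generate(model, "the capital of atlantis is", max_new_tokens=1)
print(completion)
```

The toy model has never seen Atlantis, but it still completes the sentence with a capital it has seen. Nothing was looked up. That is a hallucination in miniature.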

The adaptation triangle

Three ways to make a foundation model useful for your domain:

Method | Solves | Cost shape
Prompting | Communication problem | Cheapest, fastest, weakest
RAG | Knowledge problem (model doesn’t know your data) | Medium; ongoing retrieval cost
Fine-tuning | Behavior problem (model doesn’t act as you need) | High one-time, lower marginal
None of the above | You don’t need an LLM (Family 1, 2, or 4 fits) | Best by far

Most teams reach for fine-tuning when they need RAG. Test prompting first. RAG second. Fine-tune only when both fail.
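The escalation order can be written down as a few lines of code. This is a sketch, not a framework: `passes_eval` is a hypothetical callback standing in for whatever evaluation your team runs against each method.

```python
from typing import Callable

def choose_adaptation(passes_eval: Callable[[str], bool]) -> str:
    """Walk the ladder cheapest first; fine-tune only when both cheaper rungs fail."""
    for method in ("prompting", "rag"):
        if passes_eval(method):
            return method
    return "fine-tuning"

# Hypothetical outcome: prompting alone fails the eval, RAG passes it.
example = choose_adaptation(lambda method: method == "rag")
print(example)
```

The point of the ordering is cost: each rung is only paid for once the cheaper one has demonstrably failed.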

Agents, in one sentence

The anchor sentence: An agent is a loop. LLM call → tool call → result → repeat, until it decides it’s done.
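The loop fits in a dozen lines. The sketch below assumes a made-up action format (`type`, `tool`, `args`, `content`) and uses a scripted `fake_llm` in place of a real model call, so it runs without any API. Real SDKs differ in the details.

```python
from typing import Any, Callable

Action = dict[str, Any]
History = list[tuple[str, str]]

def run_agent(llm: Callable[[History], Action],
              tools: dict[str, Callable[..., Any]],
              task: str,
              max_steps: int = 5) -> str:
    """LLM call -> tool call -> result -> repeat, until the model says it is done."""
    history: History = [("user", task)]
    for _ in range(max_steps):
        action = llm(history)                      # LLM call
        if action["type"] == "final":              # the model decides it is done
            return action["content"]
        result = tools[action["tool"]](**action["args"])   # tool call
        history.append(("tool", f"{action['tool']} -> {result}"))  # result
    raise RuntimeError("agent did not finish within max_steps")

def fake_llm(history: History) -> Action:
    """Scripted stand-in for a model: call the tool once, then answer."""
    role, content = history[-1]
    if role == "user":
        return {"type": "tool", "tool": "add", "args": {"a": 2, "b": 3}}
    return {"type": "final", "content": f"The answer is {content.split(' -> ')[1]}"}

answer = run_agent(fake_llm, {"add": lambda a, b: a + b}, "What is 2 + 3?")
print(answer)
```

The `max_steps` cap is a design choice worth keeping in any real version: a loop that ends only when the model "decides" needs a hard stop.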

Anthropic’s December 2024 post “Building Effective Agents” lists five workflow patterns that recur in production: prompt chaining, routing, parallelization, orchestrator-workers, and evaluator-optimizer. New framework names appear quarterly. The patterns underneath are stable.
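Two of the five patterns are small enough to show as plain functions. This is one reading of the pattern names, not code from the post, and the steps are ordinary string functions standing in for LLM calls.

```python
from typing import Callable

Step = Callable[[str], str]

def chain(steps: list[Step], text: str) -> str:
    """Chaining: the output of each step is the input of the next."""
    for step in steps:
        text = step(text)
    return text

def route(classify: Callable[[str], str], handlers: dict[str, Step], text: str) -> str:
    """Routing: classify the input, then hand it to one specialised handler."""
    return handlers[classify(text)](text)

# Stand-in steps (assumptions); in practice each would be a model call.
chained = chain([str.strip, str.upper], "  refund request ")

routed = route(
    lambda t: "question" if t.endswith("?") else "statement",
    {"question": lambda t: "FAQ: " + t, "statement": lambda t: "LOG: " + t},
    "Where is my order?",
)
print(chained, "|", routed)
```

Swap the stand-ins for model calls and the control flow stays the same, which is why the patterns outlive the frameworks.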

Reasoning models

Reasoning models (o-series, DeepSeek-R1, Claude extended thinking, Gemini Thinking) burn tokens internally to reach better answers. Often an order of magnitude more expensive per call than a non-reasoning equivalent. Use them when a smart human would pause and think for several minutes. Skip them when a smart human would answer instantly.
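A back-of-envelope calculation shows where the cost gap comes from. Everything numeric here is hypothetical: the prices, the token counts, and the assumption that hidden reasoning tokens are billed at the output rate. Check your provider's pricing before relying on it.

```python
def call_cost(input_tokens: int, output_tokens: int, reasoning_tokens: int,
              price_in_per_m: float, price_out_per_m: float) -> float:
    """Cost of one call in dollars, given prices per million tokens.

    Assumption: reasoning tokens are billed like output tokens.
    """
    billed_output = output_tokens + reasoning_tokens
    return (input_tokens * price_in_per_m + billed_output * price_out_per_m) / 1_000_000

# Hypothetical prices: $1 per million input tokens, $4 per million output tokens.
plain = call_cost(1_000, 500, 0, price_in_per_m=1.0, price_out_per_m=4.0)
reasoning = call_cost(1_000, 500, 8_000, price_in_per_m=1.0, price_out_per_m=4.0)
ratio = reasoning / plain
print(f"plain=${plain:.4f} reasoning=${reasoning:.4f} ratio={ratio:.1f}x")
```

Same prompt, same visible answer length. The only change is the hidden thinking budget, and under these made-up numbers it alone makes the call more than ten times as expensive.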

Diffusion (the image and video sub-family)

Generative models for images, video, audio, and 3D. Frontier 2025-2026: FLUX 2, SD 3.5, Sora, Veo, Kling. ControlNet and IP-Adapter for conditioning. C2PA / SynthID for provenance. A different math family from LLMs (denoising rather than next-token prediction), but the same “train once at scale, adapt afterwards” pattern.

MCP

Model Context Protocol. The USB-C of AI tools: the protocol that lets agents touch your CRM, codebase, and database. Released by Anthropic in November 2024, donated to the Linux Foundation in December 2025. By April 2026: ~97 million monthly SDK downloads, ~9,400 public servers, 78% of enterprise teams using it in production. New attack surface. Your CISO needs the OWASP LLM Top 10 by next quarter.
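On the wire, MCP messages are JSON-RPC 2.0, and a tool invocation uses the `tools/call` method. The sketch below builds one such request by hand. The tool name, its arguments, and the allowlist check are hypothetical additions for illustration. In practice an MCP SDK would construct the message for you.

```python
import json

# Hypothetical allowlist: only tools your team has reviewed may be called.
ALLOWED_TOOLS = {"crm_lookup_customer"}

def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialise a JSON-RPC 2.0 `tools/call` request for an MCP server."""
    if tool_name not in ALLOWED_TOOLS:
        raise PermissionError(f"tool not allowlisted: {tool_name}")
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)

# `crm_lookup_customer` and its `email` argument are made up for this example.
request = build_tool_call(1, "crm_lookup_customer", {"email": "jane@example.com"})
print(request)
```

The allowlist is the part that speaks to the attack-surface point: whatever the model asks for, your code decides which tools are reachable at all.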

The decision rule

If your problem has... | Family 3?
Unstructured text or messy free-form input | Yes
Output should be human-readable text | Yes
Need to handle questions you didn’t predict | Yes
Latency budget allows seconds | Yes
Image/video generation | Yes (diffusion)
Tabular structured-prediction problem | No (Family 1)
Real-time / sub-100ms latency | No (Family 2)
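The table can be read as a checklist function. The sketch below is one way to encode it. The parameter names are invented, and checking the two disqualifiers first is an assumption about precedence that the table itself does not state.

```python
def family_3_fit(*, unstructured_input: bool = False,
                 text_output: bool = False,
                 open_ended_questions: bool = False,
                 image_or_video_generation: bool = False,
                 tabular_prediction: bool = False,
                 latency_budget_ms: int = 2_000) -> str:
    """Map the decision-rule rows to an answer; disqualifiers are checked first."""
    if tabular_prediction:
        return "No (Family 1)"
    if latency_budget_ms < 100:
        return "No (Family 2)"
    if image_or_video_generation:
        return "Yes (diffusion)"
    if unstructured_input or text_output or open_ended_questions:
        return "Yes"
    return "Unclear"  # no row of the table applies

support_bot = family_3_fit(unstructured_input=True, open_ended_questions=True)
churn_model = family_3_fit(tabular_prediction=True, text_output=True)
print(support_bot, "|", churn_model)
```

The second call is the instructive one: a tabular problem stays a "No" even when someone wants the result phrased as text.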

When NOT to use it

The common trap: Family 3 is the most expensive family per inference. It is also the most marketed. Audit any “let’s use AI” project for whether the input is actually unstructured text. If it’s tabular, you’re burning money on the wrong family.
Mario Deubler
