Reference

Glossary.

The 30 terms executives will encounter in 2026, explained for decision-making rather than engineering. One paragraph each.

Agent: An LLM in a loop, calling tools and acting on results until it decides it's done. Anthropic's December 2024 post 'Building Effective Agents' describes five recurring workflow patterns: prompt chaining, routing, parallelization, orchestrator-workers, and evaluator-optimizer.
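The loop itself is simple. Here is a minimal sketch in which a hypothetical `call_model` stub stands in for a real LLM API and `get_weather` is a made-up tool. On each turn the "model" either requests a tool or returns a final answer. The loop runs the tool, feeds the result back, and stops when the model is done or a step limit is reached.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

@dataclass
class Step:
    """One model decision: a tool request or a final answer (hypothetical format)."""
    tool: Optional[str] = None
    args: dict = field(default_factory=dict)
    answer: Optional[str] = None

def call_model(history: List[str]) -> Step:
    """Hypothetical stand-in for an LLM: requests one tool call, then answers."""
    tool_results = [h for h in history if h.startswith("tool: ")]
    if not tool_results:
        return Step(tool="get_weather", args={"city": "Berlin"})
    return Step(answer=f"It is {tool_results[-1][len('tool: '):]} in Berlin.")

# Hypothetical tool registry; a real one would call external APIs.
TOOLS: Dict[str, Callable[..., str]] = {"get_weather": lambda city: "sunny"}

def run_agent(task: str, max_steps: int = 5) -> str:
    history = [f"user: {task}"]
    for _ in range(max_steps):  # hard cap so a confused model cannot loop forever
        step = call_model(history)
        if step.answer is not None:  # the model decided it is done
            return step.answer
        result = TOOLS[step.tool](**step.args)  # act on the request...
        history.append(f"tool: {result}")  # ...and feed the result back into context
    return "stopped: step limit reached"

print(run_agent("What's the weather in Berlin?"))
```

The step cap matters in practice. The model decides when it is done, so the surrounding code has to bound cost and runtime.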
AI engineer: Engineer who builds applications on top of foundation models. Distinct from ML engineer (trains custom models) and research scientist (advances the field). The role with the steepest demand growth in 2026.
Calibration: How well a model's stated confidence matches its actual accuracy. A well-calibrated model is right about 90% of the time on the answers it gives with 90% confidence. Most deployed LLMs are poorly calibrated, so treat their stated confidence as unreliable.
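One common way to measure calibration is expected calibration error (ECE). Bucket the predictions by stated confidence, take the gap between average confidence and accuracy in each bucket, and average those gaps weighted by bucket size. This is a minimal sketch on made-up data. Equal-width binning and ten bins are assumptions, and other calibration metrics exist.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: size-weighted mean |accuracy - confidence| over equal-width confidence bins."""
    assert len(confidences) == len(correct)
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # min() keeps conf == 1.0 in the top bin instead of overflowing
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(1 for _, ok in b if ok) / len(b)
        ece += (len(b) / n) * abs(accuracy - avg_conf)
    return ece

# Invented data: a model that says "90% sure" ten times but is right only 6 times.
confs = [0.9] * 10
hits = [True] * 6 + [False] * 4
print(round(expected_calibration_error(confs, hits), 2))
```

An ECE near 0 means stated confidence can be taken roughly at face value. This overconfident example scores about 0.3.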
Diffusion model: Generative model trained to reverse a gradual noising process. At generation time it starts from random noise and progressively denoises it into a coherent sample. A sub-family of Family 3. Powers image, video, and audio generation: FLUX, Sora, Veo.
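The training setup can be sketched in a few lines. This sketch assumes a DDPM-style linear noise schedule, which is one common choice among several. The forward process blends a clean sample with Gaussian noise according to a schedule. A network (not shown) is trained to predict that noise, and at generation time the model removes it step by step. The toy "image" and the schedule parameters are illustrative assumptions.

```python
import math
import random

def linear_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal fraction alpha_bar_t for a linear beta schedule (DDPM-style)."""
    alpha_bars, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars

def add_noise(x0, t, alpha_bars, rng):
    """Forward process: x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps, eps ~ N(0, 1)."""
    a = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps  # training target: a network learns to predict eps from (xt, t)

alpha_bars = linear_alpha_bars()
rng = random.Random(0)
x0 = [1.0, -1.0, 0.5]  # a toy three-pixel "image"
print(add_noise(x0, 10, alpha_bars, rng)[0])   # early step: mostly signal
print(add_noise(x0, 999, alpha_bars, rng)[0])  # last step: almost pure noise
```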
Embedding: A vector representation of text or other content. Similar content has similar vectors. Embeddings power search, retrieval (RAG), recommendations, and semantic matching.
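"Similar vectors" is usually measured with cosine similarity. This minimal sketch uses hand-made 3-dimensional vectors whose numbers are invented for illustration. Real embeddings come from a model and have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """1.0 = same direction, 0.0 = unrelated (orthogonal), -1.0 = opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Invented toy vectors: the two finance phrases point roughly the same way.
vectors = {
    "quarterly revenue": [0.9, 0.1, 0.0],
    "Q3 sales figures": [0.8, 0.2, 0.1],
    "office dog policy": [0.0, 0.2, 0.9],
}
query = vectors["quarterly revenue"]
for text, vec in vectors.items():
    print(f"{text:20s} {cosine_similarity(query, vec):.2f}")
```

The two finance phrases share no words, yet they score close to each other. That is the property search and RAG rely on.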
Eval / evaluation set: A curated set of inputs and expected outputs (or rubrics) used to measure model quality. Without evals, you cannot tell whether a change helped or hurt, and production AI cannot be run responsibly.
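At its simplest, an eval is a fixed list of cases plus a scoring rule, re-run after every change. This minimal sketch assumes two hypothetical model functions and exact-match scoring. Real evals often use rubrics or model-graded scoring instead.

```python
from typing import Callable

# A tiny, fixed evaluation set: inputs and expected outputs (invented examples).
EVAL_SET = [
    ("Capital of France?", "Paris"),
    ("2 + 2 = ?", "4"),
    ("Opposite of 'hot'?", "cold"),
]

def run_eval(model: Callable[[str], str], cases=EVAL_SET) -> float:
    """Fraction of cases where the model output matches exactly (case-insensitive)."""
    passed = sum(
        model(prompt).strip().lower() == expected.lower() for prompt, expected in cases
    )
    return passed / len(cases)

def model_v1(prompt: str) -> str:  # hypothetical baseline
    return {"Capital of France?": "Paris", "2 + 2 = ?": "5"}.get(prompt, "?")

def model_v2(prompt: str) -> str:  # hypothetical candidate after a prompt change
    answers = {"Capital of France?": "Paris", "2 + 2 = ?": "4", "Opposite of 'hot'?": "Cold"}
    return answers.get(prompt, "?")

# Compare before shipping: the change only counts as "better" if the score says so.
print(f"v1: {run_eval(model_v1):.0%}  v2: {run_eval(model_v2):.0%}")
```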
Fine-tuning: Adapting a pretrained model to your domain by continuing to train it on domain-specific examples. Solves behavior problems (the model doesn't act as you need). Popular modern variant: LoRA, which trains small low-rank adapter matrices and leaves the base model's weights frozen.
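Here is the LoRA idea in numbers. Instead of updating a large weight matrix W of size d×k, you train two thin matrices, B (d×r) and A (r×k), with a small rank r, and use W + B·A at inference. This is a pure-Python sketch with illustrative sizes. Real implementations also scale B·A by a factor (alpha/r), initialize B to zero, and run on tensor libraries.

```python
def matmul(X, Y):
    """Plain-Python matrix product (matrices as lists of rows)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def lora_weight(W, B, A, scale=1.0):
    """Effective weight W + scale * (B @ A); W itself is never modified."""
    BA = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, BA)]

def trainable_params(d, k, r):
    """Full fine-tuning trains d*k numbers; LoRA trains only r*(d + k)."""
    return d * k, r * (d + k)

# Tiny example: a frozen 2x3 weight matrix and a rank-1 adapter.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
B = [[1.0], [2.0]]       # d x r (2 x 1)
A = [[0.1, 0.0, -0.1]]   # r x k (1 x 3)
print(lora_weight(W, B, A))

# Illustrative: one 4096 x 4096 layer with a rank-8 adapter.
full, lora = trainable_params(4096, 4096, 8)
print(full, lora, f"{lora / full:.2%}")
```

The second print shows why LoRA is cheap. For this layer the adapter is well under 1% of the parameters a full fine-tune would update.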
Foundation model: A large model pretrained on broad data and then adapted to many downstream tasks, instead of training a new model from scratch for each task. LLMs, VLMs, diffusion models. Family 3.
Function calling / tool use: An LLM emits structured calls to external functions (search, database, API) and uses the results to continue. The mechanism that makes agents possible.
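On the application side, "emits structured calls" means the model returns JSON naming a function and its arguments, and your code validates and executes it. This is a minimal sketch. The JSON shape and the `search_orders` function are hypothetical, and each vendor's API uses its own schema.

```python
import json

def search_orders(customer_id: str, limit: int = 5) -> list:
    """Hypothetical business function exposed to the model."""
    return [f"order-{customer_id}-{i}" for i in range(limit)]

# Only functions on this allow-list can run, whatever the model asks for.
REGISTRY = {"search_orders": search_orders}

def execute_tool_call(raw: str):
    """Parse a model-emitted call like {"name": ..., "arguments": {...}} and run it."""
    call = json.loads(raw)
    name, args = call["name"], call.get("arguments", {})
    if name not in REGISTRY:
        raise ValueError(f"model requested unknown tool: {name}")
    return REGISTRY[name](**args)

# Pretend the model produced this text; the result would be sent back to it.
model_output = '{"name": "search_orders", "arguments": {"customer_id": "C42", "limit": 2}}'
print(execute_tool_call(model_output))  # ['order-C42-0', 'order-C42-1']
```

The allow-list is the design point. The model only proposes calls, and the application decides what actually executes.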
Hallucination: An LLM generates fluent but factually wrong content. Not a bug. A consequence of the model being a generator, not a retriever. Mitigated via RAG, refusal training, hallucination-aware UX. Not eliminated.
Inference: Running a trained model to get a prediction. Distinct from training. Most production cost in modern AI is inference, not training.
Latency budget: How long a user can wait for an AI response. Real-time UX: <100 ms. Conversational: 1-3 s. Batch: minutes or more. The budget constrains which model family you can use.
LLM (Large Language Model): A neural network trained to predict the next token over enormous text corpora. At scale, it exhibits emergent capabilities such as grammar, world knowledge, and reasoning. Family 3 core.
MCP (Model Context Protocol): Open protocol for connecting LLMs to tools and data sources. Released by Anthropic in Nov 2024; donated to the Linux Foundation in Dec 2025. Becoming the USB-C of AI. New attack surface; treat MCP servers as untrusted by default.
Mech interp / mechanistic interpretability: Research field aiming to reverse-engineer the internal computations of LLMs at the level of neurons, features, and circuits. Tools and findings include sparse autoencoders (SAEs), circuit analysis, and the 'refusal direction'. Promising but not yet a deployable interpretability solution. MIT Technology Review named it a 2026 breakthrough.
MoE (Mixture of Experts): Architecture where only a subset of model parameters activates per token. Lets models scale total parameters without proportionally scaling per-token cost. Used in Mixtral, DeepSeek-V3, Qwen-MoE, and reportedly several frontier closed models.
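The routing step can be sketched as follows. This sketch assumes top-2 routing over four toy "experts" with softmax gating over the chosen experts, which is one common gating scheme. In a real model each expert is a feed-forward network and the router scores are learned. Only the selected experts run, which is why per-token compute scales with the active parameters, not the total.

```python
import math

def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Toy experts: in a real MoE layer each one is a separate feed-forward network.
EXPERTS = [
    lambda x: 2 * x,
    lambda x: x + 1,
    lambda x: -x,
    lambda x: x * x,
]

def moe_layer(x, router_logits, k=2):
    """Route one token to its top-k experts and mix their outputs by gate weight."""
    top = sorted(range(len(EXPERTS)), key=lambda i: router_logits[i], reverse=True)[:k]
    gates = softmax([router_logits[i] for i in top])  # renormalized over chosen experts
    out = sum(g * EXPERTS[i](x) for g, i in zip(gates, top))  # unchosen experts never run
    return out, top

# The router prefers experts 0 and 3 for this token; experts 1 and 2 are skipped.
out, used = moe_layer(3.0, router_logits=[2.0, -1.0, 0.5, 1.0])
print(used, round(out, 3))
```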
OWASP LLM Top 10: The OWASP Top 10 for LLM Applications, the industry-standard list of the top security risks for LLM applications. First published in 2023. Required reading for any CISO whose company is deploying LLMs.
Prompt injection: Attack where untrusted input causes an LLM to take unintended actions. Indirect prompt injection (instructions hidden inside a webpage or document the agent reads) is the dominant variant. Cannot be fully prevented; must be designed around.
RAG (Retrieval-Augmented Generation): Relevant documents are retrieved at query time and given to an LLM to ground its answer. Solves the knowledge problem (the model doesn't know your data). Most production LLM applications use RAG.
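The core RAG flow is: retrieve, then put the results into the prompt. This minimal sketch uses a toy word-overlap retriever as a stand-in for embedding search, with invented documents and a hypothetical prompt template. The final LLM call is left out.

```python
import re

# Invented knowledge base.
DOCS = [
    "Refunds are processed within 14 days of receiving the returned item.",
    "Our headquarters moved to Munich in 2023.",
    "Premium support is available 24/7 for enterprise customers.",
]

def words(text):
    """Lowercased word set, used for a crude overlap score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, docs, k=1):
    """Rank documents by word overlap with the question (stand-in for vector search)."""
    q = words(question)
    return sorted(docs, key=lambda d: len(q & words(d)), reverse=True)[:k]

def build_prompt(question, docs):
    """Ground the model: retrieved context first, then the question."""
    context = "\n".join(f"- {d}" for d in retrieve(question, docs))
    return (
        "Answer using only the context below. If the answer is not there, say so.\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

print(build_prompt("How long do refunds take to be processed?", DOCS))
```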
Reasoning model: An LLM that burns extra tokens 'thinking' before answering. Examples: OpenAI's o-series, DeepSeek-R1, Claude extended thinking, Gemini Thinking. Often an order of magnitude more expensive per call. Worth it for math, code, and multi-step planning; wasteful for simple queries.
RL / reinforcement learning: Training a model to take actions in an environment to maximize cumulative reward. Family 4. Underlies AlphaGo, robotics, RLHF.
RLHF / RLVR: RL from Human Feedback / RL from Verifiable Rewards. The post-training techniques that make LLMs feel helpful. RLVR rewards answers that can be checked automatically (passing tests, correct math results) and is the main technique behind modern reasoning models.
SHAP / LIME: Methods for explaining individual classical-ML predictions. Show which input features pushed the prediction in which direction. The standard tool when regulated decisions need a defensible explanation. Don't work well on LLMs.
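SHAP is built on Shapley values from game theory. A feature's attribution is its average marginal contribution across all the ways features could be "switched on" from a baseline. This brute-force sketch uses a tiny hypothetical credit-scoring model. The SHAP library uses much faster approximations, and filling absent features with a baseline value is one of several possible conventions.

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values; features not in a coalition take their baseline value."""
    n = len(x)

    def value(present):
        z = [x[i] if i in present else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            for subset in combinations(others, size):
                s = set(subset)
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

# Hypothetical credit score: income helps, debt hurts, plus an income-tenure interaction.
def score(z):
    income, debt, years = z
    return 0.5 * income - 0.8 * debt + 0.1 * income * years

applicant = [4.0, 2.0, 3.0]
average = [3.0, 1.0, 2.0]
phi = shapley_values(score, applicant, average)
print([round(p, 3) for p in phi])
# The attributions add up exactly to score(applicant) - score(average).
print(round(sum(phi), 6), round(score(applicant) - score(average), 6))
```

The cost grows exponentially with the number of features. That is one reason the approach does not transfer to LLMs, whose "features" are thousands of tokens and internal activations.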
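Speculative decoding: Inference optimization. A small, fast model drafts several tokens; the big model verifies them in a single parallel pass and keeps the ones it agrees with. Typically speeds up inference 2-3x without quality loss, because verification leaves the big model's output distribution unchanged.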
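This is a minimal sketch of the greedy variant, using two hypothetical toy "models" that map a token sequence to the next token. The draft proposes k tokens. The target checks them (in a real system, in one batched forward pass) and keeps the longest agreeing prefix plus one token of its own. The output therefore matches what the target would have produced alone. The sampling variant uses a probabilistic accept/reject rule instead.

```python
def target_next(seq):
    """Hypothetical big model: next token is (sum of sequence) mod 5."""
    return sum(seq) % 5

def draft_next(seq):
    """Hypothetical small model: agrees with the target except every 4th position."""
    guess = sum(seq) % 5
    return guess if len(seq) % 4 else (guess + 1) % 5

def greedy_decode(prompt, n_tokens):
    """Baseline: one target call per generated token."""
    seq = list(prompt)
    for _ in range(n_tokens):
        seq.append(target_next(seq))
    return seq

def speculative_decode(prompt, n_tokens, k=3):
    seq = list(prompt)
    target_passes = 0
    while len(seq) - len(prompt) < n_tokens:
        # 1. Draft k tokens cheaply with the small model.
        draft = []
        for _ in range(k):
            draft.append(draft_next(seq + draft))
        # 2. Target checks all drafted positions (one batched pass in practice).
        target_passes += 1
        accepted = []
        for tok in draft:
            expected = target_next(seq + accepted)
            if tok != expected:
                accepted.append(expected)  # first mismatch: use the target's token, stop
                break
            accepted.append(tok)
        else:
            accepted.append(target_next(seq + accepted))  # all accepted: one bonus token
        seq.extend(accepted)
    return seq[: len(prompt) + n_tokens], target_passes

out, passes = speculative_decode([1, 2], 10)
print(out == greedy_decode([1, 2], 10), passes)
```

The speedup comes from the pass count: here 10 tokens need only a few target passes instead of 10, and the output is identical.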
Synthetic data: Training data generated by another model. Powers modern post-training. Risk: model collapse if a model is trained on its own outputs at scale.
Tabular data: Data shaped as rows and columns. Customers, transactions, sensors. The largest single class of data in production. Family 1's home turf.
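Token: The unit an LLM perceives. Roughly three-quarters of an English word on average. Models 'see' text as sequences of token IDs. Pricing is per token, usually quoted per million tokens, with output tokens typically priced higher than input tokens. Many LLM bugs trace to tokenization edge cases.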
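Because pricing is per token, cost estimates are simple arithmetic. This sketch assumes the common rule of thumb of about 0.75 English words per token and hypothetical prices per million tokens. The real ratio varies by tokenizer and language, and actual prices come from your vendor's price sheet.

```python
def estimate_cost_usd(input_words, output_words, price_in_per_m, price_out_per_m,
                      words_per_token=0.75):
    """Rough per-call cost; words_per_token is a rule-of-thumb assumption for English."""
    input_tokens = input_words / words_per_token
    output_tokens = output_words / words_per_token
    return (input_tokens * price_in_per_m + output_tokens * price_out_per_m) / 1_000_000

# Hypothetical prices: $3 per million input tokens, $15 per million output tokens.
per_call = estimate_cost_usd(1500, 300, price_in_per_m=3.0, price_out_per_m=15.0)
print(f"${per_call:.4f} per call, ${per_call * 1_000_000:,.0f} per million calls")
```

Under these assumed prices, the short answer costs as much as the long prompt because output tokens are priced five times higher.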
Two-tower retrieval: Recommendation/search architecture. One tower embeds the query, another embeds the item. Lookup is a fast (approximate) nearest-neighbour search. Returns top items in single-digit milliseconds across 10M+ catalogs. Not what LLMs are built for.
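This minimal sketch of the two-tower pattern uses hand-written toy "towers" and an invented catalog. In practice each tower is a trained neural network. The key design point is that item vectors are computed once, offline. Serving a query then takes one tower call plus a nearest-neighbour lookup. Production systems do that lookup with an approximate index rather than the full scan shown here.

```python
import heapq

def user_tower(user):
    """Hypothetical query tower: maps user features to a 3-d vector."""
    return [user["likes_sport"], user["likes_tech"], user["budget"]]

def item_tower(item):
    """Hypothetical item tower: maps item features into the same 3-d space."""
    return [item["sporty"], item["techy"], 1.0 - item["price_level"]]

# Invented catalog.
CATALOG = {
    "running shoes": {"sporty": 0.9, "techy": 0.1, "price_level": 0.3},
    "smartwatch": {"sporty": 0.6, "techy": 0.9, "price_level": 0.6},
    "laptop": {"sporty": 0.0, "techy": 1.0, "price_level": 0.9},
}

# Offline: embed every item once and store the vectors (in a vector index in practice).
ITEM_VECS = {name: item_tower(feats) for name, feats in CATALOG.items()}

def recommend(user, k=2):
    """Online: embed the user once, score items by dot product, keep the top k."""
    u = user_tower(user)
    scores = {name: sum(a * b for a, b in zip(u, v)) for name, v in ITEM_VECS.items()}
    return heapq.nlargest(k, scores, key=scores.get)

print(recommend({"likes_sport": 1.0, "likes_tech": 0.2, "budget": 0.5}))
```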
Vector database: Database optimized for similarity search over embeddings. Examples: Pinecone, Weaviate, Qdrant, and pgvector (a PostgreSQL extension); FAISS is a similarity-search library often used in the same role. The retrieval substrate of most RAG systems.
VLM (Vision-Language Model): A multimodal foundation model that takes images and text as input. GPT-4o, Claude with vision, Gemini. Family 3. Often outperformed by Family 2 specialists on real-time vision tasks.