Family 1 of 4
Classical ML.
If your business has a table, this is your AI. The largest and quietest family in production.
Thesis
Classical machine learning is the workhorse of production AI in 2026. Linear and logistic regression, decision trees, random forests, gradient boosting. On tabular data, it typically matches or beats deep learning and large language models on the dimensions that matter in production: accuracy, latency, cost, interpretability, auditability.
How it works, in one paragraph
You have 100,000 historical examples. Each row is a customer, a transaction, a sensor reading. One column is the label: whether the customer churned, whether the transaction was fraud. The model learns the pattern that connects the input columns to the label column. At prediction time, you hand it a new row and it returns a number or a class. That is the entire mechanism.
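The mechanism above can be sketched in a few lines. This is a minimal illustration, assuming a hand-rolled logistic regression trained by gradient descent; the two feature columns, the toy rows, and the function names are hypothetical, and in practice a library such as scikit-learn would do this work.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Learn weights that connect the input columns to the label column."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # gradient of the log loss w.r.t. the linear score
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def predict_proba(model, row):
    """Hand the model a new row, get back a number between 0 and 1."""
    weights, bias = model
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, row)))

# Hypothetical toy table: [years_as_customer, support_tickets_last_90d]
rows = [[0.2, 5], [0.5, 4], [0.3, 6], [3.0, 0],
        [2.5, 1], [4.0, 0], [1.0, 3], [3.5, 1]]
labels = [1, 1, 1, 0, 0, 0, 1, 0]  # 1 = churned, 0 = stayed

model = fit_logistic(rows, labels)
churn_risk_new = predict_proba(model, [0.4, 5])    # short tenure, many tickets
churn_risk_loyal = predict_proba(model, [3.8, 0])  # long tenure, no tickets
```

The same fit-then-predict shape holds for every method in this family; only the thing being fitted changes.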
The three methods that cover 95%
- Linear and logistic regression. Fast, interpretable, the right baseline. Always start here.
- Decision trees and random forests. The model asks yes/no questions in sequence. A “forest” votes across hundreds of trees. Robust and explainable.
- Gradient boosting (XGBoost, LightGBM, CatBoost). Trees built one at a time, each fixing the last one’s mistakes. The default winning method on tabular data since ~2014. This is what your data scientist actually uses.
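To make "each fixing the last one's mistakes" concrete, here is a minimal sketch of gradient boosting for squared error, assuming one-question trees (stumps) on a single feature. It is not how XGBoost, LightGBM, or CatBoost are implemented; the data and function names are hypothetical.

```python
def fit_stump(xs, residuals):
    """Find the single yes/no split that best predicts the residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # both sides stay non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return threshold, left_mean, right_mean

def stump_predict(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value

def fit_boosted(xs, ys, n_trees=50, learning_rate=0.3):
    """Build stumps one at a time, each trained on the current errors."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]  # the mistakes so far
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, x)
                 for p, x in zip(preds, xs)]
    return base, learning_rate, stumps

def boosted_predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)

def mse(model, xs, ys):
    return sum((boosted_predict(model, x) - y) ** 2
               for x, y in zip(xs, ys)) / len(ys)

# Hypothetical toy data with a step-like pattern.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 3.1, 3.0, 2.9, 5.2, 5.0]

one_tree = fit_boosted(xs, ys, n_trees=1)
many_trees = fit_boosted(xs, ys, n_trees=50)
error_one = mse(one_tree, xs, ys)
error_many = mse(many_trees, xs, ys)
```

The training error drops as trees are added, because every new stump is fitted to whatever the previous ones still get wrong. Production libraries add regularization, deeper trees, and early stopping on held-out data to keep that from turning into overfitting.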
The decision rule
| If your problem has... | Family 1? |
|---|---|
| Tabular data (rows + columns) | Yes |
| Sub-100 ms latency budget | Yes |
| Regulatory or auditing requirement | Yes |
| Output is a number or a class | Yes |
| Fewer than ~1M training examples | Yes |
| Output is free-form text | No (Family 3) |
| Input is messy unstructured text or images | No (Family 2 or 3) |
When NOT to use it
Family 1 cannot generate text. It cannot caption an image. It cannot answer a question it has not been specifically trained for. If your problem is communication-shaped (a chatbot, a summary, a report), you are looking at Family 3.
Family 1 also extrapolates poorly beyond its training distribution. If you train on 2020-2024 data and the world changes, the model degrades silently. Monitor for drift, and retrain on a schedule (quarterly is a reasonable default) or whenever drift is detected.
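One common way to monitor drift is the Population Stability Index (PSI), which compares how a feature was distributed at training time with how it is distributed now. The sketch below is an assumption about what such monitoring could look like, not a method the source prescribes; the 0.25 alert threshold is a widely used rule of thumb, and the data is synthetic.

```python
import math

def psi(expected, actual, n_bins=10):
    """Population Stability Index between a training sample and a live sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width on constant data

    def bin_shares(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), n_bins - 1)  # clip out-of-range values
            counts[idx] += 1
        eps = 1e-6  # floor so empty bins do not produce log(0)
        return [max(c / len(values), eps) for c in counts]

    e, a = bin_shares(expected), bin_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

# Synthetic example: one feature at training time, then two live samples.
training = [i % 100 for i in range(1000)]
live_stable = list(training)              # same distribution
live_shifted = [v + 40 for v in training] # the world changed

psi_stable = psi(training, live_stable)
psi_shifted = psi(training, live_shifted)
needs_retraining = psi_shifted > 0.25  # rule-of-thumb threshold, an assumption
```

Running a check like this per feature on a schedule turns silent degradation into an alert.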
The hidden superpower: feature engineering
In Family 1, the data scientist’s real job is not picking the model. It is engineering the right input features. Days since last purchase, average order value over 30 days, ratio of returns to orders. These derived features are where the accuracy comes from. Juniors who can run XGBoost are plentiful. A senior who knows which features to build is rare and worth five times the rate.
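The three features named above can be derived from a raw order log roughly like this. It is a sketch under assumptions: the record layout (`date`, `amount`, `returned`), the function name, and the sample orders are all hypothetical.

```python
from datetime import date, timedelta

def build_features(orders, as_of):
    """Turn one customer's raw orders into model-ready columns.

    Only orders on or before `as_of` are used, so the features never
    leak information from the future into training.
    """
    past = [o for o in orders if o["date"] <= as_of]
    if not past:
        return {"days_since_last_purchase": None,
                "avg_order_value_30d": 0.0,
                "return_ratio": 0.0}

    last_purchase = max(o["date"] for o in past)
    window_start = as_of - timedelta(days=30)
    recent_amounts = [o["amount"] for o in past if o["date"] > window_start]
    returned = sum(1 for o in past if o["returned"])

    return {
        "days_since_last_purchase": (as_of - last_purchase).days,
        "avg_order_value_30d": (sum(recent_amounts) / len(recent_amounts)
                                if recent_amounts else 0.0),
        "return_ratio": returned / len(past),
    }

# Hypothetical order history for one customer.
orders = [
    {"date": date(2026, 1, 5), "amount": 120.0, "returned": False},
    {"date": date(2026, 2, 20), "amount": 80.0, "returned": True},
    {"date": date(2026, 3, 1), "amount": 40.0, "returned": False},
    {"date": date(2026, 3, 10), "amount": 60.0, "returned": False},
]

features = build_features(orders, as_of=date(2026, 3, 15))
```

Each dictionary becomes one row of the training table. The judgment calls (which window, which ratio, which cutoff date) are where the senior's experience shows up.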
Named exemplars
- Bank fraud detection. Gradient boosting on engineered features, sub-50 ms decisions, GDPR-explainable via SHAP.
- B2B churn prediction. Logistic regression baseline plus XGBoost lift. See the four-families-of-ai notebook for a live runthrough.
- Insurance pricing. Generalized linear models, regulator-mandated interpretability.
- Manufacturing defect detection. Gradient boosting on sensor telemetry.
