What is model distillation?
A plain-language primer on knowledge distillation — how a small student model learns to think like a giant teacher, and why it's the key to running AI on your own hardware.
A living curriculum on the craft of shrinking intelligence. Read it top to bottom, or drop into any level. We update it as the field moves.
Response-, feature-, and relation-based distillation, plus self-distillation and the online and offline variants: the conceptual map of how knowledge actually moves from teacher to student.
Three different ways to shrink a model: knowledge transfer (distillation), precision reduction (quantization), and removal of weights (pruning). What each one actually changes, and how to stack them into one local-ready pipeline.
Every term a newcomer to model distillation needs — soft labels, dark knowledge, reverse KL, GGUF, on-policy distillation, the capacity gap, and more — each in one sentence.
The frameworks people actually use to distill models in 2026 — from Hugging Face TRL and Arcee DistillKit to synthetic-data pipelines and managed cloud services.
Put a distilled model on your own machine — Ollama, llama.cpp, MLX, and LM Studio, plus how to read GGUF quant names and pick the right one for your hardware.
How chain-of-thought traces turned distillation from a compression trick into a way to transfer reasoning itself — the DeepSeek-R1 recipe and why it changed the field.
Why letting the student generate its own attempts and having the teacher grade them — rather than imitating fixed teacher data — became the dominant post-training paradigm of 2025–2026.