how LLMs work — in 6 steps:
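- Tokenization — Your text becomes numbers. "Transformers changed NLP forever" is 32 characters, so about 8 tokens by the rule of thumb that one token ≈ 4 characters of English text. The exact count depends on the tokenizer.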
- Embeddings — Each token ID maps to a dense vector. This is where meaning lives in math: tokens used in similar contexts end up with nearby vectors.
- Self-Attention — Query, key, and value (Q, K, V) projections score how much each token should attend to every other token. This runs in parallel across multiple heads (32 per layer in Llama 2 7B, for example).
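- Token Prediction — Softmax converts raw logits to probabilities over the vocabulary. The decoding strategy (greedy, top-p sampling, or beam search) then decides which token is actually emitted, and it can change the output substantially.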
- Training — Pretraining on trillions of tokens. Then fine-tuning (SFT, followed by preference tuning with RLHF or DPO) makes it actually useful.
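- Production — Call a hosted model through the OpenAI API, run one locally with Ollama, or deploy your own. The inference pipeline is the same either way: tokenize, run the forward pass, decode.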
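
The embedding and self-attention steps above can be sketched in plain Python. This is a minimal single-head illustration, not how any particular model implements it: the toy `EMBEDDINGS` table, the 2-dimensional vectors, and the identity projection matrices are all made up here, whereas real models learn these weights and run many heads in parallel. The causal mask (each token attends only to itself and earlier tokens) is assumed because that is what decoder-only LLMs typically use.

```python
import math

# Hypothetical toy embedding table: token ID -> dense vector (real models learn these).
EMBEDDINGS = {
    0: [1.0, 0.0],
    1: [0.0, 1.0],
    2: [1.0, 1.0],
}

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(xs):
    """Turn a list of scores into probabilities that sum to 1."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x, w_q, w_k, w_v):
    """Single-head causal attention: softmax(Q K^T / sqrt(d_k)) V."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]  # transpose K
    raw = matmul(q, k_t)
    # Scale the scores and mask out future positions (j > i) with -inf.
    scores = [
        [s / math.sqrt(d_k) if j <= i else float("-inf") for j, s in enumerate(row)]
        for i, row in enumerate(raw)
    ]
    weights = [softmax(row) for row in scores]  # one row of attention weights per token
    return matmul(weights, v), weights

token_ids = [0, 1, 2]
x = [EMBEDDINGS[t] for t in token_ids]   # embedding lookup
identity = [[1.0, 0.0], [0.0, 1.0]]      # placeholder projections, assumed for illustration
output, weights = self_attention(x, identity, identity, identity)
```

Each row of `weights` sums to 1 and says how much that token draws from every position it is allowed to see. `output` is the corresponding weighted mix of value vectors.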

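The token prediction step can be sketched the same way. This is a rough illustration under stated assumptions: the four `logits` are invented scores for a 4-token vocabulary, the helper names are hypothetical, and beam search is left out to keep it short. It shows softmax, greedy selection, and top-p (nucleus) sampling.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Convert raw logits to probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy(probs):
    """Pick the single most likely token ID."""
    return max(range(len(probs)), key=lambda i: probs[i])

def top_p_candidates(probs, p=0.9):
    """Return the smallest set of token IDs whose cumulative probability reaches p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return kept

def sample_top_p(probs, p=0.9, rng=random):
    """Sample one token ID from the top-p candidates, weighted by probability."""
    kept = top_p_candidates(probs, p)
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]

logits = [2.0, 1.0, 0.1, -1.0]  # made-up scores for a 4-token vocabulary
probs = softmax(logits)

print("greedy pick:", greedy(probs))
print("top-p candidates:", top_p_candidates(probs, p=0.9))
print("sampled:", sample_top_p(probs, p=0.9))
```

Greedy always returns the same token for the same logits, while top-p sampling can return any token in the candidate set, which is why the choice of decoding strategy shows up so clearly in the generated text.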