How LLMs work, in 6 steps:

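  1. Tokenization — Your text is split into tokens, each mapped to an integer ID. By the rule of thumb that one token ≈ 4 characters of English, the 32-character "Transformers changed NLP forever" comes to about 8 tokens; the exact count depends on the tokenizer.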
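  2. Embeddings — Each token ID indexes a row of a learned embedding matrix, giving a dense vector. Tokens used in similar contexts end up with nearby vectors, which is how meaning gets represented numerically.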
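  3. Self-Attention — Each token is projected into query (Q), key (K), and value (V) vectors. Query-key dot products score how much each token should attend to every other token, and those scores weight the values. Several heads run this in parallel (32 in some models), each with its own projections.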
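  4. Token Prediction — Softmax converts raw logits into a probability distribution over the vocabulary. The decoding strategy (greedy, top-p sampling, beam search) then determines which token is actually emitted, so the same model can produce very different text.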
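  5. Training — Pretraining on trillions of tokens teaches next-token prediction. Fine-tuning then makes the model actually useful: SFT first, followed by preference tuning with RLHF or DPO (alternatives to each other, not successive stages).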
  6. Production — Call a hosted model through the OpenAI API, run one locally with Ollama, or deploy your own. The underlying pipeline is the same in each case.
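
Steps 2 and 3 can be sketched in a few lines of plain Python. This is a minimal illustration, not how any real model is implemented: the 3-entry embedding table, the token IDs, and the identity matrices standing in for learned Q/K/V weights are all made-up assumptions, and there is only one head and no causal mask.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention (no causal mask)."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]  # transpose of K
    # scores[i][j]: how strongly token i attends to token j
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, k_t)]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v), weights

# Hypothetical toy setup: a 3-entry vocabulary with 2-dimensional embeddings.
embedding_table = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
token_ids = [0, 2, 1]
x = [embedding_table[t] for t in token_ids]  # step 2: token ID -> vector lookup

identity = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for learned projection weights
out, weights = attention(x, identity, identity, identity)  # step 3
```

Each row of `weights` sums to 1 and says how much one token draws from every other token. A multi-head layer would run several such `attention` calls with different projection matrices and concatenate the results.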

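Step 4 can be sketched in the same way. The logits below are invented for a hypothetical 4-token vocabulary, and `greedy` and `top_p_sample` are illustrative helper names rather than any library's API. Beam search is left out to keep the example short.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn raw logits into probabilities; lower temperature sharpens them."""
    m = max(logits)
    exps = [math.exp((x - m) / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def greedy(probs):
    """Always pick the single most probable token ID."""
    return max(range(len(probs)), key=probs.__getitem__)

def top_p_sample(probs, p=0.9, rng=random):
    """Nucleus sampling: sample from the smallest set of tokens with mass >= p."""
    ranked = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    nucleus, mass = [], 0.0
    for i in ranked:
        nucleus.append(i)
        mass += probs[i]
        if mass >= p:
            break
    # choices() normalizes the weights, so the nucleus is renormalized here
    return rng.choices(nucleus, weights=[probs[i] for i in nucleus], k=1)[0]

logits = [2.0, 1.0, 0.1, -1.0]  # made-up scores for a 4-token vocabulary
probs = softmax(logits)
next_greedy = greedy(probs)
next_sampled = top_p_sample(probs, p=0.8, rng=random.Random(0))
```

Greedy decoding is deterministic, while top-p sampling trades some of that predictability for variety by drawing only from the high-probability tokens.
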
Last modified: May 30, 2026