

Hell Fast @hellfast.bsky.social

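To follow this tutorial you do need a basic grasp of its terminology. Many people struggle with it because they don't understand the vocabulary. bbycroft.net/llm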

bbycroft.net

LLM Visualization

A 3D animated visualization of an LLM with a walkthrough.

June 6, 2025 at 9:50 PM

Hell Fast, Jun 6, 2025

1. Discrete token
2. Vocabulary size
3. Embedding dimension
4. The "meaning" the model has learned for a token
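To tie these four terms together, here is a minimal sketch in plain Python. It assumes a hypothetical three-token vocabulary and a tiny embedding dimension; the table is filled with random numbers, whereas a trained model learns these values, and each learned row is the "meaning" of one token.

```python
import random

# Hypothetical toy vocabulary: each discrete token gets an integer id.
vocab = ["A", "B", "C"]
vocab_size = len(vocab)   # number of distinct tokens
C = 4                     # embedding dimension (tiny, for illustration)

token_to_id = {tok: i for i, tok in enumerate(vocab)}

# Embedding table: one row of C numbers per token in the vocabulary.
# Random here; learned during training in a real model.
rng = random.Random(0)
embedding_table = [[rng.uniform(-1, 1) for _ in range(C)] for _ in range(vocab_size)]

def embed(tokens):
    """Look up the vector (the learned 'meaning') for each token."""
    return [embedding_table[token_to_id[t]] for t in tokens]

vectors = embed(["C", "B", "A"])
print(len(vectors), len(vectors[0]))  # 3 4
```

The lookup is just indexing: the token id selects a row, so the table has shape vocab_size × C.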

Hell Fast, Jun 6, 2025

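If you have a 48-dimensional embedding, you have 48 “features” describing each token’s characteristics. An example would help here to explain what a “feature” is. For scale, GPT-3 (175 billion parameters) uses C = 12,288. The embedding combines the token’s position information with its semantic features.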
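One common way GPT-style models combine position and meaning is to add a token embedding and a position embedding element-wise. The sketch below assumes that summation, a hypothetical vocabulary of 3 tokens, C = 48 to match the example above, and a maximum sequence length of 6 chosen only for illustration; all weights are random stand-ins for learned values.

```python
import random

vocab_size = 3   # hypothetical toy vocabulary
C = 48           # embedding dimension, as in the 48-dimensional example
T = 6            # maximum sequence length (assumed)

rng = random.Random(42)
# Token embeddings: semantic features of each token (learned in practice).
token_emb = [[rng.gauss(0, 0.02) for _ in range(C)] for _ in range(vocab_size)]
# Position embeddings: one vector per position in the sequence.
pos_emb = [[rng.gauss(0, 0.02) for _ in range(C)] for _ in range(T)]

def input_embedding(token_ids):
    """Add token and position embeddings element-wise: one C-dim vector per position."""
    out = []
    for pos, tid in enumerate(token_ids):
        out.append([t + p for t, p in zip(token_emb[tid], pos_emb[pos])])
    return out

x = input_embedding([2, 1, 0, 1])
```

Because the position vector differs per slot, the same token (id 1 at positions 1 and 3) ends up with two different input vectors.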

Hell Fast, Jun 6, 2025

1. Mean (μ) and variance (Var)
2. The epsilon term (ε = 1×10⁻⁵) is there to prevent division by zero.
3. Feature correction
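These notes appear to describe the layer normalization step. Below is a minimal sketch of the usual formulation, (x − μ) / √(Var + ε), followed by a per-feature scale γ and shift β. Reading "feature correction" as that learned γ/β step is an interpretation, not something the thread states.

```python
import math

def layer_norm(x, gamma, beta, eps=1e-5):
    """Normalize one token's feature vector, then rescale and shift each feature."""
    n = len(x)
    mu = sum(x) / n                              # mean of the features
    var = sum((v - mu) ** 2 for v in x) / n      # (biased) variance
    inv_std = 1.0 / math.sqrt(var + eps)         # eps keeps this finite when var == 0
    # Learned per-feature scale (gamma) and shift (beta) adjust each feature.
    return [g * (v - mu) * inv_std + b for v, g, b in zip(x, gamma, beta)]

y = layer_norm([1.0, 2.0, 3.0, 4.0], gamma=[1.0] * 4, beta=[0.0] * 4)
print([round(v, 3) for v in y])
```

With γ = 1 and β = 0 the output has mean 0 and variance close to 1; a constant vector (variance 0) still works because of ε.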

Hell Fast, Jun 7, 2025

1. A is the size of a head (what is a head?)
2. What does the dot product mean?
3. What are Q, K, and V?
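A rough sketch of how these pieces relate, assuming toy sizes (C = 8, A = 4) and random weights: each head has its own three projection matrices that map a token's C-dimensional vector down to A numbers, giving its query (Q), key (K), and value (V). The dot product then scores how well two such vectors line up.

```python
import random

C, A = 8, 4  # embedding size and head size (toy values, assumed)
rng = random.Random(1)

def rand_matrix(rows, cols):
    return [[rng.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]

# One head's projection weights (learned in practice; random here).
W_q, W_k, W_v = rand_matrix(A, C), rand_matrix(A, C), rand_matrix(A, C)

def matvec(W, x):
    """Multiply an (A x C) matrix by a length-C vector, giving a length-A vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def dot(a, b):
    """Dot product: large when two vectors point the same way, near 0 when unrelated."""
    return sum(x * y for x, y in zip(a, b))

x = [rng.gauss(0, 1) for _ in range(C)]                    # one token's normalized vector
q, k, v = matvec(W_q, x), matvec(W_k, x), matvec(W_v, x)   # each has length A
```

A model with several heads simply repeats this with separate weights per head.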


Hell Fast, Jun 7, 2025

1. Q means "I need an adjective (a type-34 query)."
2. K means "I am an adjective (I match a type-34 query)."
3. Q·K measures how relevant each token is to this query.
4. softmax(Q·K / √A)·V is what changes the token's meaning: K only says "I am an adjective," while V says which one, e.g. "red" or "tall," and that is what gets mixed into the token that asked.
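The adjective analogy can be sketched as single-head scaled dot-product attention. This assumes a GPT-style causal mask (each token only looks at itself and earlier tokens) and hand-picked 2-dimensional vectors for a hypothetical two-token input "red car": "car" queries for an adjective, "red" answers with its key, and its value carries the "red" feature.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Single-head causal self-attention over lists of A-dim vectors."""
    A = len(Q[0])
    out = []
    for i, q in enumerate(Q):
        # Relevance of each earlier (or same) token j to token i's query, scaled by sqrt(A).
        scores = [sum(a * b for a, b in zip(q, K[j])) / math.sqrt(A) for j in range(i + 1)]
        weights = softmax(scores)
        # Weighted sum of value vectors: the information token i takes in.
        out.append([sum(w * V[j][d] for j, w in enumerate(weights)) for d in range(len(V[0]))])
    return out

# Hypothetical toy vectors for "red car" (index 0 = "red", index 1 = "car").
Q = [[0.0, 0.0], [5.0, 0.0]]   # "car" asks: "I need an adjective"
K = [[5.0, 0.0], [0.0, 5.0]]   # "red" answers: "I am an adjective"
V = [[1.0, 0.0], [0.0, 1.0]]   # feature 0 stands for "red"
out = attention(Q, K, V)
```

Almost all of "car"'s attention weight lands on "red," so its output is close to [1, 0]: it has picked up the "red" feature.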

Hell Fast, Jun 7, 2025

This is yet another feature-related step (to look up: "abstract class"?): it decides whether each feature is important and adds that feature back to the token itself.
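One plausible reading of this note is the MLP block plus its residual connection: the MLP scores which features matter, and the residual adds the result back onto the token's own vector. The sketch below assumes a 4× expansion, the tanh approximation of GELU, random weights, and omits the layer norm that usually precedes the block.

```python
import math
import random

C = 4  # embedding dimension (toy value)
rng = random.Random(7)
W1 = [[rng.gauss(0, 0.5) for _ in range(C)] for _ in range(4 * C)]  # expand C -> 4C
b1 = [0.0] * (4 * C)
W2 = [[rng.gauss(0, 0.5) for _ in range(4 * C)] for _ in range(C)]  # project 4C -> C
b2 = [0.0] * C

def gelu(v):
    """tanh approximation of GELU."""
    return 0.5 * v * (1 + math.tanh(math.sqrt(2 / math.pi) * (v + 0.044715 * v ** 3)))

def linear(W, b, x):
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def mlp_block(x):
    h = [gelu(v) for v in linear(W1, b1, x)]       # which hidden features fire, and how strongly
    delta = linear(W2, b2, h)                      # map back to C features
    return [xi + di for xi, di in zip(x, delta)]   # residual: add the result to the token itself

y = mlp_block([0.1, -0.2, 0.3, 0.0])
```

The residual add means the block only has to learn a correction to the token's vector, not rebuild it from scratch.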