Hell Fast
To follow this tutorial you do need some basic vocabulary. Many people get stuck simply because they don't understand the terms. bbycroft.net/llm
Hell Fast
1. discrete token 2. vocabulary size 3. embedding dimension 4. the “meaning” the model has learned for one token
Hell Fast
If you have a 48-dimensional embedding, you have 48 “features” describing each token’s characteristics. An example here would help explain what a feature is. GPT-3 (175 billion parameters): C = 12,288. The embedding combines a token's position information with its semantic features.
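To make the "token plus position" idea concrete, here is a minimal sketch. The table sizes (`VOCAB_SIZE`, `MAX_POSITIONS`) and the random values are my own placeholders, not numbers from the tutorial; in a trained model both tables are learned. Only C = 48 comes from the note above.

```python
import random

random.seed(0)

VOCAB_SIZE = 3      # hypothetical tiny vocabulary
MAX_POSITIONS = 4   # hypothetical maximum sequence length
C = 48              # embedding dimension: 48 "features" per token

# Placeholder tables filled with random numbers (a real model learns these).
token_embedding = [[random.gauss(0, 0.02) for _ in range(C)]
                   for _ in range(VOCAB_SIZE)]
position_embedding = [[random.gauss(0, 0.02) for _ in range(C)]
                      for _ in range(MAX_POSITIONS)]

def embed(token_ids):
    """Return one C-dimensional vector per token: token row + position row."""
    return [
        [t + p for t, p in zip(token_embedding[tok], position_embedding[pos])]
        for pos, tok in enumerate(token_ids)
    ]

vectors = embed([2, 0, 1])
print(len(vectors), len(vectors[0]))  # 3 tokens, 48 features each
```

Each of the 48 numbers in a vector is one "feature". What any single feature means is not assigned by hand; it emerges from training.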
Hell Fast
1. mean (μ) and variance (Var) 2. The epsilon term (ε = 1×10⁻⁵) is there to prevent division by zero. 3. feature correction
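A rough sketch of layer normalization over one token's feature vector, showing where μ, Var, and ε enter. The `gamma` and `beta` arguments stand for the learned scale and shift; treating them as the "feature correction" step is my own reading, not something stated in the tutorial.

```python
import math

def layer_norm(x, gamma=None, beta=None, eps=1e-5):
    """Normalize one feature vector to zero mean and (roughly) unit variance."""
    n = len(x)
    mean = sum(x) / n                            # mu
    var = sum((v - mean) ** 2 for v in x) / n    # Var
    inv_std = 1.0 / math.sqrt(var + eps)         # eps keeps this finite when Var == 0
    normed = [(v - mean) * inv_std for v in x]
    # Learned per-feature scale and shift; identity defaults are an assumption.
    gamma = gamma if gamma is not None else [1.0] * n
    beta = beta if beta is not None else [0.0] * n
    return [g * v + b for g, v, b in zip(gamma, normed, beta)]

out = layer_norm([1.0, 2.0, 3.0, 4.0])
flat = layer_norm([5.0, 5.0, 5.0])  # Var == 0, still no division by zero
```

Without ε, the constant vector in the last line would divide by zero.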
Hell Fast
1. A is the size of the head (what is a head?) 2. the meaning of the dot product 3. what are Q, K, and V?
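A minimal single-head sketch of how Q, K, V and the dot product fit together, assuming causal (look-back only) attention and the usual scaling by sqrt(A). In a real model Q, K, and V come from multiplying the input by learned weight matrices; here they are passed in directly, and the tiny 2-dimensional vectors are made up for illustration.

```python
import math

def dot(a, b):
    """Dot product: large when the query and key point in similar directions."""
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    m = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def causal_attention(Q, K, V):
    """Q, K, V: one vector per token. A = len(Q[0]) is the head size."""
    A = len(Q[0])
    out = []
    for t, q in enumerate(Q):
        # Token t scores itself and every earlier token.
        scores = [dot(q, K[j]) / math.sqrt(A) for j in range(t + 1)]
        weights = softmax(scores)
        # Output is the weighted average of the matching value vectors.
        out.append([sum(w * V[j][d] for j, w in enumerate(weights))
                    for d in range(len(V[0]))])
    return out

Q = K = V = [[1.0, 0.0], [0.0, 1.0]]
result = causal_attention(Q, K, V)
```

Roughly: Q is what a token is looking for, K is what each token offers, and V is the information that gets passed along once a match is found.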

Hell Fast
This is yet another feature-related step (think of asking an abstract class): it decides whether a given feature is important or not, then adds the feature back to itself.