Hell Fast
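If you want to follow this tutorial, you still need some basic vocabulary. Many people get stuck while learning because they don't understand the terms. bbycroft.net/llm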
Hell Fast
1. discrete token
2. vocabulary size
3. embedding dimension
4. the "meaning" the model has learned for one token
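A rough sketch of how these four terms fit together, in plain Python. The three-token vocabulary, the dimension of 4, and the names `vocab`, `embedding_table`, and `embed` are all made up for illustration; a trained model learns the table values, while here they are just random.

```python
import random

vocab = ["A", "B", "C"]   # assumed toy vocabulary, so vocabulary size = 3
embedding_dim = 4         # assumed embedding dimension: length of each token's vector

random.seed(0)
# One row per token. In a trained model these numbers are learned;
# here they are random placeholders.
embedding_table = [
    [random.uniform(-1.0, 1.0) for _ in range(embedding_dim)]
    for _ in vocab
]

def embed(token: str) -> list[float]:
    """Map a discrete token to the vector that stands for its 'meaning'."""
    token_id = vocab.index(token)      # discrete token -> integer id
    return embedding_table[token_id]   # integer id -> row of the table

vector_for_b = embed("B")
```

So the "meaning" of one token is simply one row of a table with `vocabulary size` rows and `embedding dimension` columns.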

Hell Fast
1. mean (μ) and variance (Var)
2. The epsilon term (ε = 1×10⁻⁵) is there to prevent division by zero.
3. feature correction
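A minimal sketch of the normalization these terms describe, assuming the usual layer-norm formula. Treating `gamma` and `beta` (a learned scale and shift) as the "feature correction" in item 3 is my reading, not something stated above, and the function name `layer_norm` is illustrative.

```python
import math

def layer_norm(x, gamma=None, beta=None, eps=1e-5):
    """Normalize one token's feature vector to roughly zero mean, unit variance."""
    n = len(x)
    mean = sum(x) / n                            # mean (mu)
    var = sum((v - mean) ** 2 for v in x) / n    # variance (Var)
    inv_std = 1.0 / math.sqrt(var + eps)         # eps keeps this finite when var == 0
    normalized = [(v - mean) * inv_std for v in x]
    # Assumed "feature correction": per-feature scale (gamma) and shift (beta).
    if gamma is None:
        gamma = [1.0] * n
    if beta is None:
        beta = [0.0] * n
    return [g * v + b for g, v, b in zip(gamma, normalized, beta)]

out = layer_norm([1.0, 2.0, 3.0, 4.0])
flat = layer_norm([5.0, 5.0, 5.0])   # variance is 0 here; eps avoids dividing by zero
```

The second call is the case epsilon exists for: every feature is identical, so the variance is exactly zero.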
Hell Fast
1. A is the size of the head (what is a head?)
2. what the dot product means
3. what Q, K, and V are
Hell Fast
1. Q means "I need an adjective (type 34 query)"
2. K means "I am an adjective (I match a type 34 query)"
3. K·Q measures how relevant that key is to this query
4. softmax(K·Q) V is what you need to change the meaning of the token: you are an adjective, and now you are "red" or "tall"
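The four steps can be sketched for a single query in plain Python. This is only an illustration under assumptions: the toy vectors and the names `dot`, `softmax`, and `attend` are made up, the division by the square root of the head size is the standard scaled dot-product convention rather than something stated above, and the causal mask a real GPT applies is left out.

```python
import math

def dot(a, b):
    """Dot product: large when two vectors point the same way."""
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)                               # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    head_size = len(query)                        # "A", the size of the head
    # How relevant is each key to this query? (scaled dot product, assumed)
    scores = [dot(query, k) / math.sqrt(head_size) for k in keys]
    weights = softmax(scores)
    # Weighted mix of the value vectors: the information added to the token.
    mixed = [
        sum(w * v[i] for w, v in zip(weights, values))
        for i in range(len(values[0]))
    ]
    return weights, mixed

query = [1.0, 0.0]                  # "I need an adjective"
keys = [[4.0, 0.0], [0.0, 4.0]]     # first key matches the query, second does not
values = [[1.0, 0.0], [0.0, 1.0]]   # what each token offers if it is picked
weights, mixed = attend(query, keys, values)
```

Here the first key lines up with the query, so most of the weight, and therefore most of the first value vector, ends up in `mixed`.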
Hell Fast
It is yet another feature-related step (like asking an abstract class): it decides whether such a feature is important or not, and adds the feature to the token itself.