Hell Fast
To follow this tutorial you do need a basic grasp of the vocabulary. Many people get stuck because they don't understand the terms. bbycroft.net/llm
Hell Fast
1. discrete token 2. vocabulary size 3. embedding dimension 4. the "meaning" the model has learned for one token
Hell Fast
If you have a 48-dimensional embedding, you have 48 "features" describing each token's characteristics. An example of what a feature is would help here. GPT-3 (175 billion parameters): C = 12,288. The embedding combines a token's position information with its semantic features.
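To make "position information plus semantic features" concrete, here is a minimal sketch of an embedding lookup. It assumes the common GPT-style scheme of adding a token-embedding row to a position-embedding row; the toy vocabulary size, the context length T, the random initialization, and the names `token_embedding`, `position_embedding`, and `embed` are all made up for illustration. Only C = 48 comes from the comment above.

```python
import random

random.seed(0)

VOCAB_SIZE = 3   # assumed toy vocabulary (illustrative)
C = 48           # embedding dimension: 48 "features" per token
T = 6            # assumed maximum context length (illustrative)

def random_table(rows, cols):
    """Build a rows x cols table of small random numbers (stand-in for learned weights)."""
    return [[random.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]

token_embedding = random_table(VOCAB_SIZE, C)   # one row per token in the vocabulary
position_embedding = random_table(T, C)         # one row per position in the context

def embed(token_ids):
    """Look up each token's row and add the row for its position, feature by feature."""
    return [
        [tok + pos for tok, pos in zip(token_embedding[token_id], position_embedding[t])]
        for t, token_id in enumerate(token_ids)
    ]

x = embed([2, 0, 1, 1])
print(len(x), len(x[0]))  # 4 48
```

Note that the last two inputs are the same token (id 1) but end up with different vectors, because their positions differ.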
Hell Fast
1. mean (μ) and variance (Var) 2. The epsilon term (ε = 1×10^-5) is there to prevent division by zero. 3. feature correction
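The three points above can be sketched as a small layer normalization function. This is a plain-Python sketch under assumptions, not the tutorial's own code: the name `layer_norm` is hypothetical, and reading "feature correction" as the learned per-feature scale and shift (`gamma`, `beta`) is my guess.

```python
import math

EPS = 1e-5  # the epsilon term from the comment above

def layer_norm(x, gamma=None, beta=None, eps=EPS):
    """Normalize one vector of features to roughly zero mean and unit variance."""
    n = len(x)
    mean = sum(x) / n                               # mu
    var = sum((v - mean) ** 2 for v in x) / n       # Var
    inv_std = 1.0 / math.sqrt(var + eps)            # eps keeps this finite when var == 0
    normed = [(v - mean) * inv_std for v in x]
    # Assumed "feature correction": a per-feature scale (gamma) and shift (beta).
    # Defaults of 1 and 0 leave the normalized values unchanged.
    gamma = gamma if gamma is not None else [1.0] * n
    beta = beta if beta is not None else [0.0] * n
    return [g * v + b for g, v, b in zip(gamma, normed, beta)]

y = layer_norm([1.0, 2.0, 3.0, 4.0])
constant = layer_norm([5.0, 5.0, 5.0])  # variance is 0 here; eps avoids dividing by zero
```

With a constant input the variance is exactly zero, so without ε the division would fail; with it, the output is simply all zeros.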
Hell Fast
1. A is the size of the head (what is a head?) 2. what the dot product means 3. what Q, K, and V are
Hell Fast
1. Q means "I need an adjective (type 34 query)" 2. K means "I am an adjective (type 34 query)" 3. KQ measures how relevant each key is to this query 4. softmax(KQ) V is what you need to change the meaning of the token: you are an adjective, and now you are red or tall
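A rough sketch of those four steps for a single head, in plain Python. It assumes standard scaled dot-product attention (scores divided by sqrt(A), where A is the head size) and leaves out the causal mask a GPT-style model would apply; the helper names and the toy vectors are invented for illustration.

```python
import math

def dot(a, b):
    """Dot product: large when two vectors point the same way."""
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """For each query, mix the value vectors, weighted by query-key relevance."""
    A = len(Q[0])                         # head size
    out = []
    for q in Q:
        scores = [dot(q, k) / math.sqrt(A) for k in K]   # KQ: relevance of each key
        weights = softmax(scores)                         # softmax(KQ)
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])           # softmax(KQ) V
    return out

# Toy example: the query "asks" along the first axis, and only the first key matches.
Q = [[10.0, 0.0]]
K = [[10.0, 0.0], [0.0, 10.0]]
V = [[1.0, 0.0], [0.0, 1.0]]   # what each token offers once it is selected
out = attention(Q, K, V)
```

Because the query lines up with the first key and not the second, almost all the weight lands on the first value vector, so `out[0]` comes out very close to `[1.0, 0.0]`.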