Add to Chrome
✅ The verified answer to this question is available below. Our community-reviewed solutions help you understand the material better.
What does one row of a self-attention matrix represent?
The probabilities of all possible next tokens in the vocabulary.
The importance of each training example for the current prediction.
How strongly one query token attends to each key token when forming its updated representation.
The similarity between the final output embeddings of all words in the vocabulary.
Get Unlimited Answers To Exam Questions - Install Crowdly Extension Now!