Add to Chrome
✅ The verified answer to this question is available below. Our community-reviewed solutions help you understand the material better.
In self-attention, what happens after the attention weights have been computed?
The weights replace the original word embeddings permanently.
The weights are used to compute a weighted sum of the value vectors.
The weights are sorted alphabetically.
The weights are directly used as the final predicted token IDs.
Get Unlimited Answers To Exam Questions - Install Crowdly Extension Now!