
True or false: Masked attention in a standard GPT allows the word at position N to attend to all previous words at positions N-1, N-2, etc.

Answer: True (100% of responses; False: 0%)
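
The statement is true: in GPT's causal (masked) self-attention, the token at position N may attend to itself and to every earlier position, but attention to later positions is blocked before the softmax. The sketch below shows a minimal single-head version in NumPy; the projection names Wq/Wk/Wv and the toy sizes are illustrative, not taken from any particular library.

```python
import numpy as np

def causal_self_attention(x, Wq, Wk, Wv):
    """Single-head masked (causal) self-attention sketch.

    x: (T, d) token embeddings; Wq/Wk/Wv: (d, d) projection matrices.
    Returns the attended output and the (T, T) attention weights.
    """
    T, d = x.shape
    q, k, v = x @ Wq, x @ Wk, x @ Wv              # queries, keys, values
    scores = (q @ k.T) / np.sqrt(d)               # (T, T) attention logits
    # Causal mask: position i may attend only to positions j <= i,
    # so every entry strictly above the diagonal is blocked.
    future = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    # Row-wise softmax; exp(-inf) = 0, so future positions get weight 0.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

# Demo with hypothetical toy sizes: T = 4 tokens, d = 8 dimensions.
rng = np.random.default_rng(0)
T, d = 4, 8
x = rng.normal(size=(T, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
out, weights = causal_self_attention(x, Wq, Wk, Wv)
print(np.round(weights, 2))  # lower-triangular weight matrix
```

The printed weight matrix is lower-triangular: row N is nonzero only at columns 0 through N, which is exactly the behavior the question describes.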