Why can the gradient vanish during backpropagation?
Because a neural network is essentially a nested function, and as the number of layers increases the gradient at some point becomes too expensive to compute
Because gradient clipping can cause the training process to diverge
Because the chain rule of differentiation can cause the gradient to become smaller and smaller when propagating to the earlier layers of the network (this is the correct answer; see the sketch after the options)
Because the forward pass can lead to model parameters that become so large that gradient flow is not stable anymore
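The chain-rule answer is the textbook explanation: backpropagation multiplies one Jacobian factor per layer, and with saturating activations such as the sigmoid (whose derivative is at most 0.25) the product of these factors tends to shrink exponentially with depth. Below is a minimal NumPy sketch of this effect; the layer count, width, and weight scale are arbitrary assumptions for the demo, not part of the original question.

```python
import numpy as np

rng = np.random.default_rng(0)
n_layers, width = 20, 16  # hypothetical depth and layer width

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Small random weights; each layer computes h = sigmoid(W @ h)
weights = [rng.normal(0.0, 1.0, size=(width, width)) / np.sqrt(width)
           for _ in range(n_layers)]

# Forward pass, caching activations for the backward pass
h = rng.normal(size=width)
activations = [h]
for W in weights:
    h = sigmoid(W @ h)
    activations.append(h)

# Backward pass: start from a unit upstream gradient and apply the
# chain rule layer by layer. Since sigmoid'(z) = s * (1 - s) <= 0.25,
# the gradient norm shrinks at nearly every step.
grad = np.ones(width)
for W, a in zip(reversed(weights), reversed(activations[1:])):
    grad = W.T @ (grad * a * (1 - a))  # chain rule through one layer
    print(f"gradient norm: {np.linalg.norm(grad):.3e}")
```

Running this prints a gradient norm that decays toward zero as the gradient flows back toward the earlier layers. Non-saturating activations like ReLU, careful weight initialization, and residual connections keep the per-layer factors closer to 1, which is why they mitigate this problem in practice.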