Why can the gradient vanish during backpropagation?
Because a neural network is essentially a nested function, and as the number of layers increases the gradient at some point becomes too expensive to compute
Because gradient clipping can cause the training process to diverge
Because the chain rule of differentiation can cause the gradient to become smaller and smaller when propagating to the earlier layers of the network (this is the correct answer; see the sketch after the options)
Because the forward pass can lead to model parameters that become so large that gradient flow is not stable anymore
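The chain-rule answer is the textbook explanation: backpropagation multiplies one Jacobian factor per layer, and with saturating activations such as the sigmoid (whose derivative is at most 0.25) the product of these factors tends to shrink exponentially with depth. Below is a minimal NumPy sketch of this effect; the layer count, width, and weight scale are arbitrary assumptions for the demo, not part of the original question.

```python
import numpy as np

rng = np.random.default_rng(0)
n_layers, width = 20, 16  # hypothetical depth and layer width

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Small random weights; each layer computes h = sigmoid(W @ h)
weights = [rng.normal(0.0, 1.0, size=(width, width)) / np.sqrt(width)
           for _ in range(n_layers)]

# Forward pass, caching activations for the backward pass
h = rng.normal(size=width)
activations = [h]
for W in weights:
    h = sigmoid(W @ h)
    activations.append(h)

# Backward pass: start from a unit upstream gradient and apply the
# chain rule layer by layer. Since sigmoid'(z) = s * (1 - s) <= 0.25,
# the gradient norm shrinks at nearly every step.
grad = np.ones(width)
for W, a in zip(reversed(weights), reversed(activations[1:])):
    grad = W.T @ (grad * a * (1 - a))  # chain rule through one layer
    print(f"gradient norm: {np.linalg.norm(grad):.3e}")
```

Running this prints a gradient norm that decays toward zero as the gradient flows back toward the earlier layers. Non-saturating activations like ReLU, careful weight initialization, and residual connections keep the per-layer factors closer to 1, which is why they mitigate this problem in practice.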