Now work with the LSTM - Simple (custom forget bias) preset.
Train two models:
Note: don't forget to click on 'Apply Changes' when you modify the preset in the architecture editor.
Train for 20 epochs with a learning rate of 0.01.
Examine how the loss and accuracy values develop over the epochs. How does the initial forget-gate bias affect training here?
Keep in mind: here we only set the initial bias. Gradient computation is not disabled, so in both cases the bias parameter will adapt during training.
In your analysis, also consider the gradient magnitude plots for both models. Show the gradients at initialization and after training, and include both plots here.
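When reasoning about why the initial forget-gate bias matters, it can help to look at the direct cell-state path of an LSTM. There, c_t = f_t * c_{t-1} + i_t * g_t, so along that path dc_t/dc_{t-1} = f_t, and the gradient reaching step 0 is scaled by the product of the forget-gate activations. The minimal Python sketch below assumes the forget-gate pre-activation equals its bias b_f at every step (roughly the situation with near-zero input and recurrent weights at initialization) and uses a hypothetical sequence length of 20 time steps. It ignores the indirect paths through the hidden state h_t, so it only illustrates the trend and is not a reproduction of the gradients the tool reports.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function used by the LSTM gates."""
    return 1.0 / (1.0 + math.exp(-x))


def cell_path_gradient(forget_bias: float, steps: int) -> float:
    """Scale factor of dc_T/dc_0 along the direct cell-state path.

    Assumption (for illustration only): the forget-gate pre-activation is
    just its bias at every step, i.e. input and recurrent contributions
    are zero, so f_t = sigmoid(forget_bias) for all t.
    """
    f = sigmoid(forget_bias)
    # Along the direct path, dc_t/dc_{t-1} = f_t; chaining `steps` of them
    # multiplies the factors together.
    return f ** steps


# Hypothetical sequence length; not taken from the exercise.
SEQ_LEN = 20

for b in (0.0, 1.0, 3.0):
    f = sigmoid(b)
    factor = cell_path_gradient(b, SEQ_LEN)
    print(f"b_f={b:.1f}  f={f:.3f}  gradient factor over {SEQ_LEN} steps={factor:.2e}")
```

With b_f = 0 the factor is 0.5^20, roughly 1e-6, whereas b_f = 3 keeps it above 0.3. This is the usual motivation for initializing the forget-gate bias to a positive value: early in training the cell state, and the gradient flowing back through it, is retained over longer spans, while the bias itself remains trainable.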