✅ The verified answer to this question is available below. Our community-reviewed solutions help you understand the material better.
The Q-learning update equation (given below) returns a new value for the current state using the current reward and the value of the best action in the next state.
Get Unlimited Answers To Exam Questions - Install Crowdly Extension Now!