How does Q-Learning differ from SARSA in TD control?
SARSA requires a model of the environment, while Q-Learning does not
Q-Learning updates only at the end of an episode, while SARSA updates at each step
Q-Learning is on-policy, while SARSA is off-policy
SARSA updates the Q-value using the action actually taken in the next state, while Q-Learning updates using the maximum action-value over the next state's actions
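To make the last option concrete, here is a minimal sketch of the two tabular update rules, assuming Q is a list of per-state lists of action-values. The function names, the step size alpha, and the discount gamma are illustrative choices, not part of the original question. The only difference between the two functions is the bootstrap term in the target.

```python
def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.5, gamma=0.9):
    """On-policy TD update: bootstrap from the action actually chosen in s_next."""
    target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])


def q_learning_update(Q, s, a, r, s_next, alpha=0.5, gamma=0.9):
    """Off-policy TD update: bootstrap from the best action-value in s_next."""
    target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])


# Hypothetical two-state, two-action table used only for illustration.
Q_sarsa = [[0.0, 0.0], [1.0, 3.0]]
Q_qlearn = [[0.0, 0.0], [1.0, 3.0]]

# Same transition (s=0, a=0, r=1.0, s_next=1); the behaviour policy
# happens to pick the lower-valued action 0 in the next state.
sarsa_update(Q_sarsa, 0, 0, 1.0, 1, a_next=0)
q_learning_update(Q_qlearn, 0, 0, 1.0, 1)
```

When the behaviour policy picks a non-greedy next action, as in this transition, the two updates move Q[0][0] by different amounts. When the next action happens to be the greedy one, the two targets coincide.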