Consider an episodic MDP with one state and two actions (left and right). The left action has stochastic reward 1 with probability p and 3 with probability 1−p. The right action has stochastic reward 0 with probability q and 10 with probability 1−q. What relationship between p and q makes the actions equally optimal?
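One way to work this out, assuming the episode ends after a single action and that "equally optimal" means equal expected reward: the left action has expected reward 1·p + 3·(1 − p) = 3 − 2p, and the right action has expected reward 0·q + 10·(1 − q) = 10 − 10q. Setting them equal gives 3 − 2p = 10 − 10q, that is, 10q − 2p = 7, or q = (7 + 2p)/10. Since p lies in [0, 1], q must lie in [0.7, 0.9]. The short Python sketch below checks this relationship with exact fractions; the function names are illustrative and not part of the original question.

```python
from fractions import Fraction


def expected_left(p):
    """Expected reward of 'left': reward 1 w.p. p, reward 3 w.p. 1 - p."""
    return 1 * p + 3 * (1 - p)


def expected_right(q):
    """Expected reward of 'right': reward 0 w.p. q, reward 10 w.p. 1 - q."""
    return 0 * q + 10 * (1 - q)


def q_for_tie(p):
    """Value of q that equalizes the two expected rewards: q = (7 + 2p) / 10."""
    return (7 + 2 * p) / 10


# Example: pick p = 1/2 and compute the matching q.
p = Fraction(1, 2)
q = q_for_tie(p)
print(q, expected_left(p), expected_right(q))
```

Under the same single-step assumption, any discount factor would not change the result, because each episode yields exactly one reward.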