Test questions from Reinforcement Learning - Fall 2025 at elearning.aua.am.
What does the "Markov" property imply?
In reinforcement learning, a policy that results in the maximum cumulative reward is called:
Which of the following describes the interaction in the RL loop?
What is the objective of policy iteration in reinforcement learning?
Suppose γ = 0.5 and the following sequence of rewards is received: R1 = -1, R2 = 2, R3 = 6, R4 = 3, and R5 = 2, with T = 5. What is G0?
Hint: Work backward.
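The hint can be followed mechanically. A minimal sketch, assuming the standard episodic definition of the return, G_t = R_{t+1} + γ G_{t+1} with G_T = 0 (the function name `discounted_returns` is illustrative, not from the course material):

```python
def discounted_returns(rewards, gamma):
    """Return [G_0, ..., G_T] for rewards [R_1, ..., R_T], assuming G_T = 0."""
    returns = [0.0] * (len(rewards) + 1)
    # Work backward from the terminal step: G_t = R_{t+1} + gamma * G_{t+1}.
    for t in range(len(rewards) - 1, -1, -1):
        # rewards[t] holds R_{t+1} because Python lists are 0-indexed.
        returns[t] = rewards[t] + gamma * returns[t + 1]
    return returns


returns = discounted_returns([-1, 2, 6, 3, 2], gamma=0.5)
print(returns)  # [2.0, 6.0, 8.0, 4.0, 2.0, 0.0]
```

Under that assumption the first entry is G0 = 2, and each later entry is the return from the corresponding later time step.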
Which element in reinforcement learning defines the behavior of the agent?
In an MDP, what defines the probability of moving to a new state given a current state and action?
What is a key assumption behind the Markov decision process (MDP) model?
Our MDP has 3 states: s1, s2, s3. The state transition probabilities from s1 are p11 = 0, p12 = 0.4, p13 = 0.6. On leaving state s1, the agent receives a reward Rs1 = 2. The state values of s2 and s3 are v2 = 8 and v3 = 4. The discount factor is γ = 0.5. Calculate the state value v1 of state s1.
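One way to check the arithmetic is sketched below. It assumes the Bellman expectation equation in the form v(s) = R_s + γ Σ p(s, s') v(s'), where the reward is collected on leaving the state. Because p11 = 0, the unknown v1 does not appear on the right-hand side. The helper name `state_value` is hypothetical.

```python
def state_value(reward, gamma, transition_probs, successor_values):
    """One-step Bellman backup: v(s) = R_s + gamma * sum_j p_sj * v_j."""
    expected_next = sum(p * v for p, v in zip(transition_probs, successor_values))
    return reward + gamma * expected_next


# Only s2 and s3 are listed as successors, since p11 = 0 removes the v1 term.
v1 = state_value(
    reward=2,
    gamma=0.5,
    transition_probs=[0.4, 0.6],
    successor_values=[8, 4],
)
print(round(v1, 6))  # 4.8
```

By hand, this is 2 + 0.5 × (0.4 × 8 + 0.6 × 4) = 2 + 0.5 × 5.6 = 4.8. If p11 were nonzero, v1 would appear on both sides and the equation would need to be solved for v1 rather than evaluated directly.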
The exploration vs. exploitation trade-off refers to: