Reinforcement Learning - Fall 2025

Test questions and answers for Reinforcement Learning - Fall 2025 at elearning.aua.am.

(0, 1.75)

(1.25, 0)

(1.75, 1.85)

(1.0, 1.0)

(0, 0)

(0.5, 0)

(1.5, 1.75)

(0, 1.0)

Which of the following is an example of a TD Prediction algorithm?

SARSA

Monte Carlo

ε-greedy TD(0)

Expected SARSA

Q-Learning

TD(0)

How does Q-Learning differ from SARSA in TD control?

SARSA requires a model of the environment, while Q-Learning does not

Q-Learning updates only at the end of an episode, while SARSA updates at each step

Q-Learning is on-policy, while SARSA is off-policy

SARSA updates the Q-value using the actual action taken, while Q-Learning updates using the maximum action-value
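The two TD control targets can be compared side by side. This is a minimal sketch, not a full agent: the table `q`, the step size `alpha`, the discount `gamma`, and the toy transition are all hypothetical values chosen for illustration.

```python
# Illustrative sketch: SARSA and Q-Learning differ only in the target they build.
# All names and numbers here are hypothetical.

def sarsa_target(q, reward, next_state, next_action, gamma):
    """On-policy target: uses the action actually taken in next_state."""
    return reward + gamma * q[next_state][next_action]

def q_learning_target(q, reward, next_state, gamma):
    """Off-policy target: uses the maximum action-value in next_state."""
    return reward + gamma * max(q[next_state])

def td_update(q, state, action, target, alpha):
    """Move Q(state, action) a fraction alpha toward the target."""
    q[state][action] += alpha * (target - q[state][action])

# Toy table with 2 states and 2 actions.
q = [[0.0, 0.0], [1.0, 3.0]]
gamma, alpha = 0.9, 0.5

# One observed transition; the behaviour policy picked the non-greedy action 0 next.
reward, next_state, next_action = 1.0, 1, 0

sarsa_t = sarsa_target(q, reward, next_state, next_action, gamma)  # 1 + 0.9 * Q[1][0]
q_learning_t = q_learning_target(q, reward, next_state, gamma)     # 1 + 0.9 * max(Q[1])
```

When the next action is exploratory, as here, the two targets differ; when the next action happens to be greedy, they coincide.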

Which of the following methods updates estimates through bootstrapping? (Select all that apply)

ε-greedy

SARSA

Monte Carlo

Q-Learning

Dynamic Programming

TD(0)

Which of the following is the correct characterization of Dynamic Programming (DP) and Temporal Difference (TD) methods?

TD methods use sample updates, DP methods use expected updates.

✅

TD methods use expected updates, DP methods use sample updates.

❌

Both TD and DP methods use expected updates.

❌

Both TD and DP methods use sample updates.

❌

Both DP and TD require a complete model of the environment’s dynamics.

❌
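The difference between expected and sample updates can be sketched on a single state. The model `outcomes`, the value table `v`, and the parameters below are hypothetical; the point is only that the DP-style backup reads every outcome from a known model, while the TD(0)-style backup uses one sampled transition.

```python
import random

# Hypothetical model for state "s" under a fixed policy:
# each entry is (probability, reward, next_state).
outcomes = [(0.5, 0.0, "a"), (0.5, 2.0, "b")]
v = {"s": 0.0, "a": 1.0, "b": 3.0}
gamma = 0.9

def expected_update(v, outcomes, gamma):
    """DP-style backup: average over every outcome using the known model."""
    return sum(p * (r + gamma * v[s2]) for p, r, s2 in outcomes)

def sample_update(v, state, reward, next_state, alpha, gamma):
    """TD(0)-style backup: one sampled transition, no model required."""
    return v[state] + alpha * (reward + gamma * v[next_state] - v[state])

dp_value = expected_update(v, outcomes, gamma)

# Draw one transition from the same distribution to stand in for experience.
rng = random.Random(0)
weights = [p for p, _, _ in outcomes]
_, r, s2 = rng.choices(outcomes, weights=weights)[0]
td_value = sample_update(v, "s", r, s2, alpha=0.1, gamma=gamma)
```

Both backups bootstrap from the current estimates in `v`; only the expected update needs the transition probabilities.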

Q-learning does not learn about the outcomes of exploratory actions.

In the n-step TD method, what does 'n' represent?

The number of policies being evaluated

The number of future time steps used to compute the return

The number of actions taken in an episode

The number of episodes to average over

In multi-step TD methods, what does the "return" G(t) represent when using n-step bootstrapping?

The sum of rewards from step t to the end of the episode

❌

The current estimated value of the state

❌

The maximum Q-value over all actions

❌

The discounted sum of the next n rewards and the estimated value of the nth state

✅
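The n-step return can be computed directly from its definition, G = R(t+1) + γ R(t+2) + ... + γ^(n-1) R(t+n) + γ^n V(S(t+n)). The function name, argument names, and numbers below are hypothetical and serve only as a worked example.

```python
def n_step_return(rewards, bootstrap_value, gamma):
    """Discounted sum of the next n rewards plus the discounted value estimate
    of the state reached after n steps (n is the length of `rewards`)."""
    n = len(rewards)
    discounted_rewards = sum(gamma ** k * r for k, r in enumerate(rewards))
    return discounted_rewards + gamma ** n * bootstrap_value

# n = 3 rewards, estimated value 4.0 for the state reached after 3 steps:
# 1.0 + 0.5 * 0.0 + 0.25 * 2.0 + 0.125 * 4.0
g = n_step_return([1.0, 0.0, 2.0], bootstrap_value=4.0, gamma=0.5)  # → 2.0
```

With a single reward (n = 1) this reduces to the TD(0) target; with all rewards up to the end of the episode and a bootstrap value of 0 it is the Monte Carlo return.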

Round your answer up to 2 digits.

Both TD(0) and Monte-Carlo (MC) methods do not converge to the same true value function asymptotically, given that the environment is Markovian.
