Reinforcement Learning - Fall 2025

Looking for Reinforcement Learning - Fall 2025 test answers and solutions? Browse our comprehensive collection of verified answers for Reinforcement Learning - Fall 2025 at elearning.aua.am.

Match the algorithm name to its correct update rule (select all that apply)

Which of the following correctly describe Temporal Difference (TD) and Monte-Carlo (MC) methods? (Select all that apply)

MC methods can be used in episodic tasks.

TD methods can be used in continuing tasks.

Both TD and MC methods require full knowledge of the environment’s transition model.

TD methods update value estimates before the episode ends, while MC methods update only after the episode terminates.

MC methods can be used in continuing tasks.

TD methods can be used in episodic tasks.
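
The difference in update timing between the two families can be sketched in a few lines. This is a minimal illustration, assuming a tabular value function stored in a dictionary; the function names, the step size `alpha`, the discount `gamma`, and the two-state episode are hypothetical and chosen only for the example.

```python
from collections import defaultdict


def td0_update(V, state, reward, next_state, done, alpha=0.1, gamma=0.9):
    """Apply one TD(0) update immediately after a single transition."""
    # The value of a terminal state is taken to be zero, so nothing is bootstrapped.
    bootstrap = 0.0 if done else V[next_state]
    V[state] += alpha * (reward + gamma * bootstrap - V[state])


def mc_update(V, episode, alpha=0.1, gamma=0.9):
    """Every-visit, constant-alpha MC update; needs the complete episode."""
    G = 0.0
    # Walk backwards so the return G can be accumulated incrementally.
    for state, reward in reversed(episode):
        G = reward + gamma * G
        V[state] += alpha * (G - V[state])


# TD(0): values change after every step, while the episode is still running.
V_td = defaultdict(float)
td0_update(V_td, "A", 0.0, "B", done=False)
td0_update(V_td, "B", 1.0, None, done=True)

# MC: values change only once the full list of (state, reward) pairs is known.
V_mc = defaultdict(float)
mc_update(V_mc, [("A", 0.0), ("B", 1.0)])
```

Neither function takes a transition model as input: both learn from sampled transitions only.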

What is the target policy in Q-learning?

Random

Greedy with respect to the current action-value estimates

None of the answers is correct

ϵ-greedy with respect to the current action-value estimates
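
In the standard textbook formulation, Q-learning is off-policy: the policy that picks actions during interaction (the behaviour policy) can differ from the policy being evaluated and improved (the target policy). The sketch below contrasts a greedy selection with an ϵ-greedy one, assuming a dictionary-based Q-table; the helper names and the sample values are hypothetical.

```python
import random


def greedy_action(Q, state, actions):
    """Pick the action with the highest current estimate (unseen pairs count as 0)."""
    return max(actions, key=lambda a: Q.get((state, a), 0.0))


def epsilon_greedy_action(Q, state, actions, epsilon=0.1, rng=random):
    """With probability epsilon explore uniformly, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return greedy_action(Q, state, actions)


# Hypothetical estimates for a single state with two actions.
Q = {("s", "left"): 0.2, ("s", "right"): 0.7}
actions = ["left", "right"]

target_choice = greedy_action(Q, "s", actions)
behaviour_choice = epsilon_greedy_action(Q, "s", actions, epsilon=0.1)
```

A common setup uses `epsilon_greedy_action` to gather experience while the update itself bootstraps from the greedy choice.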

In an episodic setting, we might have different updates depending on whether the next state is terminal or non-terminal. Which of the following TD error calculations are correct? (Select all that apply)
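
The answer options for this question did not survive extraction. For reference, the usual state-value form of the TD error, assuming the convention that a terminal state has value zero, is sketched below; the symbols follow common textbook notation and are not taken from the original options.

```latex
\[
\delta_t =
\begin{cases}
R_{t+1} + \gamma V(S_{t+1}) - V(S_t), & \text{if } S_{t+1} \text{ is non-terminal},\\[4pt]
R_{t+1} - V(S_t), & \text{if } S_{t+1} \text{ is terminal}.
\end{cases}
\]
```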

Q-Learning needs to wait until the end of an episode before performing its update.

Q-Learning cannot be applied to episodic tasks.

True

False

It depends on the task

When using the Q-Learning update rule, how is the next action-value estimate determined?

By following the current policy

By sampling the next action randomly

By averaging the Q-values of all possible actions

By selecting the action with the maximum estimated Q-value
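
A one-step tabular Q-learning update makes the role of the next action-value estimate concrete. This is a minimal sketch, assuming a dictionary Q-table in which unseen pairs default to zero; the function name, `alpha`, `gamma`, and the sample values are hypothetical.

```python
def q_learning_update(Q, state, action, reward, next_state, done, actions,
                      alpha=0.5, gamma=0.9):
    """One tabular Q-learning update; returns the new estimate for (state, action)."""
    if done:
        # No bootstrapping from a terminal state.
        best_next = 0.0
    else:
        # Bootstrap from the largest estimate available in the next state,
        # regardless of which action the agent will actually take there.
        best_next = max(Q.get((next_state, a), 0.0) for a in actions)
    old = Q.get((state, action), 0.0)
    Q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return Q[(state, action)]


# Hypothetical estimates for the next state "s1".
Q = {("s1", "left"): 1.0, ("s1", "right"): 3.0}
new_value = q_learning_update(Q, "s0", "right", 1.0, "s1", False, ["left", "right"])
```

Because the update uses only a single transition, it can be applied after every step rather than at the end of an episode.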

Which parameter is used to control the balance between single-step and multi-step bootstrapping methods?

Exploration rate ϵ

Discount factor γ

Eligibility trace decay parameter λ

Learning rate α
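
For context, the way a trace-decay parameter interpolates between one-step and multi-step targets is usually written as the λ-return. The form below assumes standard textbook notation, where G_{t:t+n} denotes the n-step return; none of these symbols appear in the original question.

```latex
\[
G_t^{\lambda} = (1 - \lambda) \sum_{n=1}^{\infty} \lambda^{\,n-1} \, G_{t:t+n},
\qquad 0 \le \lambda \le 1 .
\]
```

With λ = 0 only the one-step return G_{t:t+1} receives weight, which is the TD(0) target. In an episodic task, λ = 1 puts all the weight on the full return, which is the Monte-Carlo target.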

Sarsa, Q-learning, and Expected Sarsa have different targets on a transition to a terminal state.
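
The statement can be checked by writing out the three targets side by side. The sketch below assumes the usual convention that action values of a terminal state are zero; the function names, the Q-table, and the uniform policy probabilities are hypothetical.

```python
def sarsa_target(Q, reward, next_state, next_action, done, gamma=0.9):
    """Target built from the action actually taken in the next state."""
    if done:
        return reward
    return reward + gamma * Q[(next_state, next_action)]


def q_learning_target(Q, reward, next_state, actions, done, gamma=0.9):
    """Target built from the maximum estimate in the next state."""
    if done:
        return reward
    return reward + gamma * max(Q[(next_state, a)] for a in actions)


def expected_sarsa_target(Q, reward, next_state, policy_probs, done, gamma=0.9):
    """Target built from the policy-weighted average over next actions."""
    if done:
        return reward
    expected = sum(p * Q[(next_state, a)] for a, p in policy_probs.items())
    return reward + gamma * expected


Q = {("s1", "left"): 1.0, ("s1", "right"): 3.0}
probs = {"left": 0.5, "right": 0.5}  # hypothetical uniform policy in "s1"

non_terminal = (
    sarsa_target(Q, 1.0, "s1", "left", False),
    q_learning_target(Q, 1.0, "s1", ["left", "right"], False),
    expected_sarsa_target(Q, 1.0, "s1", probs, False),
)
terminal = (
    sarsa_target(Q, 1.0, "s1", "left", True),
    q_learning_target(Q, 1.0, "s1", ["left", "right"], True),
    expected_sarsa_target(Q, 1.0, "s1", probs, True),
)
```

Under these assumptions the three targets differ on the non-terminal transition, while on the terminal transition each one reduces to the reward alone.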

Which of the following pairs is the correct characterization of the TD(0) and Monte-Carlo (MC) methods?

TD(0) is an online method while MC is an offline method.

Both TD(0) and MC are offline methods.

Both TD(0) and MC are online methods.

MC is an online method while TD(0) is an offline method.
