Reinforcement Learning - Fall 2025

Looking for test answers and solutions for Reinforcement Learning - Fall 2025? Browse our large collection of verified answers for Reinforcement Learning - Fall 2025 at elearning.aua.am.


(0, 1.75)

(1.25, 0)

(1.75, 1.85)

(1.0, 1.0)

(0, 0)

(0.5, 0)

(1.5, 1.75)

(0, 1.0)


Which of the following is an example of a TD Prediction algorithm?

SARSA

Monte Carlo

ε-greedy TD(0)

Expected SARSA

Q-Learning

TD(0)
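As a rough illustration of what "prediction" means here, the sketch below shows a single TD(0) value update for a fixed policy: it only estimates state values and never chooses actions. The function name `td0_update`, the dictionary-based value table, and the example numbers are assumptions made for this sketch, not part of the course material.

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=1.0, terminal=False):
    """One TD(0) prediction step: move V[s] toward r + gamma * V[s_next]."""
    # At a terminal transition there is no next-state value to bootstrap from.
    target = r if terminal else r + gamma * V[s_next]
    V[s] += alpha * (target - V[s])
    return V


# Hypothetical two-state value table.
V = {"A": 0.0, "B": 1.0}
td0_update(V, "A", r=0.5, s_next="B", alpha=0.5, gamma=0.9)
print(V["A"])  # target is 0.5 + 0.9 * 1.0 = 1.4, so V["A"] moves halfway to it
```

Control methods such as SARSA and Q-Learning apply the same kind of update to action values and additionally use those values to pick actions.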


How does Q-Learning differ from SARSA in TD control?

SARSA requires a model of the environment, while Q-Learning does not

Q-Learning updates only at the end of an episode, while SARSA updates at each step

Q-Learning is on-policy, while SARSA is off-policy

SARSA updates the Q-value using the actual action taken, while Q-Learning updates using the maximum action-value
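The difference described in the last option can be sketched by comparing the two update targets side by side. The nested-dictionary Q-table and the function names are illustrative assumptions; only the form of the two targets follows the standard definitions.

```python
def sarsa_target(Q, r, s_next, a_next, gamma):
    """On-policy target: uses the action actually taken in the next state."""
    return r + gamma * Q[s_next][a_next]


def q_learning_target(Q, r, s_next, gamma):
    """Off-policy target: uses the greedy (maximum) action value in the next state."""
    return r + gamma * max(Q[s_next].values())


# Hypothetical Q-table with one next state and two actions.
Q = {"s1": {"left": 1.0, "right": 3.0}}

# Suppose the behaviour policy explored and picked "left" in s1.
sarsa_t = sarsa_target(Q, r=1.0, s_next="s1", a_next="left", gamma=0.5)
q_t = q_learning_target(Q, r=1.0, s_next="s1", gamma=0.5)
print(sarsa_t, q_t)  # 1.5 2.5
```

Both targets are then used in the same incremental rule, Q(s, a) += alpha * (target - Q(s, a)); the two methods coincide whenever the next action taken happens to be the greedy one.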


Which of the following methods updates estimates through bootstrapping? (Select all that apply)

ε-greedy

SARSA

Monte Carlo

Q-Learning

Dynamic Programming

TD(0)
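Bootstrapping means that an update target includes a current value estimate rather than only observed rewards. A minimal sketch of the contrast, with hypothetical helper names and made-up rewards, compares a Monte Carlo target (the full observed return, no estimate involved) with a one-step TD target (one reward plus an estimate):

```python
def mc_target(rewards, gamma):
    """Monte Carlo target: discounted sum of all rewards until the episode ends."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def td0_target(r, v_next, gamma):
    """TD(0) target: one real reward plus the current estimate of the next state."""
    return r + gamma * v_next


# Hypothetical episode with three rewards observed after time t.
mc = mc_target([1.0, 1.0, 1.0], gamma=0.5)
# The TD target depends on the current (possibly inaccurate) estimate v_next.
td = td0_target(1.0, v_next=0.0, gamma=0.5)
print(mc, td)  # 1.75 1.0
```

ε-greedy, by contrast, is an action-selection rule rather than an update rule, so it has no update target at all.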


Which of the following is the correct characterization of Dynamic Programming (DP) and Temporal Difference (TD) methods?

TD methods use sample updates, DP methods use expected updates. ✅

TD methods use expected updates, DP methods use sample updates. ❌

Both TD and DP methods use expected updates. ❌

Both TD and DP methods use sample updates. ❌

Both DP and TD require a complete model of the environment’s dynamics. ❌
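The expected-versus-sample distinction can be sketched on a tiny, made-up model. The transition list, state names, and function names below are assumptions for illustration only: the expected update averages over every possible successor using known probabilities (which requires a model), while the sample update uses a single successor drawn from the environment.

```python
import random

# Hypothetical model for one state "s" under a fixed policy:
# (probability, next_state, reward) for each possible outcome.
transitions = [(0.8, "good", 1.0), (0.2, "bad", 0.0)]
V = {"s": 0.0, "good": 2.0, "bad": 0.0}
gamma = 0.9


def expected_update(V, transitions, gamma):
    """DP-style update: expectation over all successors, needs the model."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in transitions)


def sample_update(V, v_s, transitions, gamma, alpha, rng):
    """TD-style update: one sampled successor, no probabilities used in the update."""
    weights = [p for p, _, _ in transitions]
    # Sampling stands in for interacting with the real environment.
    _, s2, r = rng.choices(transitions, weights=weights, k=1)[0]
    return v_s + alpha * (r + gamma * V[s2] - v_s)


expected = expected_update(V, transitions, gamma)
sampled = sample_update(V, V["s"], transitions, gamma, alpha=0.5, rng=random.Random(0))
print(expected)  # 0.8 * (1.0 + 0.9 * 2.0) + 0.2 * 0.0, roughly 2.24
print(sampled)   # depends on which successor was drawn
```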


Q-learning does not learn about the outcomes of exploratory actions.

True (100%)

False


In the n-step TD method, what does 'n' represent?

The number of policies being evaluated

The number of future time steps used to compute the return

The number of actions taken in an episode

The number of episodes to average over


In multi-step TD methods, what does the "return" G(t) represent when using n-step bootstrapping?

The sum of rewards from step t to the end of the episode ❌

The current estimated value of the state ❌

The maximum Q-value over all actions ❌

The discounted sum of the next n rewards and the estimated value of the nth state ✅
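The n-step return in the marked answer can be written out as a short computation. The function name, argument names, and example numbers are assumptions for this sketch; the formula is the usual G = R(t+1) + γ R(t+2) + ... + γ^(n-1) R(t+n) + γ^n V(S(t+n)).

```python
def n_step_return(rewards, v_bootstrap, gamma):
    """Discounted sum of the next n rewards plus the discounted value estimate
    of the state reached after n steps (n is len(rewards))."""
    g = 0.0
    for k, r in enumerate(rewards):
        g += (gamma ** k) * r
    # Bootstrap: the tail of the return is replaced by the current estimate.
    return g + (gamma ** len(rewards)) * v_bootstrap


# Hypothetical 3-step example: rewards 1, 0, 2, then an estimated value of 4.
g = n_step_return([1.0, 0.0, 2.0], v_bootstrap=4.0, gamma=0.5)
print(g)  # 1 + 0 + 0.25 * 2 + 0.125 * 4 = 2.0
```

With a single reward (n = 1) this reduces to the TD(0) target, and if the episode ends within the n steps the bootstrap term is zero, which gives the Monte Carlo return.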


Round your answer up to 2 digits.


Both TD(0) and Monte-Carlo (MC) methods do not converge to the same true value function asymptotically, given that the environment is Markovian.

True

False (100%)
