Crowdly


Reinforcement Learning - Fall 2025

Looking for test answers and solutions for Reinforcement Learning - Fall 2025? Browse our large collection of verified answers for Reinforcement Learning - Fall 2025 at elearning.aua.am.

Get instant access to accurate answers and detailed explanations for your course questions. Our community-built platform helps students succeed!


Match the algorithm name to its correct update rule (select all that apply)


Which of the following accurately describe Temporal Difference (TD) and Monte-Carlo (MC) methods? (Select all that apply)

MC methods can be used in episodic tasks.

TD methods can be used in continuing tasks.

Both TD and MC methods require full knowledge of the environment’s transition model.

TD methods update value estimates before the episode ends, while MC methods update only after the episode terminates.

MC methods can be used in continuing tasks.

TD methods can be used in episodic tasks.
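
For reference, the difference in when TD and MC updates can happen can be sketched as follows. This is a minimal illustration, not course material: the function names, the dictionary-based value table `V`, the `done` flag, and the default `alpha` and `gamma` values are all assumptions made for the example.

```python
def td0_update(V, s, r, s_next, done, alpha=0.1, gamma=0.9):
    """One TD(0) update, applied right after a single transition.

    Bootstraps from the current estimate V[s_next]; a terminal next
    state is treated as having value 0.
    """
    target = r if done else r + gamma * V[s_next]
    V[s] += alpha * (target - V[s])


def mc_update(V, episode, alpha=0.1, gamma=0.9):
    """Every-visit Monte-Carlo update, applied once the episode is over.

    `episode` is a list of (state, reward) pairs, where the reward is the
    one received after leaving that state. The full return G is only
    known after termination, so the update cannot run mid-episode.
    """
    G = 0.0
    for s, r in reversed(episode):
        G = r + gamma * G
        V[s] += alpha * (G - V[s])
```

The TD(0) function needs only one transition, while the MC function needs the complete list of transitions of a finished episode. Neither function takes a transition model as input; both work from sampled experience.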


What is the target policy in Q-learning?

Random

Greedy with respect to the current action-value estimates

None of the answers is correct

ϵ-greedy with respect to the current action-value estimates


In an episodic setting, we might have different updates depending on whether the next state is terminal or non-terminal. Which of the following TD error calculations are correct? (Select all that apply)
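
As a hedged sketch of the two cases, the state-value TD error can be written with the usual convention that a terminal state has value zero. The function name `td_error`, the `terminal` flag, and the dictionary `V` are assumptions introduced for this example.

```python
def td_error(V, s, r, s_next, terminal, gamma=0.9):
    """TD error for one transition (s, r, s_next).

    Non-terminal next state: delta = r + gamma * V[s_next] - V[s]
    Terminal next state:     delta = r - V[s]   (terminal value is 0)
    """
    next_value = 0.0 if terminal else V[s_next]
    return r + gamma * next_value - V[s]
```

With `V = {"s0": 0.5, "s1": 2.0}`, `r = 1.0`, and `gamma = 0.9`, the non-terminal error is 1.0 + 0.9 * 2.0 - 0.5 = 2.3, and the terminal error is 1.0 - 0.5 = 0.5.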


Q-Learning needs to wait until the end of an episode before performing its update.

Q-Learning cannot be applied to episodic tasks

True

False

It depends on the task


When using the Q-Learning update rule, how is the next action-value estimate determined?

By following the current policy

By sampling the next action randomly

By averaging the Q-values of all possible actions

By selecting the action with the maximum estimated Q-value
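
A minimal tabular sketch of the standard Q-learning update may help here. The function name, the `(state, action)`-keyed table `Q`, the `actions` list, and the default `alpha` and `gamma` are assumptions made for illustration only.

```python
from collections import defaultdict


def q_learning_update(Q, s, a, r, s_next, actions, done, alpha=0.5, gamma=0.9):
    """One Q-learning update for the transition (s, a, r, s_next).

    The next action-value is the maximum estimate over all actions in
    s_next, regardless of which action the behavior policy takes next.
    A terminal next state contributes 0.
    """
    best_next = 0.0 if done else max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]


# Usage: a table that defaults to 0.0 for unseen (state, action) pairs.
Q = defaultdict(float)
Q[("s1", "left")] = 1.0
Q[("s1", "right")] = 2.0
new_value = q_learning_update(Q, "s0", "left", 1.0, "s1", ["left", "right"], False)
```

In the usage lines, the target is 1.0 + 0.9 * 2.0 = 2.8, so the estimate for `("s0", "left")` moves from 0.0 to 0.5 * 2.8 = 1.4. The update runs after a single step, so it does not have to wait for the episode to end.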


Which parameter is used to control the balance between single-step and multi-step bootstrapping methods?

Exploration rate ϵ

Discount factor γ

Eligibility trace decay parameter λ

Learning rate α
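
To illustrate how a single parameter can interpolate between one-step and multi-step targets, here is a sketch of the λ-return computed backward over a finished episode. The function name `lambda_returns` and its argument layout are assumptions for this example; it uses the recursion G_t = r_{t+1} + γ[(1 − λ)V(s_{t+1}) + λG_{t+1}].

```python
def lambda_returns(rewards, next_values, lam, gamma=1.0):
    """Compute the lambda-return for every step of one finished episode.

    rewards[t]     : reward received after step t
    next_values[t] : current estimate V(s_{t+1}); use 0.0 for a terminal state
    lam = 0.0 gives the one-step TD target r + gamma * V(s_{t+1});
    lam = 1.0 gives the full Monte-Carlo return.
    """
    G = 0.0
    out = [0.0] * len(rewards)
    for t in reversed(range(len(rewards))):
        G = rewards[t] + gamma * ((1.0 - lam) * next_values[t] + lam * G)
        out[t] = G
    return out
```

For `rewards = [0.0, 1.0]` and `next_values = [0.5, 0.0]` with `gamma = 1.0`, the first-step target is 0.5 at `lam = 0.0`, 0.75 at `lam = 0.5`, and 1.0 at `lam = 1.0`.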


Sarsa, Q-learning, and Expected Sarsa have different targets on a transition to a terminal state.
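
The three targets can be compared side by side in a short sketch. It assumes the usual convention that all action values of a terminal state are zero; the function names, the nested-dictionary table `Q[state][action]`, and the `policy_probs` mapping are hypothetical names chosen for the example.

```python
def sarsa_target(Q, r, s_next, a_next, done, gamma=0.9):
    """Sarsa: bootstrap from the action actually taken in s_next."""
    return r if done else r + gamma * Q[s_next][a_next]


def q_learning_target(Q, r, s_next, done, gamma=0.9):
    """Q-learning: bootstrap from the maximum estimate in s_next."""
    return r if done else r + gamma * max(Q[s_next].values())


def expected_sarsa_target(Q, r, s_next, policy_probs, done, gamma=0.9):
    """Expected Sarsa: bootstrap from the policy-weighted average in s_next."""
    if done:
        return r
    expected = sum(policy_probs[a] * q for a, q in Q[s_next].items())
    return r + gamma * expected
```

With `Q = {"s1": {"left": 1.0, "right": 3.0}}`, a uniform policy, and `r = 1.0`, the non-terminal targets are 1.9 (Sarsa with `a_next = "left"`), 3.7 (Q-learning), and 2.8 (Expected Sarsa). When `done` is true, the bootstrap term drops out and each function returns just the reward `r`.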


Which of the following pairs is the correct characterization of the TD(0) and Monte-Carlo (MC) methods?

TD(0) is an online method while MC is an offline method.

Both TD(0) and MC are offline methods.

Both TD(0) and MC are online methods.

MC is an online method while TD(0) is an offline method.

