Crowdly

Reinforcement Learning - Fall 2025

Looking for test answers and solutions for Reinforcement Learning - Fall 2025? Browse our large collection of verified answers for Reinforcement Learning - Fall 2025 at elearning.aua.am.

Get instant access to accurate answers and detailed explanations for your course questions. Our community-built platform helps students succeed!

What does the "Markov" property imply?

The current state encapsulates all necessary information about the past

Future states depend on past actions

Rewards depend on future decisions

Actions are random and uncorrelated

In reinforcement learning, a policy that results in the maximum cumulative reward is called:

The stochastic policy

The optimal policy

The greedy policy

The deterministic policy

Which of the following describes the interaction in the RL loop?

The agent chooses rewards and the environment selects actions

The environment chooses the next state and action

The agent interacts with the environment by taking actions and receiving rewards

The agent interacts with itself to optimize actions

What is the objective of policy iteration in reinforcement learning?

To find an optimal value function

To balance exploration and exploitation

To find the best action for each state

To find an optimal policy
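
For reference, policy iteration alternates policy evaluation with greedy policy improvement until the policy stops changing. The sketch below is only an illustration: the two-state MDP in `P`, the discount `GAMMA`, and the helper names are hypothetical and do not come from the course material.

```python
# Hypothetical 2-state, 2-action MDP (an assumption for illustration).
# P[state][action] is a list of (probability, next_state, reward) tuples.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}
GAMMA = 0.5  # assumed discount factor


def q_value(s, a, v):
    """Expected one-step return of taking action a in state s under values v."""
    return sum(p * (r + GAMMA * v[s2]) for p, s2, r in P[s][a])


def policy_iteration(tol=1e-10):
    policy = {s: 0 for s in P}  # start with an arbitrary policy
    v = {s: 0.0 for s in P}
    while True:
        # Policy evaluation: sweep until the values stop changing.
        while True:
            delta = 0.0
            for s in P:
                new_v = q_value(s, policy[s], v)
                delta = max(delta, abs(new_v - v[s]))
                v[s] = new_v
            if delta < tol:
                break
        # Policy improvement: act greedily with respect to the current values.
        stable = True
        for s in P:
            best = max(P[s], key=lambda a: q_value(s, a, v))
            if best != policy[s]:
                policy[s] = best
                stable = False
        if stable:
            return policy, v


policy, v = policy_iteration()
print(policy)  # {0: 1, 1: 1}
```

On this toy MDP the loop settles on action 1 in both states, with values close to 3 and 4. Note that the value function is a by-product here; the procedure terminates when the policy is stable.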

Suppose γ = 0.5 and the following sequence of rewards is received: R₁ = -1, R₂ = 2, R₃ = 6, R₄ = 3, and R₅ = 2, with T = 5. What is G₀?

Hint: Work backward.
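
The hint can be followed with the recursion G_t = R_{t+1} + γ·G_{t+1}, starting from G_T = 0. The snippet below is a minimal sketch of that backward pass; the function name `discounted_returns` is a hypothetical helper, not something defined in the course.

```python
def discounted_returns(rewards, gamma):
    """Return [G_0, ..., G_{T-1}] given rewards [R_1, ..., R_T]."""
    returns = [0.0] * len(rewards)
    g = 0.0  # G_T = 0 at the terminal step
    for t in reversed(range(len(rewards))):
        # rewards[t] holds R_{t+1}, so G_t = R_{t+1} + gamma * G_{t+1}
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


rewards = [-1, 2, 6, 3, 2]  # R_1..R_5 from the question
returns = discounted_returns(rewards, gamma=0.5)
print(returns)  # [2.0, 6.0, 8.0, 4.0, 2.0]
```

Reading the list from the right reproduces the backward steps G₄ = 2, G₃ = 4, G₂ = 8, G₁ = 6, and finally G₀ = 2.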

Which element in reinforcement learning defines the behavior of the agent?

Model of the environment

Value Function

Policy

Reward Signal

In an MDP, what defines the probability of moving to a new state given a current state and action?

Reward function

Policy

Transition probability

State-value function

What is a key assumption behind the Markov decision process (MDP) model?

The agent's future state depends on accumulated rewards

The agent has perfect knowledge of future states

The agent's future state depends on the history of all previous states and actions

The agent's future state depends only on its current state and action

Our MDP has 3 states: s₁, s₂, s₃. The state transition probabilities from s₁ are: p₁₁=0, p₁₂=0.4, p₁₃=0.6. When leaving state s₁, the agent receives a reward R_s1=2. The state values of s₂ and s₃ are v₂=8 and v₃=4. The discount factor is γ=0.5. Calculate v₁, the state value of s₁.
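
One way to check the arithmetic is the Bellman expectation equation v₁ = R_s1 + γ·(p₁₂·v₂ + p₁₃·v₃); the p₁₁ term drops out because p₁₁ = 0. The sketch below assumes the reward for leaving s₁ does not depend on the successor state and that no action choice is involved; `state_value` is a hypothetical helper name.

```python
def state_value(reward, gamma, transition_probs, next_values):
    """Bellman expectation backup for one state with a fixed leaving reward."""
    expected_next = sum(p * v for p, v in zip(transition_probs, next_values))
    return reward + gamma * expected_next


# Values from the question: successors s2 and s3 (p11 = 0 contributes nothing).
v1 = state_value(
    reward=2.0,
    gamma=0.5,
    transition_probs=[0.4, 0.6],
    next_values=[8.0, 4.0],
)
print(round(v1, 6))  # 4.8
```

Step by step: the expected next value is 0.4·8 + 0.6·4 = 5.6, so v₁ = 2 + 0.5·5.6 = 4.8.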

The exploration vs. exploitation trade-off refers to:

Whether to act greedily based on current information or try unknown actions for better outcomes

The trade-off between short-term and long-term rewards

Choosing between deterministic and stochastic policies

Choosing between state value function and action value function
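
A common way to handle this trade-off in practice is ε-greedy action selection: with probability ε try a random action, otherwise take the action that currently looks best. The snippet below is a minimal sketch under that assumption; the function name, the `q_values` list, and the `rng` parameter are illustrative and not taken from the course.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index from estimated action values q_values."""
    if rng.random() < epsilon:
        # Explore: choose any action uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: choose the action with the highest current estimate.
    return max(range(len(q_values)), key=lambda a: q_values[a])


q_estimates = [0.1, 0.7, 0.3]  # hypothetical action-value estimates
greedy_action = epsilon_greedy(q_estimates, epsilon=0.0)  # always exploits
print(greedy_action)  # 1
```

With ε = 0 the agent acts purely greedily, and with ε = 1 it explores uniformly; values in between mix the two behaviors.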
