Reinforcement Learning - Fall 2025

Looking for test answers and solutions for Reinforcement Learning - Fall 2025? Browse our large collection of verified answers for Reinforcement Learning - Fall 2025 at elearning.aua.am.

In Reinforcement Learning, what does the term “agent” refer to?

A labeled data point

A software program making decisions

A neural network architecture

A person supervising the learning process

What is the main goal of reinforcement learning?

To minimize the quadratic loss function

To maximize the cumulative reward

To maximize the number of steps

To minimize the actions

What does the value function represent in RL?

The state transition dynamics

Future actions to take

The expected cumulative reward from a state

The probability of selecting an action

What is an action in reinforcement learning?

A state transition

A policy followed by the system

A strategy used by the environment

A decision made by the agent

What is a policy in reinforcement learning?

A mapping from states to actions

A set of Markov chains

A list of possible rewards

A function to calculate future rewards

Consider an episodic MDP with one state and two actions (left and right). The left action has stochastic reward 1 with probability p and 3 with probability 1−p. The right action has stochastic reward 0 with probability q and 10 with probability 1−q. What relationship between p and q makes the actions equally optimal?

7+3p=10q

7+3p=−10q

13+3p=10q

7+2p=10q

13+3p=−10q

13+2p=10q

7+2p=−10q

13+2p=−10q
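
One way to check the candidate relationships is to compare the expected rewards of the two actions directly. Assuming the episode ends after a single action (the question does not say so explicitly), left is worth p·1 + (1−p)·3 = 3 − 2p and right is worth q·0 + (1−q)·10 = 10 − 10q; setting these equal gives 7 + 2p = 10q. The sketch below verifies this numerically. The function names are illustrative and do not come from the course.

```python
def expected_left(p):
    # Reward 1 with probability p, reward 3 with probability 1 - p.
    return p * 1 + (1 - p) * 3


def expected_right(q):
    # Reward 0 with probability q, reward 10 with probability 1 - q.
    return q * 0 + (1 - q) * 10


def q_for_equal_value(p):
    # From 3 - 2p = 10 - 10q it follows that 10q = 7 + 2p.
    return (7 + 2 * p) / 10


# Spot-check a few values of p: both actions should have the same expectation.
for p in (0.0, 0.5, 1.0):
    q = q_for_equal_value(p)
    assert abs(expected_left(p) - expected_right(q)) < 1e-12
```

Note that q must stay in [0, 1], so the equality is only attainable for q between 0.7 and 0.9.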

In a Markov reward process (MRP), the value function v(s) is:

The expected total discounted reward starting from state s

The immediate reward from the state s

The expected action taken from the state s

The optimal policy for state s
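
To make the "expected total discounted reward" definition concrete, here is a minimal sketch that evaluates v(s) for a small MRP by repeating the Bellman backup v ← R + γPv. The two-state transition matrix `P`, the reward vector `R`, and the helper name `evaluate_mrp` are hypothetical values chosen for illustration. The sketch also assumes the convention that R[s] is the expected immediate reward received in state s.

```python
def evaluate_mrp(P, R, gamma, tol=1e-10, max_iters=10_000):
    """Iterate v <- R + gamma * P v until the largest update is below tol."""
    n = len(R)
    v = [0.0] * n
    for _ in range(max_iters):
        new_v = [
            R[s] + gamma * sum(P[s][t] * v[t] for t in range(n))
            for s in range(n)
        ]
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v


# Hypothetical two-state MRP: state 0 pays reward 1, state 1 pays nothing.
P = [[0.9, 0.1],
     [0.5, 0.5]]
R = [1.0, 0.0]
gamma = 0.9

v = evaluate_mrp(P, R, gamma)
print(v)
```

The returned values satisfy the Bellman equation up to the tolerance, which is the fixed-point form of the definition in the first answer option.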

Which property distinguishes an MDP from a regular Markov Chain?

Policy dependency

Transition probabilities

Rewards and actions

Transition function

Every finite Markov decision process has __. [Select all that apply]

A deterministic optimal policy

A unique optimal policy

A unique optimal value function

A stochastic optimal policy

Suppose the discount factor γ=0.8 and the reward sequence is R₁=5 followed by an infinite sequence of 10s.

What is G₀?
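
A worked check, assuming the common convention G₀ = R₁ + γR₂ + γ²R₃ + … (the question does not restate it): the tail of 10s is a geometric series, so G₀ = 5 + 10·γ/(1−γ) = 5 + 10·4 = 45. The sketch below compares this closed form with a truncated sum. The helper name `truncated_return` is illustrative.

```python
gamma = 0.8

# Closed form: first reward plus the geometric tail of 10s, discounted from step 2 on.
closed_form = 5 + 10 * gamma / (1 - gamma)


def truncated_return(gamma, first_reward, later_reward, n_terms):
    """Sum the first n_terms of the discounted reward sequence."""
    total = first_reward
    for k in range(1, n_terms):
        total += (gamma ** k) * later_reward
    return total


approx = truncated_return(gamma, 5, 10, 200)
print(closed_form, approx)
```

With 200 terms the truncated sum agrees with the closed form to well within floating-point noise, since the neglected tail is proportional to 0.8 raised to the 200th power.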
