Crowdly

Reinforcement Learning - Fall 2025

Looking for Reinforcement Learning - Fall 2025 test answers and solutions? Browse our comprehensive collection of verified answers for Reinforcement Learning - Fall 2025 at elearning.aua.am.

In Reinforcement Learning, what does the term “agent” refer to?

A labeled data point

A software program making decisions

A neural network architecture

A person supervising the learning process

What is the main goal of reinforcement learning?

To minimize the quadratic loss function

To maximize the cumulative reward

To maximize the number of steps

To minimize the actions

What does the value function represent in RL?

The state transition dynamics

Future actions to take

The expected cumulative reward from a state

The probability of selecting an action

What is an action in reinforcement learning?

A state transition

A policy followed by the system

A strategy used by the environment

A decision made by the agent

What is a policy in reinforcement learning?

A mapping from states to actions

A set of Markov chains

A list of possible rewards

A function to calculate future rewards

Consider an episodic MDP with one state and two actions (left and right). The left action has stochastic reward 1 with probability p and 3 with probability 1−p. The right action has stochastic reward 0 with probability q and 10 with probability 1−q. What relationship between p and q makes the actions equally optimal?

7+3p=10q

7+3p=−10q

13+3p=10q

7+2p=10q

13+3p=−10q

13+2p=10q

7+2p=−10q

13+2p=−10q
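The comparison can be worked through directly. Assuming each episode ends after a single action (the question does not say so explicitly), the value of an action is its expected immediate reward: left gives p·1 + (1−p)·3 = 3 − 2p and right gives q·0 + (1−q)·10 = 10 − 10q. Setting these equal yields 7 + 2p = 10q. The sketch below checks that relationship with exact fractions; the function names are illustrative only.

```python
from fractions import Fraction

def expected_left(p):
    # Reward 1 with probability p, reward 3 with probability 1 - p.
    return p * 1 + (1 - p) * 3

def expected_right(q):
    # Reward 0 with probability q, reward 10 with probability 1 - q.
    return q * 0 + (1 - q) * 10

def q_for_tie(p):
    # Solve 3 - 2p = 10 - 10q for q, i.e. 10q = 7 + 2p.
    return (7 + 2 * p) / 10

# Assumption: one-step episodes, so action value = expected immediate reward.
for p in (Fraction(0), Fraction(1, 2), Fraction(1)):
    q = q_for_tie(p)
    assert expected_left(p) == expected_right(q)
    print(f"p={p}, q={q}, expected reward={expected_left(p)}")
```

Note that q must lie between 0.7 and 0.9 for a tie to be possible, since p is a probability.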

In a Markov reward process (MRP), the value function v(s) is:

The expected total discounted reward starting from state s

The immediate reward from the state s

The expected action taken from the state s

The optimal policy for state s
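To make the distinction between immediate reward and value concrete, here is a minimal sketch that evaluates a small MRP by repeatedly applying the Bellman equation v(s) = R(s) + γ Σ P(s, s') v(s'). The two-state chain, its rewards, and the function name `mrp_values` are hypothetical and chosen only for illustration; the convention that R(s) is received on leaving state s is also an assumption.

```python
def mrp_values(P, R, gamma, sweeps=1000):
    """Approximate v for an MRP by iterating the Bellman expectation equation."""
    n = len(R)
    v = [0.0] * n
    for _ in range(sweeps):
        # Each sweep recomputes every state's value from the previous estimate.
        v = [R[s] + gamma * sum(P[s][t] * v[t] for t in range(n))
             for s in range(n)]
    return v

# Hypothetical MRP: state 0 pays reward 1 and stays put with probability 0.5;
# state 1 is absorbing with reward 0.
P = [[0.5, 0.5],
     [0.0, 1.0]]
R = [1.0, 0.0]

v = mrp_values(P, R, gamma=0.9)
print(v)  # v[0] exceeds the immediate reward of 1 because future rewards count too
```

In this example v(0) satisfies v(0) = 1 + 0.9 · 0.5 · v(0), so it is about 1.82 rather than the immediate reward of 1.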

Which property distinguishes an MDP from a regular Markov Chain?

Policy dependency

Transition probabilities

Rewards and actions

Transition function

Every finite Markov decision process has __. [Select all that apply]

A deterministic optimal policy

A unique optimal policy

A unique optimal value function

A stochastic optimal policy

Suppose the discount factor γ=0.8 and the reward sequence is R₁=5 followed by an infinite sequence of 10s.

What is G₀?
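One way to work this out, assuming the common convention G₀ = R₁ + γR₂ + γ²R₃ + … (the question does not state its convention): the tail of 10s is a geometric series worth 10 / (1 − γ) = 50 from the second step onward, so G₀ = 5 + 0.8 · 50 = 45. The sketch below computes the closed form and compares it with a long truncated sum; the function names are illustrative.

```python
def discounted_return(first_reward, repeated_reward, gamma):
    # G_0 = R_1 + gamma * (R_2 + gamma * R_3 + ...), and the bracketed tail
    # is a geometric series equal to repeated_reward / (1 - gamma).
    return first_reward + gamma * repeated_reward / (1 - gamma)

def truncated_return(first_reward, repeated_reward, gamma, steps=500):
    # Numerical check: sum the first `steps` rewards explicitly.
    total = first_reward
    for k in range(1, steps):
        total += (gamma ** k) * repeated_reward
    return total

g0 = discounted_return(5, 10, 0.8)
approx = truncated_return(5, 10, 0.8)
print(round(g0, 6), round(approx, 6))
```

Both values agree with 45 up to floating-point rounding.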
