In Q-Learning, we often use an ε-greedy strategy. Task: ...

In Q-Learning, we often use an ε-greedy strategy.

Task:

1. Explain what the parameter ε (epsilon) controls.

2. Imagine a scenario: A robot is learning to navigate a cliff edge. It receives a reward of -100 for falling off and -1 for every step. If ε is kept high (e.g., 0.5) throughout the entire training and testing phases, how will the robot's behavior likely differ from the optimal path?
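
For reference, here is a minimal sketch of how ε-greedy action selection is commonly written. It is an illustration, not part of the original task: the function name epsilon_greedy, the rng parameter, and the example Q-values are all assumptions made up for this sketch.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index from a list of Q-values (hypothetical helper).

    With probability epsilon the agent explores (uniform random action);
    otherwise it exploits (an action with the highest Q-value).
    """
    if rng.random() < epsilon:
        # Explore: every action is equally likely, including bad ones.
        return rng.randrange(len(q_values))
    # Exploit: take the greedy action; ties go to the lowest index.
    return q_values.index(max(q_values))


# Made-up Q-values for one grid cell next to the cliff;
# index 2 stands for the action that steps off the edge.
q_next_to_cliff = [-5.0, -1.0, -100.0, -3.0]
action = epsilon_greedy(q_next_to_cliff, epsilon=0.5)
```

Under these assumptions, with four actions and ε = 0.5, the greedy action is chosen with probability 0.5 + 0.5/4 = 0.625, so a non-greedy action is taken 37.5% of the time, and each individual action (including the one that steps off the edge) is picked at random with probability 0.5/4 = 0.125 per step.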
