Looking for test answers and solutions for ITI0210 Tehisintellekti ja masinõppe alused (Fundamentals of Artificial Intelligence and Machine Learning, 2025/26 autumn) at moodle.taltech.ee? Browse the collection of answers and explanations for the course questions below.
In the "Cliff Walking" example above, Q-learning learns the Optimal Path (right along the edge of the cliff), while SARSA learns the Safer Path (farther away). Explain why this difference occurs based on their update equations.
You are designing a pathfinding agent for a grid-based maze where diagonal movement is allowed (cost √2) and straight movement costs 1. You propose using the Manhattan distance h(n) = |x₁ − x₂| + |y₁ − y₂| as a heuristic for the A* algorithm.
Task: 1. Determine if this heuristic is admissible. Prove your answer mathematically or provide a counter-example. 2. Explain what happens to the optimality of the A* algorithm if we multiply this heuristic by a factor of 2 (i.e., h'(n) = 2·h(n)).
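A quick numeric check for part 1, as a sketch assuming the diagonal cost of √2 reconstructed above (octile distance gives the true shortest-path cost on an open grid with these move costs):

```python
import math

def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def octile(p, q):
    # True shortest-path cost on an open grid: straight moves cost 1,
    # diagonal moves cost sqrt(2).
    dx, dy = abs(p[0] - q[0]), abs(p[1] - q[1])
    return max(dx, dy) + (math.sqrt(2) - 1) * min(dx, dy)

start, goal = (0, 0), (3, 3)
print(manhattan(start, goal))  # 6
print(octile(start, goal))     # 3*sqrt(2) ~= 4.24
```

Since the Manhattan distance can exceed the true cost (6 > 4.24 here), it overestimates along diagonal paths, so it is not admissible under this cost model.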
Consider using reinforcement learning for controlling a robot with legs (e.g., a humanoid robot or a robot dog) for locomotion (i.e., moving from point A to point B). What could be the states of this learning system? What would the reward be?
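One way to make this concrete, as a hedged sketch (the state features and reward weights below are illustrative design choices, not taken from the course materials):

```python
from dataclasses import dataclass

@dataclass
class RobotState:
    # One plausible state representation: proprioception plus task progress.
    joint_angles: list[float]        # current configuration of each leg joint
    joint_velocities: list[float]    # how fast each joint is moving
    body_orientation: tuple[float, float, float]  # roll, pitch, yaw
    body_velocity: tuple[float, float, float]     # linear velocity of the torso
    foot_contacts: list[bool]        # which feet touch the ground
    distance_to_goal: float          # progress from point A toward point B

def reward(prev: RobotState, curr: RobotState, fell_over: bool) -> float:
    # Illustrative shaping: reward progress, penalize energy use and falling.
    r = prev.distance_to_goal - curr.distance_to_goal        # forward progress
    r -= 0.001 * sum(abs(v) for v in curr.joint_velocities)  # energy penalty
    if fell_over:
        r -= 100.0                                           # falling penalty
    return r
```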
In a Convolutional Neural Network (CNN), you have an input image and a filter (kernel). If you apply the filter with stride 1 (one pixel per step) and no padding, what is the dimension of the output feature map? Show the calculation.
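The image and filter sizes were lost in extraction; the general relation is output = (N − F + 2P)/S + 1 for an N×N input, F×F filter, stride S, and padding P. A sketch with assumed placeholder sizes (5×5 input and 3×3 filter are illustrative, not the question's original numbers):

```python
def conv_output_size(n: int, f: int, stride: int = 1, padding: int = 0) -> int:
    # Standard formula for one spatial dimension of a convolution output.
    return (n + 2 * padding - f) // stride + 1

# Assumed example values:
print(conv_output_size(n=5, f=3))  # (5 - 3)/1 + 1 = 3 -> a 3x3 feature map
```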
Using the Q-learning update rule:

Q(s, a) ← Q(s, a) + α [ r + γ max_{a'} Q(s', a') − Q(s, a) ]

calculate the new Q(s, a) given: the current Q(s, a), the learning rate α, the discount factor γ, the reward received r, and the next state's maximum value max_{a'} Q(s', a').
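A worked example with assumed placeholder values (these numbers are illustrative; the question's originals were not preserved):

```python
# Assumed placeholder values.
q_sa = 2.0        # current Q(s, a)
alpha = 0.1       # learning rate
gamma = 0.9       # discount factor
reward = 5.0      # reward received
max_q_next = 3.0  # max over a' of Q(s', a')

td_target = reward + gamma * max_q_next    # 5 + 0.9 * 3 = 7.7
new_q = q_sa + alpha * (td_target - q_sa)  # 2 + 0.1 * (7.7 - 2) = 2.57
print(new_q)  # 2.57
```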
You are designing a heuristic for a path-finding problem on a grid where diagonal movement is allowed and costs the same as horizontal/vertical movement (cost = 1). Would the Manhattan Distance be an admissible heuristic? Explain why or why not with a counter-example.
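A quick check for this cost model, as a sketch (Chebyshev distance gives the true shortest-path cost on an open grid where diagonal moves also cost 1):

```python
def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def chebyshev(p, q):
    # True shortest-path cost on an open grid where every move,
    # including a diagonal, costs 1.
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

# Counter-example: one diagonal step reaches (1, 1) at true cost 1,
# but Manhattan estimates 2 -> it overestimates, so it is not admissible.
print(manhattan((0, 0), (1, 1)))  # 2
print(chebyshev((0, 0), (1, 1)))  # 1
```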
What is the "Training Data Crisis" mentioned in the RL slides regarding Chess? Why did supervised learning fail for Chess before reinforcement learning?