What happens in a dynamic world? Let's change the conditions so that at each step, after the agent's action, there is a 25% chance per room that dirt is created there. If the room was already dirty, nothing happens.
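The dirt rule above can be sketched in a few lines of Python. This is only an illustration: the room names `"A"` and `"B"`, the dictionary representation, and the function name `add_dirt` are assumptions, not part of the exercise.

```python
import random

def add_dirt(dirty, p=0.25, rng=random):
    """Apply the random dirt step that follows the agent's action.

    dirty maps a room name to True (dirty) or False (clean).
    Each clean room becomes dirty with probability p;
    a room that is already dirty stays dirty.
    """
    return {room: is_dirty or rng.random() < p
            for room, is_dirty in dirty.items()}

# Hypothetical two-room world: A is clean, B is dirty.
rooms = {"A": False, "B": True}
rooms = add_dirt(rooms)  # B stays dirty; A becomes dirty 25% of the time
```

The `rng` parameter is there only so that the random source can be swapped out for a seeded or fixed one when checking the behaviour.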
The performance measure we have used so far is no longer adequate. It assumes that the vacuum cleaner always successfully cleans everywhere, and so it measures only the energy cost. It is slightly more realistic to include the state of the rooms:
| State | Reward |
|---|---|
| dirty | 0 |
| clean | 1 |
The reward for each step is computed from the state of the rooms before the agent's action.
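As a small sketch of how the table could be applied, the function below scores one step. It assumes that the table gives a per-room reward and that the step reward is the sum over all rooms; the exercise does not state this explicitly. The names `REWARD` and `step_reward` are hypothetical.

```python
# Per-room reward, copied from the table.
REWARD = {"dirty": 0, "clean": 1}

def step_reward(dirty):
    """Reward for one step, taken from the room states before the agent acts.

    dirty maps a room name to True (dirty) or False (clean).
    Assumption: the step reward is the sum of the per-room rewards.
    """
    return sum(REWARD["dirty"] if is_dirty else REWARD["clean"]
               for is_dirty in dirty.values())
```

For example, with one clean and one dirty room the step is worth 1, and with both rooms clean it is worth 2.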
Which agent is better now: the initial primitive agent or the agent with memory? Let's run 10 working steps from the same initial state as in the first question. An exact evaluation is difficult, but you can estimate the performance by assuming that dust is generated in both rooms at every fourth step. From the vacuum cleaner's point of view, the rooms will be dirty at steps 1, 5, 9, ...
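The estimate can be worked through mechanically with a small harness like the one below. It is a sketch under several assumptions that are not in the exercise text: two rooms named `"A"` and `"B"`, the actions `"Suck"`, `"Left"`, and `"Right"`, the agent starting in room `"A"`, and a step reward equal to the number of clean rooms. The energy cost is left out. `reflex_agent` is a hypothetical stand-in, not necessarily the primitive agent from the first question.

```python
def estimate_score(agent, steps=10, start="A"):
    """Sum the room-state rewards over `steps` steps under the simplified schedule.

    agent(location, is_dirty) returns "Suck", "Left", "Right", or anything else for no-op.
    """
    dirty = {"A": False, "B": False}
    location = start
    total = 0
    for step in range(1, steps + 1):
        if step % 4 == 1:
            # Simplified schedule: both rooms are dirty at steps 1, 5, 9, ...
            dirty = {"A": True, "B": True}
        # Reward is taken before the agent acts: 1 per clean room.
        total += sum(1 for is_dirty in dirty.values() if not is_dirty)
        action = agent(location, dirty[location])
        if action == "Suck":
            dirty[location] = False
        elif action == "Right":
            location = "B"
        elif action == "Left":
            location = "A"
    return total

def reflex_agent(location, is_dirty):
    """Hypothetical memoryless agent: clean if dirty, otherwise move to the other room."""
    if is_dirty:
        return "Suck"
    return "Right" if location == "A" else "Left"
```

Under these assumptions, `estimate_score(reflex_agent)` evaluates to 9. An agent with memory can be compared by passing a different function (or a callable object that keeps its own state) as `agent`.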