Crowdly

When using the Q-Learning update rule, how is the next action-value estimate determined?

By following the current policy ❌

By sampling the next action randomly ❌

By averaging the Q-values of all possible actions ❌

By selecting the action with the maximum estimated Q-value ✅
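
To see why the maximum is the correct choice, recall the standard Q-Learning update: Q(s, a) ← Q(s, a) + α [r + γ max over a' of Q(s', a') − Q(s, a)]. The bootstrap term uses the greedy value of the next state, whatever action the agent actually takes next, which is what makes Q-Learning off-policy. The following is a minimal tabular sketch, not a reference implementation; the function name `q_learning_update`, the list-of-lists table `q`, and the example values of `alpha` and `gamma` are assumptions made for illustration.

```python
def q_learning_update(q, state, action, reward, next_state,
                      alpha=0.1, gamma=0.9, done=False):
    """Apply one tabular Q-Learning update in place and return the TD target.

    q is a hypothetical table: q[state][action] holds the current estimate.
    """
    # Bootstrap from the maximum estimated Q-value in the next state,
    # independent of the action the behaviour policy will actually choose.
    # Terminal transitions have no next-state value.
    next_value = 0.0 if done else max(q[next_state])
    target = reward + gamma * next_value
    # Move the current estimate a fraction alpha toward the target.
    q[state][action] += alpha * (target - q[state][action])
    return target


# Two states, three actions; state 1 already has some estimates.
q = [[0.0, 0.0, 0.0],
     [0.5, 2.0, -1.0]]
target = q_learning_update(q, state=0, action=1, reward=1.0,
                           next_state=1, alpha=0.5, gamma=0.9)
```

In this example the target is 1.0 + 0.9 × 2.0 = 2.8, because 2.0 is the largest estimate in state 1, and `q[0][1]` moves halfway from 0 toward it, to about 1.4. Replacing `max(q[next_state])` with the value of the action the current policy would actually take would turn this into a SARSA-style on-policy update instead.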
