Timestamp                Users  Delta  Toots    TNR   Title            Version  maxTL
Tue 06.08.2024 00:00:09  5,534  +1     518,344  93.7  FediScience.org  4.2.10   500
Mon 05.08.2024 00:00:21  5,533  0      517,842  93.6  FediScience.org  4.2.10   500
Sun 04.08.2024 00:03:40  5,533  0      517,470  93.5  FediScience.org  4.2.10   500
Sat 03.08.2024 00:03:48  5,533  +1     516,926  93.4  FediScience.org  4.2.10   500
Fri 02.08.2024 00:04:14  5,532  +1     516,348  93.3  FediScience.org  4.2.10   500
Thu 01.08.2024 00:00:58  5,531  0      515,720  93.2  FediScience.org  4.2.10   500
Wed 31.07.2024 00:03:11  5,531  0      515,162  93.1  FediScience.org  4.2.10   500
Tue 30.07.2024 00:02:57  5,531  0      514,522  93.0  FediScience.org  4.2.10   500
Mon 29.07.2024 00:00:47  5,531  +1     514,020  92.9  FediScience.org  4.2.10   500
Sun 28.07.2024 00:00:02  5,530  0      513,540  92.9  FediScience.org  4.2.10   500
Emerson Harkin (@efharkin) · 11/2022 · Toots: 20 · Followers: 52
Tue 06.08.2024 12:10
New blog post about asymmetric learning rates in reward learning. What do they do and why?
Slower learning for negative RPEs causes slower learning when the environment gets worse (duh!) but also causes an optimism bias when rewards are random. 🤔
Check out the full post and interactive widgets to develop an intuition for this optimism bias.
https://efharkin.com/blog/2024-07-asymmetric-learning-widget/
Media: Gifv
[Public] Replies: 0 Boosts: 0 Favs: 0 · via Web
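
The optimism bias described in the post can be illustrated with a small simulation. This is only a sketch and not the code behind the linked widgets: it assumes a Rescorla-Wagner-style value update with one learning rate for positive reward prediction errors (RPEs) and a smaller one for negative RPEs, and rewards that are random coin flips (1 with probability p_reward, else 0). The function names and parameter values are hypothetical, chosen for illustration.

```python
import random


def simulate_value(alpha_pos, alpha_neg, p_reward=0.5, n_trials=20000, seed=0):
    """Return the average learned value over the second half of the run.

    Assumed update rule (not taken from the blog post):
        value += alpha * (reward - value)
    where alpha is alpha_pos for positive RPEs and alpha_neg otherwise.
    """
    rng = random.Random(seed)
    value = 0.5  # arbitrary starting estimate
    history = []
    for _ in range(n_trials):
        # Random binary reward, independent of anything the learner does.
        reward = 1.0 if rng.random() < p_reward else 0.0
        rpe = reward - value  # reward prediction error
        alpha = alpha_pos if rpe > 0 else alpha_neg
        value += alpha * rpe
        history.append(value)
    late = history[n_trials // 2:]  # discard the early transient
    return sum(late) / len(late)


def expected_fixed_point(alpha_pos, alpha_neg, p_reward=0.5):
    """Value at which the expected upward and downward updates cancel."""
    up = alpha_pos * p_reward
    down = alpha_neg * (1.0 - p_reward)
    return up / (up + down)


if __name__ == "__main__":
    print("true mean reward:       0.5")
    print("symmetric learner:     ", round(simulate_value(0.2, 0.2), 3))
    print("asymmetric learner:    ", round(simulate_value(0.3, 0.1), 3))
    print("predicted (asymmetric):", expected_fixed_point(0.3, 0.1))
```

Under these assumptions the symmetric learner hovers around the true mean reward of 0.5, while the learner with slower updates for negative RPEs settles near alpha_pos * p / (alpha_pos * p + alpha_neg * (1 - p)), which is 0.75 for the values above: an overestimate of how rewarding the environment is.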