New AXRP episode with David Lindner!
In this episode, I talk with David Lindner about Myopic Optimization with Non-myopic Approval, or MONA, which attempts to address (multi-step) reward hacking by myopically optimizing actions against a human's sense of whether those actions are generally good. Does this work? Can we get smarter-than-human AI this way? How does this compare to approaches like conservatism? Listen to find out.