Owain on AXRP!!!

Daniel Filan

sloeb@superstimul.us

Current deal:
- Research Manager at MATS
- Podcast at AXRP
- Hobby is learning Latin
- Single

Berkeley, California, USA

Earlier this year, the paper "Emergent Misalignment" made the rounds on AI x-risk social media for seemingly showing LLMs generalizing from 'misaligned' training data of insecure code to acting comically evil in response to innocuous questions. In this episode, I chat with one of the authors of that paper, Owain Evans, about that research as well as other work he's done to understand the psychology of large language models.

Video
Transcript



Daniel Filan, 1 year ago
