New AXRP! With Evan Hubinger!

Daniel Filan

sloeb@superstimul.us

Current deal:
- Research Manager at MATS
- Podcast at AXRP
- Hobby is learning Latin
- Single

Berkeley, California, USA

The 'model organisms of misalignment' line of research creates AI models that exhibit various types of misalignment and studies them to understand how the misalignment arises and whether it can be removed. In this episode, Evan Hubinger talks about two papers he has worked on at Anthropic under this agenda: "Sleeper Agents" and "Sycophancy to Subterfuge".

Video
Transcript



Daniel Filan

in reply to Daniel Filan • 8 months ago

I like how it looks like the AXRP logo is the sun in this thumbnail.

