Adrià on AXRP!
Yet another new episode!
Suppose we're worried about AIs engaging in long-term plans that they don't tell us about. If we were to peek inside their brains, what should we look for to check whether this was happening? In this episode Adrià Garriga-Alonso talks about his work trying to answer this question.
Daniel Filan