So here's a dumb question about Jason Gross-style work on compact proofs that I don't want to ask totally publicly - what's the point? I see the value in making the case for interp as being for stuff like compact proofs. But I feel like we know that we aren't going to be able to find literal proofs of relevant safety properties of GPT-4, and we don't even know what those properties should be. So relevant next steps should look like "figure out heuristic arguments" and "figure out WTF AI safety even is" right? So why do more work getting compact proofs of various model properties?