So here's a dumb question about Jason Gross-style work on compact proofs that I don't want to ask totally publicly - what's the point? I see the value in making the case that interp is useful for things like compact proofs. But I feel like we already know we aren't going to be able to find literal proofs of relevant safety properties of GPT-4, and we don't even know what those properties should be. So the relevant next steps should look like "figure out heuristic arguments" and "figure out WTF AI safety even is", right? So why do more work getting compact proofs of various model properties?
in reply to Daniel Filan

I don't think it's obvious that we can't get proofs of any relevant safety properties. Like, yeah, we're not going to get proofs of anything that references human preferences or whatever, but there might be relevant limited subquestions, e.g. about information capacity or something?
in reply to Ben Weinstein-Raun

I guess I just mean that it's really hard to prove anything about big NN behaviour - my understanding is that if you try really hard, you can do interval propagation in a smart way, but that's about it.
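
For concreteness, interval propagation on a toy network looks something like the sketch below. The network, function names, and numbers are all made up for illustration, not taken from any real verifier:

```python
# A minimal sketch of interval bound propagation (IBP) through a tiny
# ReLU network. Everything here is illustrative; shapes and weights
# are arbitrary.
import numpy as np

def ibp_affine(lo, hi, W, b):
    """Propagate an input box [lo, hi] through x -> W @ x + b.

    Splitting W into positive and negative parts gives sound
    (but often loose) elementwise bounds on the output.
    """
    W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
    new_lo = W_pos @ lo + W_neg @ hi + b
    new_hi = W_pos @ hi + W_neg @ lo + b
    return new_lo, new_hi

def ibp_relu(lo, hi):
    """ReLU is monotone, so it maps boxes to boxes exactly."""
    return np.maximum(lo, 0.0), np.maximum(hi, 0.0)

# Hypothetical 2-layer network: bound its outputs over every input
# within L-infinity radius eps of a point x0.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 4)), rng.normal(size=8)
W2, b2 = rng.normal(size=(2, 8)), rng.normal(size=2)

x0, eps = rng.normal(size=4), 0.1
lo, hi = x0 - eps, x0 + eps
lo, hi = ibp_relu(*ibp_affine(lo, hi, W1, b1))
lo, hi = ibp_affine(lo, hi, W2, b2)
print("output lower bounds:", lo)
print("output upper bounds:", hi)
```

Even on toy networks these box bounds tend to loosen quickly as depth grows, which is roughly why this kind of thing doesn't scale to proving interesting properties of big models.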