So here's a dumb question about Jason Gross-style work on compact proofs that I don't want to ask totally publicly - what's the point? I see the value in making the case for interp as being for stuff like compact proofs. But I feel like we know that we aren't going to be able to find literal proofs of relevant safety properties of GPT-4, and we don't even know what those properties should be. So relevant next steps should look like "figure out heuristic arguments" and "figure out WTF AI safety even is" right? So why do more work getting compact proofs of various model properties?