"Coherence theorems"
Had a conversation today about whether "coherence theorems" exist, what they are, to what extent you're shooting yourself in the foot if you're not an expected utility maximizer, to what extent agents will self-modify to be more coherent, etc. The last question is interesting and non-trivial. I think that in order to get self-modification, you have to have some preferences that are more basic than others. So the thing that happens is supposed to be something like:
- you realize that your current preferences will result in you having almost no influence in the future with high probability
- you realize that you don't want that
- so you change your preferences so that you stop shooting yourself in the foot etc. (toy sketch of this below)
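To make "shooting yourself in the foot" concrete, here's a minimal money-pump sketch, the standard story for why an agent with cyclic preferences bleeds resources. Everything here (the `run_pump` function, the fee, the three-item cycle) is made up for illustration, not taken from any particular theorem:

```python
# Toy money pump: an agent with cyclic preferences A > B > C > A will pay
# a small fee for every "upgrade", so a trader can cycle it forever.
# All names and numbers are illustrative, not from a specific theorem.

FEE = 1.0

def run_pump(prefs, holding, wealth, rounds):
    """prefs maps a held item to the item the agent strictly prefers;
    the agent pays FEE to swap whenever such an offer exists."""
    for _ in range(rounds):
        offer = prefs.get(holding)
        if offer is None:       # no strict preference: agent declines, pump stops
            break
        wealth -= FEE           # pays to trade up
        holding = offer
    return holding, wealth

cyclic = {"B": "A", "C": "B", "A": "C"}    # A > B > C > A: intransitive
print(run_pump(cyclic, "A", 10.0, 9))      # ('A', 1.0): back where it started, poorer

# After "self-modifying" to a transitive ranking A > B > C, the agent
# trades up at most twice and then refuses further offers.
transitive = {"B": "A", "C": "B"}          # A > B > C: no cycle
print(run_pump(transitive, "C", 10.0, 9))  # ('A', 8.0): finite loss, then stable
```

The cyclic agent pays a fee every round forever and ends up holding what it started with; after switching to a transitive ranking it trades up at most twice and stops. That's the sense in which self-modifying toward coherence plugs the leak.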
The reason this comes up is that the person I was talking to made a distinction between "selection theorems" and "coherence theorems", where "if you do this you'll die out in the future" is just a selection theorem, and to be a "coherence theorem" you have to end up wanting to self-modify or something. But if you assume agents don't want to be selected against, then selection theorems become coherence theorems.
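Here's the selection-theorem version of the same point, under the toy assumptions that future "influence" is proportional to resources and that pumpable agents leak a fixed fraction of their wealth each round. The `incoherent_share` function, leak rate, and population sizes are all invented for illustration:

```python
# Toy selection dynamic: pumpable agents leak a fixed fraction of their
# wealth each round; coherent agents keep theirs. Leak rate and
# population sizes are invented for illustration.

LEAK = 0.05   # fraction of wealth a pumpable agent loses per round

def incoherent_share(n_incoherent, n_coherent, rounds):
    """Share of total wealth held by the pumpable agents after `rounds`,
    with everyone starting at wealth 1."""
    incoherent = float(n_incoherent)
    coherent = float(n_coherent)
    for _ in range(rounds):
        incoherent *= 1 - LEAK      # pumped every round
    return incoherent / (incoherent + coherent)

print(incoherent_share(50, 50, 0))     # 0.5: equal influence to start
print(incoherent_share(50, 50, 200))   # ≈ 3.5e-5: selected down to nothing
```

The incoherent agents' share of total wealth decays to roughly zero, which is the "almost no influence in the future with high probability" step. The move from selection theorem to coherence theorem is the agent noticing this in advance and not wanting it.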
Anyway IDK this post is a bit rambly and I probably wouldn't post it if I weren't trying to add some momentum here. But I think that questions of how irrational agents end up self-modifying are kind of underrated / not thought about as much as they could be.