Yet another short AXRP episode!
With Anthony Aguirre!
The Future of Life Institute is one of the oldest and most prominent organizations in the AI existential safety space, working on such topics as the AI pause open letter and how the EU AI Act can be improved. Metaculus is one of the premier forecasting sites on the internet. Behind both of them lies one man: Anthony Aguirre, who I talk with in this episode.
More AXRP! Joel Lehman!
Typically this podcast talks about how to avert destruction from AI. But what would it take to ensure AI promotes human flourishing as well as it can? Is alignment to individuals enough, and if not, where do we go from here? In this episode, I talk with Joel Lehman about these questions.
Misty morning at Lanhydrock. Cornwall, England. NMP
From: https://x.com/HoganSOG/status/1882211656283111582/photo/1
#art
Junichiro Sekino 1914-1988
Night in Kyoto
#art
From: https://x.com/marysia_cc/status/1882215670282166390
Tanaka Ryōhei (1933-2019)
Crow and Persimmon in the Snow
From: https://x.com/marysia_cc/status/1881097630148907230/photo/1
#art
Adrià on AXRP!
Yet another new episode!
Suppose we're worried about AIs engaging in long-term plans that they don't tell us about. If we were to peek inside their brains, what should we look for to check whether this was happening? In this episode, Adrià Garriga-Alonso talks about his work trying to answer this question.
Happy New AXRP!
Yet another in the Alignment Workshop series.
AI researchers often complain about the poor coverage of their work in the news media. But why is this happening, and how can it be fixed? In this episode, I speak with Shakeel Hashim about the resource constraints facing AI journalism, the disconnect between journalists' and AI researchers' views on transformative AI, and efforts to improve the state of AI journalism, such as Tarbell and Shakeel's newsletter, Transformer.
A question bopping around my mind: are there things I could do, instead of making AXRP or being a MATS RM, that would be more valuable? Possible answers:
- just do research that matters
- project manager at a place that does research that matters
- be more directly a competitor to Zvi
- team up with Lawrence Chan and write stuff about various alignment schemes
I think a bottleneck I feel is being unsure about what things are valuable in the info environment, which is where I think I'm best placed to do stuff.
Solstice notes
- I like that the celebration took place on (or adjacent to) the actual solstice
- I broadly thought this year's was worse than last year's, although it had its charms
- I liked "Humankind as a sailor" - tricky to pick up but rewarding once you did
- Just because a song takes place in Australia, I don't think it thereby glorifies the negative aspects of colonialism.
- The darkness speech was touching this year
- I feel like a lot of the time the speaker would say something I straightforwardly agreed with, phrased the way I would say it, and everyone would laugh.
- It was funny when Ozy said her favourite website was Our World in Data and Scott sang the praises of Dustin Moskowitz while I was sitting next to Oli
- I think "the world is awful" is wrong, and not established by there being awful things in the world.
dynomight.net/arguments-3/
Things to argue about over the holidays instead of politics III
report back on how it goes
How much nesting can we do in English verb tenses, and what controls that? For an example of what I mean, I can say:
- I eat
- I will eat
- I will have been eating
- I will have been going to eat
But I don't think we can say "I will have been going to have eaten".
One possibility: basically it goes as far as it makes sense to add extra timing information. But this only works if you disagree about your last positive example, which I personally don't think I've ever actually heard used.
Like, imagine a timeline. "I eat" describes a period of time encompassing now. "I will eat" describes a period of time in the future. "I will have eaten" describes two times: one in the future and one in the past of that future. "I will have been going to eat" describes a time in the future, a time in the past of that future, and a time in the future of that past of the first future. But in some sense this collapses back to the semantic content of "I will eat", and so my guess is that it's basically never used.
I think what I mean is that additional times around the loop aren't really adding any extra information, because they introduce new reference points along the timeline that typically don't connect to anything else.
Like, there's some implicit time T that I'm trying to locate with a given statement, and there's an additional time Now that I get from just being in the present.
It makes sense to be like "Some time between Now and [implicitly / contextually defined] T, X will happen", and this is ~ the two-level wrapping. But if you say "Some time between Now and [newly introduced / 'bound' / 'scoped-to-this-statement'] T1, it will be the case that X happened after [implicit / 'free' / contextual] T2", T1 is kind of irrelevant, since it's introduced and used only within the statement.
In principle I guess you could have extra context that disambiguates, but I think it's also kinda relevant that verbs tend to have a subject, a direct object, and up to one indirect object, and typically not more than that.
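As a rough check on the point that statement-internal reference times are irrelevant, here's a minimal brute-force sketch in Python (my own toy formalization, not something from the thread): treat each tense layer as a step earlier or later than the previous reference point, starting from Now, with the last point reached being the event time, and see what the chain as a whole entails about the event relative to Now.

# Toy model (an assumption of this sketch, not from the thread): a tense is a
# chain of before/after steps from Now through fresh reference points to the
# event time. We brute-force which orderings of event vs. Now are consistent.
from itertools import product

def entailed_signs(steps, grid=range(-3, 4)):
    """Return the set of signs of (event - Now) consistent with the chain.

    steps: a string of '+' (next point is later than the previous one)
    and '-' (next point is earlier). The last point reached is the event.
    """
    signs = set()
    for points in product(grid, repeat=len(steps)):
        t, ok = 0, True  # t starts at Now
        for step, p in zip(steps, points):
            if (step == '+' and p <= t) or (step == '-' and p >= t):
                ok = False
                break
            t = p
        if ok:
            signs.add((t > 0) - (t < 0))  # sign of event minus Now
    return signs

print(entailed_signs('+'))    # "I will eat": {1}, event strictly after Now
print(entailed_signs('+-'))   # "I will have eaten": {-1, 0, 1}
print(entailed_signs('+-+'))  # "I will have been going to eat": {-1, 0, 1}

The pattern matches the point above: the one-step chain pins the event down relative to Now, but once the intermediate reference points are introduced only inside the statement, with nothing in context fixing them, the deeper nestings entail nothing extra about when the event happens relative to Now. The extra layers only pay off when context supplies those reference times (e.g. "by tomorrow I will have eaten").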
MOAR AXRP
This time with Erik Jenner, on a paper he's presenting at NeurIPS tomorrow - check it out if you're there!
Lots of people in the AI safety space worry about models being able to make deliberate, multi-step plans. But can we already see this in existing neural nets? In this episode, I talk with Erik Jenner about his work looking at internal look-ahead within chess-playing neural networks.
Jeroen Henneman, The Long Way Home
From: https://x.com/opancaro/status/186529216161008481
Wilhelm Kranz
From: https://x.com/0zmnds/status/1865291905249980735
#art
New AXRP! With Evan Hubinger!
This time I won't retract it, I swear!
The 'model organisms of misalignment' line of research creates AI models that exhibit various types of misalignment, and studies them to try to understand how the misalignment occurs and whether it can be somehow removed. In this episode, Evan Hubinger talks about two papers he's worked on at Anthropic under this agenda: "Sleeper Agents" and "Sycophancy to Subterfuge".