Yet another short AXRP episode!
With Anthony Aguirre!
The Future of Life Institute is one of the oldest and most prominent organizations in the AI existential safety space, working on such topics as the AI pause open letter and how the EU AI Act can be improved. Metaculus is one of the premier forecasting sites on the internet. Behind both of them lies one man: Anthony Aguirre, who I talk with in this episode.
More AXRP! Joel Lehman!
Typically this podcast talks about how to avert destruction from AI. But what would it take to ensure AI promotes human flourishing as well as it can? Is alignment to individuals enough, and if not, where do we go from here? In this episode, I talk with Joel Lehman about these questions.
Misty morning at Lanhydrock. Cornwall, England. NMP
From: https://x.com/HoganSOG/status/1882211656283111582/photo/1
#art
Junichiro Sekino (1914-1988)
Night in Kyoto
#art
From: https://x.com/marysia_cc/status/1882215670282166390
Tanaka Ryōhei (1933-2019)
Crow and Persimmon in the Snow
From: https://x.com/marysia_cc/status/1881097630148907230/photo/1
#art
Adrià on AXRP!
Yet another new episode!
Suppose we're worried about AIs engaging in long-term plans that they don't tell us about. If we were to peek inside their brains, what should we look for to check whether this was happening? In this episode Adrià Garriga-Alonso talks about his work trying to answer this question.
Happy New AXRP!
Yet another in the Alignment Workshop series.
AI researchers often complain about the poor coverage of their work in the news media. But why is this happening, and how can it be fixed? In this episode, I speak with Shakeel Hashim about the resource constraints facing AI journalism, the disconnect between journalists' and AI researchers' views on transformative AI, and efforts to improve the state of AI journalism, such as Tarbell and Shakeel's newsletter, Transformer.
MOAR AXRP
This time with Erik Jenner, on a paper he's presenting at NeurIPS tomorrow - check it out if you're there!
Lots of people in the AI safety space worry about models being able to make deliberate, multi-step plans. But can we already see this in existing neural nets? In this episode, I talk with Erik Jenner about his work looking at internal look-ahead within chess-playing neural networks.
Jeroen Henneman, The Long Way Home
From: https://x.com/opancaro/status/186529216161008481
Wilhelm Kranz
From: https://x.com/0zmnds/status/1865291905249980735
#art
New AXRP! With Evan Hubinger!
This time I won't retract it, I swear!
The 'model organisms of misalignment' line of research creates AI models that exhibit various types of misalignment, and studies them to try to understand how the misalignment occurs and whether it can be somehow removed. In this episode, Evan Hubinger talks about two papers he's worked on at Anthropic under this agenda: "Sleeper Agents" and "Sycophancy to Subterfuge".
New episode with Jesse Hoogland!
Another short one, I'm afraid.
You may have heard of singular learning theory, and its "local learning coefficient", or LLC - but have you heard of the refined LLC? In this episode, I chat with Jesse Hoogland about his work on SLT, and using the refined LLC to find a new circuit in language models.
Lieke van der Vorst
From: https://x.com/marysia_cc/status/1861148591479288294/photo/1
-
Elena and Anna Balbusso
for Little Knife by Leigh Bardugo
From: https://x.com/marysia_cc/status/1861127999581528531/photo/1
#art
Franz Karl Leopold von Klenze
From: https://x.com/0zmnds/status/1861121676735586756/photo/1
-
Chesley Knight Bonestell, Jr.
From: https://x.com/0zmnds/status/1861297334195495170/photo/1
#art
Short AXRP with Alan Chan!
Another fun short episode!
Road lines, street lights, and licence plates are examples of infrastructure used to ensure that roads operate smoothly. In this episode, Alan Chan talks about using similar interventions to help avoid bad outcomes from the deployment of AI agents.
New short AXRP with Zhijing Jin!
New episode of AXRP with Zhijing Jin - this time, a short one (22 min), offering an overview of her work. Blurb below, links in comments.
Do language models understand the causal structure of the world, or do they merely note correlations? And what happens when you build a big AI society out of them? In this brief episode, recorded at the Bay Area Alignment Workshop, I chat with Zhijing Jin about her research on these questions.