Superstimulus | Daniel Filan @ Superstimulus

Daniel Filan

sloeb@superstimul.us

Current deal:
- Research Manager at MATS
- Podcast at AXRP
- Hobby is learning Latin
- Single

Berkeley, California, USA

In this episode, I chat with Samuel Albanie about the Google DeepMind paper he co-authored called "An Approach to Technical AGI Safety and Security". It covers the assumptions made by the approach, as well as the types of mitigations it outlines.

Video
Transcript

Daniel Filan
1 month ago

New AXRP with Peter Salib!

In this episode, I talk with Peter Salib about his paper "AI Rights for Human Safety", arguing that giving AIs the right to contract, hold property, and sue people will reduce the risk of their trying to attack humanity and take over. He also tells me how law reviews work, in the face of my incredulity.

Video
Transcript

Daniel Filan
1 month ago

New AXRP with David Lindner!

In this episode, I talk with David Lindner about Myopic Optimization with Non-myopic Approval, or MONA, which attempts to address (multi-step) reward hacking by myopically optimizing actions against a human's sense of whether those actions are generally good. Does this work? Can we get smarter-than-human AI this way? How does this compare to approaches like conservativism? Listen to find out.

Video
Transcript

Daniel Filan
2 months ago

Owain on AXRP!!!

Earlier this year, the paper "Emergent Misalignment" made the rounds on AI x-risk social media for seemingly showing LLMs generalizing from 'misaligned' training data of insecure code to acting comically evil in response to innocuous questions. In this episode, I chat with one of the authors of that paper, Owain Evans, about that research as well as other work he's done to understand the psychology of large language models.

Video
Transcript

Daniel Filan
2 months ago

New AXRP episode with Lee Sharkey!

What's the next step forward in interpretability? In this episode, I chat with Lee Sharkey about his proposal for detecting computational mechanisms within neural networks: Attribution-based Parameter Decomposition, or APD for short.

Video
Transcript

Daniel Filan
3 months ago

C'mon Fabulae Syrae, this is not a very good explanation of what a harundō is.

Daniel Filan
4 months ago

AXRP Jason Gross

How do we figure out whether interpretability is doing its job? One way is to see if it helps us prove things about models that we care about knowing. In this episode, I speak with Jason Gross about his agenda to benchmark interpretability in this way, and his exploration of the intersection of proofs and modern machine learning.

Transcript
YouTube

Daniel Filan
5 months ago

Johann Friedrich Morgenstern
From https://x.com/0zmnds/status/1890834305208696987
#art

Daniel Filan
6 months ago

Yet another short AXRP episode!

With Anthony Aguirre!

The Future of Life Institute is one of the oldest and most prominent organizations in the AI existential safety space, working on such topics as the AI pause open letter and how the EU AI Act can be improved. Metaculus is one of the premier forecasting sites on the internet. Behind both of them lies one man: Anthony Aguirre, who I talk with in this episode.

Video
Transcript

Daniel Filan
6 months ago

More AXRP! Joel Lehman!

Typically this podcast talks about how to avert destruction from AI. But what would it take to ensure AI promotes human flourishing as well as it can? Is alignment to individuals enough, and if not, where do we go from here? In this episode, I talk with Joel Lehman about these questions.

Video
Transcript

Daniel Filan
6 months ago

Misty morning at Lanhydrock. Cornwall, England. NMP
From: https://x.com/HoganSOG/status/1882211656283111582/photo/1

#art

Daniel Filan
6 months ago

Junichiro Sekino 1914-1988
Night in Kyoto
#art

From: https://x.com/marysia_cc/status/1882215670282166390

Daniel Filan
6 months ago

Tanaka Ryōhei (1933-2019)
Crow and Persimmon in the Snow

From: https://x.com/marysia_cc/status/1881097630148907230/photo/1

#art

Daniel Filan
6 months ago

Adria on AXRP!

Yet another new episode!

Suppose we're worried about AIs engaging in long-term plans that they don't tell us about. If we were to peek inside their brains, what should we look for to check whether this was happening? In this episode Adrià Garriga-Alonso talks about his work trying to answer this question.

Transcript
Video

Daniel Filan
7 months ago

Thanks Wiktionary

Daniel Filan
7 months ago

Happy New AXRP!

Yet another in the Alignment Workshop series.

AI researchers often complain about the poor coverage of their work in the news media. But why is this happening, and how can it be fixed? In this episode, I speak with Shakeel Hashim about the resource constraints facing AI journalism, the disconnect between journalists' and AI researchers' views on transformative AI, and efforts to improve the state of AI journalism, such as Tarbell and Shakeel's newsletter, Transformer.

Transcript
Video

Daniel Filan
7 months ago

Reminds me of this part of the Biden-Trump debate:

Daniel Filan
7 months ago

I really like how smooth and clean this retention curve is. This is for my episode with Evan Hubinger; the height of the line is the fraction of viewers still watching at any given time.

Daniel Filan
7 months ago

MOAR AXRP

This time with Erik Jenner, on a paper he's presenting at NeurIPS tomorrow - check it out if you're there!

Lots of people in the AI safety space worry about models being able to make deliberate, multi-step plans. But can we already see this in existing neural nets? In this episode, I talk with Erik Jenner about his work looking at internal look-ahead within chess-playing neural networks.

Video
Transcript

Daniel Filan
8 months ago

Jeroen Henneman, The Long Way Home
From: https://x.com/opancaro/status/186529216161008481

Wilhelm Kranz
From: https://x.com/0zmnds/status/1865291905249980735

#art

Daniel Filan
8 months ago

Gustave Doré
From: https://x.com/0zmnds/status/1863475184344174739
#art

Daniel Filan
8 months ago

Victo Ngai
From: https://x.com/opancaro/status/1863111407962599592
#art

Daniel Filan
8 months ago

New AXRP! With Evan Hubinger!

This time I won't retract it, I swear!

The 'model organisms of misalignment' line of research creates AI models that exhibit various types of misalignment, and studies them to try to understand how the misalignment occurs and whether it can be somehow removed. In this episode, Evan Hubinger talks about two papers he's worked on at Anthropic under this agenda: "Sleeper Agents" and "Sycophancy to Subterfuge".

Video
Transcript

Daniel Filan
8 months ago

Orestes Pursued by the Furies
John Singer Sargent, 1921
(taken from Wikimedia Commons)

Daniel Filan
8 months ago

New episode with Jesse Hoogland!

Another short one, I'm afraid.

You may have heard of singular learning theory, and its "local learning coefficient", or LLC - but have you heard of the refined LLC? In this episode, I chat with Jesse Hoogland about his work on SLT, and using the refined LLC to find a new circuit in language models.

YouTube
Transcript

Daniel Filan
8 months ago

Lieke van der Vorst
From: https://x.com/marysia_cc/status/1861148591479288294/photo/1

Elena and Anna Balbusso
for Little Knife by Leigh Bardugo
From: https://x.com/marysia_cc/status/1861127999581528531/photo/1

#art

Daniel Filan
8 months ago

Franz Karl Leopold von Klenze
From: https://x.com/0zmnds/status/1861121676735586756/photo/1

Chesley Knight Bonestell, Jr.
From: https://x.com/0zmnds/status/1861297334195495170/photo/1

#art

Daniel Filan
8 months ago

This seems like a pretty thin market for a pretty important question!

Daniel Filan
8 months ago

Hideo Takeda
From: https://x.com/opancaro/status/1859473265149776148

Daniel Filan
8 months ago

it's even more pronounced for the Alan Chan episode

Daniel Filan
8 months ago

also weird that they seem to have decided to boost my Zhijing video for a day.

Daniel Filan
8 months ago

Not loving that YouTube is congratulating me on becoming an agent of addiction

Daniel Filan
8 months ago

Franz Caucig
From: https://x.com/0zmnds/status/1858558034307674338

Daniel Filan
8 months ago

The Real Realm
Liu Kuo-Sung 1999
From: https://x.com/blanc_alba/status/1858225969443811511

Arte: Vol des grues vers la lune d'or
by Fujiyama Nobu; Rudi.
From: https://x.com/ClaraOlwen/status/1858242777517109366

Daniel Filan
8 months ago

Senju Hiroshi
Night Performance, 1995
From: https://x.com/marysia_cc/status/1857935683240997187/photo/1

Daniel Filan
8 months ago

Short AXRP with Alan Chan!

Another fun short episode!

Road lines, street lights, and licence plates are examples of infrastructure used to ensure that roads operate smoothly. In this episode, Alan Chan talks about using similar interventions to help avoid bad outcomes from the deployment of AI agents.

YouTube link
Transcript

Daniel Filan
8 months ago

Senbon Ichou by Mikiko Noji
From: https://x.com/marysia_cc/status/1857705164251050362

Daniel Filan
8 months ago

From:
- https://x.com/0zmnds/status/1857570527134859688/photo/1
- https://x.com/0zmnds/status/1857570023830847567/photo/1

Daniel Filan
8 months ago

Shoda Koho (1871-1946), Moonlight Sea, c. 1930
From: https://x.com/marysia_cc/status/1857172152174157921/photo/1

Leonard Weisgard
illustration from Look at the Moon (1969)
From: https://x.com/marysia_cc/status/1857488438691291585/photo/1

Daniel Filan
8 months ago

Basket of Lemons, 1992 - Jose Escofet.
From: https://x.com/MenschOhneMusil/status/1856974702498996385
