Happy New AXRP!


Yet another in the Alignment Workshop series.

AI researchers often complain about the poor coverage of their work in the news media. But why is this happening, and how can it be fixed? In this episode, I speak with Shakeel Hashim about the resource constraints facing AI journalism, the disconnect between journalists' and AI researchers' views on transformative AI, and efforts to improve the state of AI journalism, such as Tarbell and Shakeel's newsletter, Transformer.

Transcript
Video


Reminds me of this part of the Biden-Trump debate:


I really like how smooth and clean this retention curve is. This is for my episode with Evan Hubinger; the height of the line is the fraction of viewers still watching at any given time.


MOAR AXRP


This time with Erik Jenner, on a paper he's presenting at NeurIPS tomorrow - check it out if you're there!

Lots of people in the AI safety space worry about models being able to make deliberate, multi-step plans. But can we already see this in existing neural nets? In this episode, I talk with Erik Jenner about his work looking at internal look-ahead within chess-playing neural networks.

Video
Transcript


Jeroen Henneman, The Long Way Home
From: https://x.com/opancaro/status/186529216161008481


Wilhelm Kranz
From: https://x.com/0zmnds/status/1865291905249980735

#art

#art


Gustave Doré
From: https://x.com/0zmnds/status/1863475184344174739
#art
#art


Victo Ngai
From: https://x.com/opancaro/status/1863111407962599592
#art
#art


New AXRP! With Evan Hubinger!


This time I won't retract it, I swear!

The 'model organisms of misalignment' line of research creates AI models that exhibit various types of misalignment, and studies them to try to understand how the misalignment occurs and whether it can be somehow removed. In this episode, Evan Hubinger talks about two papers he's worked on at Anthropic under this agenda: "Sleeper Agents" and "Sycophancy to Subterfuge".

Video
Transcript


Orestes Pursued by the Furies
John Singer Sargent, 1921
(taken from Wikimedia Commons)


New episode with Jesse Hoogland!


Another short one, I'm afraid.

You may have heard of singular learning theory, and its "local learning coefficient", or LLC - but have you heard of the refined LLC? In this episode, I chat with Jesse Hoogland about his work on SLT, and using the refined LLC to find a new circuit in language models.

YouTube
Transcript


Lieke van der Vorst
From: https://x.com/marysia_cc/status/1861148591479288294/photo/1

-

Elena and Anna Balbusso
for Little Knife by Leigh Bardugo
From: https://x.com/marysia_cc/status/1861127999581528531/photo/1

#art

#art


Franz Karl Leopold von Klenze
From: https://x.com/0zmnds/status/1861121676735586756/photo/1

-

Chesley Knight Bonestell, Jr.
From: https://x.com/0zmnds/status/1861297334195495170/photo/1

#art

#art


This seems like a pretty thin market for a pretty important question!


Hideo Takeda
From: https://x.com/opancaro/status/1859473265149776148


it's even more pronounced for the Alan Chan episode


also weird that they seem to have decided to boost my Zhijing video for a day.


Not loving that YouTube is congratulating me on becoming an agent of addiction


Franz Caucig
From: https://x.com/0zmnds/status/1858558034307674338


The Real Realm
Liu Kuo-Sung 1999
From: https://x.com/blanc_alba/status/1858225969443811511

-

Art: Vol des grues vers la lune d'or
by Fujiyama Nobu; Rudi.
From: https://x.com/ClaraOlwen/status/1858242777517109366


Senju Hiroshi
Night Performance, 1995
From: https://x.com/marysia_cc/status/1857935683240997187/photo/1


Short AXRP with Alan Chan!


Another fun short episode!

Road lines, street lights, and licence plates are examples of infrastructure used to ensure that roads operate smoothly. In this episode, Alan Chan talks about using similar interventions to help avoid bad outcomes from the deployment of AI agents.

YouTube link
Transcript


Senbon Ichou by Mikiko Noji
From: https://x.com/marysia_cc/status/1857705164251050362


From:
- https://x.com/0zmnds/status/1857570527134859688/photo/1
- https://x.com/0zmnds/status/1857570023830847567/photo/1


Shoda Koho (1871-1946), Moonlight Sea, c. 1930
From: https://x.com/marysia_cc/status/1857172152174157921/photo/1

+

Leonard Weisgard
illustration from Look at the Moon (1969)
From: https://x.com/marysia_cc/status/1857488438691291585/photo/1


Basket of Lemons, 1992 - Jose Escofet.
From: https://x.com/MenschOhneMusil/status/1856974702498996385


New short AXRP with Zhijing Jin!


New episode of AXRP with Zhijing Jin - this time, a short one (22 min), offering an overview of her work. Blurb below, links in comments.

Do language models understand the causal structure of the world, or do they merely note correlations? And what happens when you build a big AI society out of them? In this brief episode, recorded at the Bay Area Alignment Workshop, I chat with Zhijing Jin about her research on these questions.

YouTube

Transcript


Susan Noble
‘Autumn Ferns’
From: https://x.com/marysia_cc/status/1855887688668524839


William H. Hays: Mountain Melody, 2022
linocut
From: https://x.com/marysia_cc/status/1854986382957330908/photo/1


From: https://x.com/0zmnds/status/1854995056660562070/photo/1


From: https://x.com/0zmnds/status/1854948751812837573/photo/1


From: https://x.com/opancaro/status/1854969896871870644/photo/1


Esa Riippa (Finnish, b. 1947)
Nocturno
From: https://x.com/marysia_cc/status/1854960702504812953/photo/1


Paul Binnie (Scottish, b. 1967)
Moon at Shinobazu
From: https://x.com/marysia_cc/status/1854938647117640135/photo/1


Tanaka Ryōhei
Persimmons . Mountains

From: https://x.com/marysia_cc/status/1853326310933770502/photo/1


Shiro Shirahata - Moon over Fuji, 1972.

From: https://x.com/MenschOhneMusil/status/1853201333983023451/photo/1


Toni Demuro, Solo per Gatti, 2023.

From: https://x.com/MenschOhneMusil/status/1853183759639839203/photo/1


Last Train/Look

Night Train. 2020 Ink on paper

Christoph Niemann.
American, born in 1970.

From: https://x.com/fraveris/status/1853184339753992541/photo/1


A Beautiful Moment, Dee Nickerson.
From: https://x.com/MenschOhneMusil/status/1852592656905335109