Happy New AXRP!
Yet another in the Alignment Workshop series.
AI researchers often complain about the poor coverage of their work in the news media. But why is this happening, and how can it be fixed? In this episode, I speak with Shakeel Hashim about the resource constraints facing AI journalism, the disconnect between journalists' and AI researchers' views on transformative AI, and efforts to improve the state of AI journalism, such as Tarbell and Shakeel's newsletter, Transformer.
MOAR AXRP
This time with Erik Jenner, on a paper he's presenting at NeurIPS tomorrow - check it out if you're there!
Lots of people in the AI safety space worry about models being able to make deliberate, multi-step plans. But can we already see this in existing neural nets? In this episode, I talk with Erik Jenner about his work looking at internal look-ahead within chess-playing neural networks.
Jeroen Henneman, The Long Way Home
From: https://x.com/opancaro/status/186529216161008481
Wilhelm Kranz
From: https://x.com/0zmnds/status/1865291905249980735
#art
New AXRP! With Evan Hubinger!
This time I won't retract it, I swear!
The 'model organisms of misalignment' line of research creates AI models that exhibit various types of misalignment, and studies them to try to understand how the misalignment occurs and whether it can be somehow removed. In this episode, Evan Hubinger talks about two papers he's worked on at Anthropic under this agenda: "Sleeper Agents" and "Sycophancy to Subterfuge".
New episode with Jesse Hoogland!
Another short one, I'm afraid.
You may have heard of singular learning theory, and its "local learning coefficient", or LLC - but have you heard of the refined LLC? In this episode, I chat with Jesse Hoogland about his work on SLT, and using the refined LLC to find a new circuit in language models.
Lieke van der Vorst
From: https://x.com/marysia_cc/status/1861148591479288294/photo/1
-
Elena and Anna Balbusso
for Little Knife by Leigh Bardugo
From: https://x.com/marysia_cc/status/1861127999581528531/photo/1
#art
Franz Karl Leopold von Klenze
From: https://x.com/0zmnds/status/1861121676735586756/photo/1
-
Chesley Knight Bonestell, Jr.
From: https://x.com/0zmnds/status/1861297334195495170/photo/1
#art
The Real Realm
Liu Kuo-Sung 1999
From: https://x.com/blanc_alba/status/1858225969443811511
-
Art: Vol des grues vers la lune d'or
by Fujiyama Nobu; Rudi.
From: https://x.com/ClaraOlwen/status/1858242777517109366
Night Performance, 1995
From: https://x.com/marysia_cc/status/1857935683240997187/photo/1
Short AXRP with Alan Chan!
Another fun short episode!
Road lines, street lights, and licence plates are examples of infrastructure used to ensure that roads operate smoothly. In this episode, Alan Chan talks about using similar interventions to help avoid bad outcomes from the deployment of AI agents.
From: https://x.com/marysia_cc/status/1857705164251050362
- https://x.com/0zmnds/status/1857570527134859688/photo/1
- https://x.com/0zmnds/status/1857570023830847567/photo/1
Shoda Koho (1871-1946), Moonlight Sea, c. 1930
From: https://x.com/marysia_cc/status/1857172152174157921/photo/1
+
Leonard Weisgard
illustration from Look at the Moon (1969)
From: https://x.com/marysia_cc/status/1857488438691291585/photo/1
From: https://x.com/MenschOhneMusil/status/1856974702498996385
New short AXRP with Zhijing Jin!
New episode of AXRP with Zhijing Jin - this time, a short one (22 min), offering an overview of her work. Blurb below, links in comments.
Do language models understand the causal structure of the world, or do they merely note correlations? And what happens when you build a big AI society out of them? In this brief episode, recorded at the Bay Area Alignment Workshop, I chat with Zhijing Jin about her research on these questions.
linocut
From: https://x.com/marysia_cc/status/1854986382957330908/photo/1
Nocturno
From: https://x.com/marysia_cc/status/1854960702504812953/photo/1
Moon at Shinobazu
From: https://x.com/marysia_cc/status/1854938647117640135/photo/1
Tanaka Ryōhei
Persimmons. Mountains
From: https://x.com/marysia_cc/status/1853326310933770502/photo/1
Shiro Shirahata, Moon over Fuji, 1972.
From: https://x.com/MenschOhneMusil/status/1853201333983023451/photo/1
Toni Demuro, Solo per Gatti, 2023.
From: https://x.com/MenschOhneMusil/status/1853183759639839203/photo/1
Last Train/Look
Night Train. 2020 Ink on paper
Christoph Niemann.
American, born in 1970.
From: https://x.com/fraveris/status/1853184339753992541/photo/1
From: https://x.com/MenschOhneMusil/status/1852592656905335109