
Gustave Doré
From: https://x.com/0zmnds/status/1863475184344174739
#art


Victo Ngai
From: https://x.com/opancaro/status/1863111407962599592
#art


New AXRP! With Evan Hubinger!


This time I won't retract it, I swear!

The 'model organisms of misalignment' line of research creates AI models that exhibit various types of misalignment, and studies them to try to understand how the misalignment occurs and whether it can be somehow removed. In this episode, Evan Hubinger talks about two papers he's worked on at Anthropic under this agenda: "Sleeper Agents" and "Sycophancy to Subterfuge".

Video
Transcript


Orestes Pursued by the Furies
John Singer Sargent, 1921
(taken from Wikimedia Commons)


New episode with Jesse Hoogland!


Another short one, I'm afraid.

You may have heard of singular learning theory, and its "local learning coefficient", or LLC - but have you heard of the refined LLC? In this episode, I chat with Jesse Hoogland about his work on SLT, and using the refined LLC to find a new circuit in language models.

YouTube
Transcript


Lieke van der Vorst
From: https://x.com/marysia_cc/status/1861148591479288294/photo/1

-

Elena and Anna Balbusso
for Little Knife by Leigh Bardugo
From: https://x.com/marysia_cc/status/1861127999581528531/photo/1

#art

#art


Franz Karl Leopold von Klenze
From: https://x.com/0zmnds/status/1861121676735586756/photo/1

-

Chesley Knight Bonestell, Jr.
From: https://x.com/0zmnds/status/1861297334195495170/photo/1

#art

#art


This seems like a pretty thin market for a pretty important question!


Hideo Takeda
From: https://x.com/opancaro/status/1859473265149776148


it's even more pronounced for the Alan Chan episode


also weird that they seem to have decided to boost my Zhijing video for a day.


Not loving that YouTube is congratulating me on becoming an agent of addiction


Franz Caucig
From: https://x.com/0zmnds/status/1858558034307674338


The Real Realm
Liu Kuo-Sung 1999
From: https://x.com/blanc_alba/status/1858225969443811511

-

Arte: Vol des grues vers la lune d'or
by Fujiyama Nobu; Rudi.
From: https://x.com/ClaraOlwen/status/1858242777517109366


Senju Hiroshi
Night Performance, 1995
From: https://x.com/marysia_cc/status/1857935683240997187/photo/1


Short AXRP with Alan Chan!


Another fun short episode!

Road lines, street lights, and licence plates are examples of infrastructure used to ensure that roads operate smoothly. In this episode, Alan Chan talks about using similar interventions to help avoid bad outcomes from the deployment of AI agents.

YouTube link
Transcript


Senbon Ichou by Mikiko Noji
From: https://x.com/marysia_cc/status/1857705164251050362


From:
- https://x.com/0zmnds/status/1857570527134859688/photo/1
- https://x.com/0zmnds/status/1857570023830847567/photo/1


Shoda Koho (1871-1946), Moonlight Sea, c. 1930
From: https://x.com/marysia_cc/status/1857172152174157921/photo/1

+

Leonard Weisgard
illustration from Look at the Moon (1969)
From: https://x.com/marysia_cc/status/1857488438691291585/photo/1


Basket of Lemons, 1992 - Jose Escofet.
From: https://x.com/MenschOhneMusil/status/1856974702498996385


New short AXRP with Zhijing Jin!


New episode of AXRP with Zhijing Jin - this time, a short one (22 min), offering an overview of her work. Blurb below, links in comments.

Do language models understand the causal structure of the world, or do they merely note correlations? And what happens when you build a big AI society out of them? In this brief episode, recorded at the Bay Area Alignment Workshop, I chat with Zhijing Jin about her research on these questions.

YouTube

Transcript


Susan Noble
‘Autumn Ferns’
From: https://x.com/marysia_cc/status/1855887688668524839


William H. Hays: Mountain Melody, 2022
linocut
From: https://x.com/marysia_cc/status/1854986382957330908/photo/1


From: https://x.com/0zmnds/status/1854995056660562070/photo/1


From: https://x.com/0zmnds/status/1854948751812837573/photo/1


From: https://x.com/opancaro/status/1854969896871870644/photo/1


Esa Riippa (Finnish, b. 1947)
Nocturno
From: https://x.com/marysia_cc/status/1854960702504812953/photo/1


Paul Binnie (Scottish, b. 1967)
Moon at Shinobazu
From: https://x.com/marysia_cc/status/1854938647117640135/photo/1


Tanaka Ryōhei
Persimmons . Mountains

From: https://x.com/marysia_cc/status/1853326310933770502/photo/1


Shiro Shirahata, Moon over Fuji, 1972.

From: https://x.com/MenschOhneMusil/status/1853201333983023451/photo/1


Toni Demuro, Solo per Gatti, 2023.

From: https://x.com/MenschOhneMusil/status/1853183759639839203/photo/1


Last Train/Look

Night Train. 2020 Ink on paper

Christoph Niemann.
American, born in 1970.

From: https://x.com/fraveris/status/1853184339753992541/photo/1


A Beautiful Moment, Dee Nickerson.
From: https://x.com/MenschOhneMusil/status/1852592656905335109


Notes on Claude 3.5 Sonnet (new)'s ability to find errors in Latin text


I took an excerpt from a short story written for beginners and asked Claude to evaluate it, noting that such short stories often contain errors. In a separate chat, I asked the same question but replaced "Rōmae" with "Rōmā", which I believe is an error (and which Claude, in the first chat, also thinks is an error). In that second chat, Claude still judged the text correct, though it had some unrelated complaints. In a third example, I changed a direct object from the accusative to the ablative/dative, and Claude noticed that. So it looks like Claude is not currently consistent at finding errors in Latin grammar.


From: https://x.com/365posterblog1/status/1851223700822987054


Kalshi and PredictIt differ by 10 points! Wild!

Also apparently I can't sell all my "no" shares in Kamala on PI? Quite annoying.


Okitsu-chō, Suruga, by Kawase Hasui, 1934
From: https://x.com/JapanTraCul/status/1851022883394408642


From: https://x.com/madrugada_m/status/1850579024872923510/photo/1


Good night, friends
🎨Xuan Loc Xuan
From: https://x.com/marysia_cc/status/1850266583052284260