New AXRP! With Evan Hubinger!
This time I won't retract it, I swear!
The 'model organisms of misalignment' line of research creates AI models that exhibit various types of misalignment, and studies them to try to understand how the misalignment occurs and whether it can be somehow removed. In this episode, Evan Hubinger talks about two papers he's worked on at Anthropic under this agenda: "Sleeper Agents" and "Sycophancy to Subterfuge".
Latin practice day 7
These aren't very inspired but:
I. Cūr quaeque littera Graeca pulchrior est quam quaeque littera Latīna?
II. Sī linguam Latīnam scīre vult, quotiēs quamque litteram Latīnam scrībere necesse est?
III. Vōlōne ā magistrō laudārī?
IV. In Capitulō XVI, quia Dominus Iēsus tempestātem facit apud navem Lydiae? Lydia ā Deō dīligiturne?
V. Num medicus labōrans vērē sanat hominēs aegrōs?
VI. Num parēntēs laudant magister discipulōs verberāntem?
VII. Suntne bēstiolae industriorēs quam apēs? Quid facit illae?
VIII. Quia dea est pulcherrima?
IX. Hōdiē, quae bonae rēs daminī ā deī?
Hōdiē sum in domō parentum matris mea, in Arizonā. In hāc domō, saepe dormō in lectō parvō in cubiculō parvō, sed hōdiē habeō magnum cubiculum ac magnum lectum. Cēnābam cum parentibus matris meus, et cum amīcīs suīs. Aliī hominēs ēdēbant magnam avem, sed ego edēbam botulōs quī ex holeribus fīunt, nam Pythagoricus sum. Cōnspiciēbāmus pēs-pilam (harpastum? calcifollem? I guess Vicipaedia uses "Harpastum") - Leōnēs Detroitī, quī amantur ā parentēs matris meus, vincēbant contra Ursōs Sicāgoensis!
(I only know the imperfect past tense, forgive me)
Every country in the world belongs to America
Shouldn't the US buy the Vatican?
- they're rapidly going bankrupt and could use the money
- Trump would go for it
- the US is the new Rome
- would bring the US tons of geopolitical power
- new place to station US troops without any restrictions
- probably will ensure all Americans go to heaven
- zero downsides
Am I missing something?????
New episode with Jesse Hoogland!
Another short one, I'm afraid.
You may have heard of singular learning theory, and its "local learning coefficient", or LLC - but have you heard of the refined LLC? In this episode, I chat with Jesse Hoogland about his work on SLT, and using the refined LLC to find a new circuit in language models.
Lieke van der Vorst
From: https://x.com/marysia_cc/status/1861148591479288294/photo/1
-
Elena and Anna Balbusso
for Little Knife by Leigh Bardugo
From: https://x.com/marysia_cc/status/1861127999581528531/photo/1
#art
Franz Karl Leopold von Klenze
From: https://x.com/0zmnds/status/1861121676735586756/photo/1
-
Chesley Knight Bonestell, Jr.
From: https://x.com/0zmnds/status/1861297334195495170/photo/1
#art
IMO it's kind of weird that there aren't more blog posts in the rationality-sphere about how to do group house living well. There are a bunch of tricky problems that need solving and opportunities for clever solutions that make people better off, so you'd think there would be much fodder. Possibilities:
- Maybe people just don't think about it very much?
- "Group house living" isn't as culturally salient a category as "parenting", so we're not used to writing about it?
- Most of the problems involve being kind of annoyed at specific people, and so are inherently awkward to talk about?
Yeah reading this I was like 'wow a lot of our lore is about chores'. I guess because this came up as an issue with us, whereas 'there are conflicts/annoyances with the other people' hasn't come up as much, possibly because two of the relationships were selected specifically for not being mutually annoying :p (and luckily you and Ben seem to not annoy each other that much)
Maybe the main tip is 'try to select people you really vibe with/share living preferences with', and if you manage that you will be well-placed to either not have problems (because your preferences don't clash), or to solve them?
Short AXRP with Alan Chan!
Another fun short episode!
Road lines, street lights, and licence plates are examples of infrastructure used to ensure that roads operate smoothly. In this episode, Alan Chan talks about using similar interventions to help avoid bad outcomes from the deployment of AI agents.
- all ballots are paper
- there's no such thing as provisional ballots
- postal votes have to arrive by end of voting election night
- on election night, ~all votes are counted by hand, observed by scrutineers employed by the candidates
One issue is that this may be a somewhat simplified account of how places count votes (aec.gov.au/voting/counting/). But I also wonder if it's just one of those things the US could not implement if it tried, due to a lack of the relevant kind of "state capacity".
New short AXRP with Zhijing Jin!
New episode of AXRP with Zhijing Jin - this time, a short one (22 min), offering an overview of her work. Blurb below, links in comments.
Do language models understand the causal structure of the world, or do they merely note correlations? And what happens when you build a big AI society out of them? In this brief episode, recorded at the Bay Area Alignment Workshop, I chat with Zhijing Jin about her research on these questions.
[ul]
[li] foo
[ul] [li] bar [/ul]
[li] baz
[/ul]
Chapters of Familia Romana that are hard, according to me
For background, most chapters are only trying to do one or two 'things'.
Chapter 8:
- all the declensions of quis/quī, hic, is, and ille dropped on you in a single chapter
- in some sense only one 'thing', but that's around 144 forms you've got to remember (4 pronouns x 3 genders x 2 numbers x 6 cases).
- tbh I just went past this and hoped I'd get used to them rather than having to memorize them
Chapter 12:
- the fourth declension
- datives for possession (e.g. "Marcō ūna soror est")
- comparatives
- third declension adjectives
- datives of commanding and obeying
Anyway I'm up to chapter 13 which is less bad, hopefully the density of hard chapters does not increase.
Test of tusky
Latin practice day 6
I. Num hominēs quī Berkeleiam incolunt barbarī sunt?
II. Mīlitēs armīs pugnant. Puerī pugnīs pugnant. Quō pugnat mercātor? Quibus pugnant pāstōrēs?
III. Num hominēs quī audiunt AXRP fortiōrēs quam illa quī audiunt The Inside View?
IV. Manūs sunt manūs bracchiī. Digitī sunt manūs manuum. Quae sunt manūs digitōrum?
V. Cūr medicus meus mē crassum esse dīcit?
VI. Potest homō avunculō avunculus esse?