

OK this is probably a dumb question but why did we all decide that bio risk was the most scary thing AIs could do? Did someone write up a justification of that somewhere?
in reply to Daniel Filan

I think it's maybe the scariest thing if you only believe in misuse risk, and lots of people seem to only believe in misuse risk for some reason.
in reply to Daniel Filan

I think what we decided was more like: it might be the *first* way we get an actual catastrophe

in reply to Daniel Filan

sounds like it was intended as a joke; like, linking to "pleonastically" is an illustration of pleonasticism.


Happy New AXRP!


Yet another in the Alignment Workshop series.

AI researchers often complain about the poor coverage of their work in the news media. But why is this happening, and how can it be fixed? In this episode, I speak with Shakeel Hashim about the resource constraints facing AI journalism, the disconnect between journalists' and AI researchers' views on transformative AI, and efforts to improve the state of AI journalism, such as Tarbell and Shakeel's newsletter, Transformer.

Transcript
Video



The worst thing about studying Latin is that it's invaded my mind to the degree that I now almost feel like having five cases is reasonable.
in reply to Daniel Filan

"why not more? Why not make a dedicated instrumental case, or a locative that isn't a sewn-together monstrosity comprised by other cases?" - the sounds of a mind deranged by synthetic languages
in reply to Daniel Filan

Russian has six! Also the rules for declension in Russian depend on, among other things, whether something is animate. Are dolls animate? (Yes.) Are corpses? (Depends which word you're using.) Are bacteria? (Yes if you're a biologist, probably not otherwise.)


I suspect that "epistemic and instrumental rationality" is better branded and lived as "nobility in thought and deed". But maybe I just have an unusual set of associations with the word "noble"? It's certainly more goal-laden than the word "rational" typically is.
in reply to Daniel Filan

The thing I mean is less altruistic than what David Chapman describes on this page but shares the feature of being valuable and possible.


So here's a dumb question about Jason Gross-style work on compact proofs that I don't want to ask totally publicly - what's the point? I see the value in making the case for interp as being for stuff like compact proofs. But I feel like we know that we aren't going to be able to find literal proofs of relevant safety properties of GPT-4, and we don't even know what those properties should be. So relevant next steps should look like "figure out heuristic arguments" and "figure out WTF AI safety even is" right? So why do more work getting compact proofs of various model properties?
in reply to Daniel Filan

I don't think it's obvious that we can't get proofs of any relevant safety properties. Like, yeah we're not going to get proofs of anything that references human preferences or whatever, but there might be relevant limited subquestions, e.g. about information capacity or something?
in reply to Ben Weinstein-Raun

I guess I just mean that it's really hard to prove anything about big NN behaviour - my understanding is if you try really hard you can do interval propagation in a smart way but that's about it.
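For anyone who hasn't seen interval propagation before, here's a minimal sketch of the basic (not the "smart") version: push a box of possible inputs through a tiny ReLU network and get guaranteed bounds on the output. The network weights, the input box, and all the function names are made up for illustration; this is not taken from any particular paper or library.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (lower bound, upper bound)


def affine_bounds(weights: List[List[float]], bias: List[float],
                  inputs: List[Interval]) -> List[Interval]:
    """Bounds on W x + b when each x[j] lies in inputs[j]."""
    out = []
    for row, b in zip(weights, bias):
        lo = hi = b
        for w, (x_lo, x_hi) in zip(row, inputs):
            if w >= 0:
                # Positive weight: small input gives small output.
                lo += w * x_lo
                hi += w * x_hi
            else:
                # Negative weight: the ends of the interval swap roles.
                lo += w * x_hi
                hi += w * x_lo
        out.append((lo, hi))
    return out


def relu_bounds(inputs: List[Interval]) -> List[Interval]:
    """ReLU is monotone, so it can be applied to each endpoint."""
    return [(max(0.0, lo), max(0.0, hi)) for lo, hi in inputs]


def propagate(layers, inputs: List[Interval]) -> List[Interval]:
    """Push an input box through affine layers with ReLU in between."""
    bounds = inputs
    for i, (weights, bias) in enumerate(layers):
        bounds = affine_bounds(weights, bias, bounds)
        if i < len(layers) - 1:  # no ReLU after the final layer
            bounds = relu_bounds(bounds)
    return bounds


# Hypothetical toy network: 2 inputs -> 2 hidden units -> 1 output.
layers = [
    ([[1.0, -1.0], [0.5, 2.0]], [0.0, -1.0]),
    ([[1.0, 1.0]], [0.0]),
]
input_box = [(-1.0, 1.0), (0.0, 1.0)]
output_bounds = propagate(layers, input_box)
print(output_bounds)
```

The bounds are sound but usually loose, because each unit is bounded independently and correlations between units are thrown away. That looseness compounds with depth, which is roughly why this gets hard for big networks.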


A question bopping around my mind: are there things like making AXRP or being a MATS RM that I could do instead of those things that would be more valuable? Possible answers:
- just do research that matters
- project manager at a place that does research that matters
- be more directly a competitor to Zvi
- team up with Lawrence Chan and write stuff about various alignment schemes

I think a bottleneck I feel is being unsure about what things are valuable in the info environment, where I think I'm best placed to do stuff.




So like.... what's so good about trains? Why would someone think they are so much cooler than cars / trucks / aeroplanes?
in reply to Daniel Filan

  • Bigger / heavier
  • Stronger / move more stuff
  • Make way better sounds
in reply to Daniel Filan

The infrastructure is somehow really appealing (rails, railroad switches, signals). And there's something great about the way they glide along the track.


Thing I just learned: Paul: a Very Short Introduction, one of my favourite entries in the Very Short Introduction series and one I frequently recommend, is written by E. P. Sanders - one of the most prominent 20th century scholars on the apostle Paul and his thought. Self-recommending!
in reply to Daniel Filan

It really says something about where I'm at today, that it took multiple seconds before I realized you weren't talking about Paul Christiano.
in reply to Ben Weinstein-Raun

RIP I included 'apostle' or something in an earlier draft of this explicitly to counteract this, but randomly left it out of the final version. Fixing it.


perfect past tense of "incipere" is "coepisse". wtf.
in reply to Daniel Filan

In general on one hand I'm like "I'm so grateful English has so much grammar and vocabulary to make it so expressive" but when I see Latin I'm like "Japanese copes with just having past and non-past plus some participles, why can't you" (not even getting to the whole thing of having different genders and different declensions for nouns and adjectives).
in reply to Daniel Filan

Ironically "coep-" is now the perfect stem I am perhaps least likely to forget.


Solstice notes


  • I like that the celebration took place on (or adjacent to) the actual solstice
  • I broadly thought this year's was worse than last year's, altho it had its charms
  • I liked "Humankind as a sailor" - tricky to pick up but rewarding once you did
  • Just because a song takes place in Australia, I don't think it thereby glorifies the negative aspects of colonialism.
  • The darkness speech was touching this year
  • I feel like a lot of the time the speaker would say something I straightforwardly agreed with in the way I would say it and everyone would laugh.
  • It was funny when Ozy said her favourite website was Our World in Data and Scott sang the praises of Dustin Moskovitz while I was sitting next to Oli
  • I think "the world is awful" is wrong, and not established by there being awful things in the world.
in reply to Daniel Filan

Also 'Humankind as a Sailor' is now on my non-core solstice music playlist and so popped up while I was on the rowing machine - total disaster, induced complete muscle confusion.



I really like how smooth and clean this retention curve is. It's for my episode with Evan Hubinger; the height of the line is the fraction of viewers still watching at any given time.
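To make "fraction of viewers still watching" concrete, here's a rough sketch of how such a curve could be computed. The watch times are made up, and it assumes each viewer watches once from the start and then leaves, which is a simplification (real retention graphs can also count rewinds and skips).

```python
def retention_curve(watch_seconds, video_length, step):
    """Return (time, fraction still watching) pairs, sampled every `step` seconds."""
    total = len(watch_seconds)
    curve = []
    t = 0
    while t <= video_length:
        # A viewer counts as still watching at time t if they watched at least t seconds.
        still_watching = sum(1 for w in watch_seconds if w >= t)
        curve.append((t, still_watching / total))
        t += step
    return curve


# Hypothetical data: how many seconds each of six viewers watched a 600-second video.
watch_seconds = [30, 120, 600, 600, 45, 300]
curve = retention_curve(watch_seconds, video_length=600, step=150)
for t, fraction in curve:
    print(t, round(fraction, 2))
```

Under that one-pass assumption the curve can only go down, so a smooth, clean line means viewers trickle away gradually rather than dropping off at one particular moment.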


How much nesting can we do in English verb tenses, and what controls that? For an example of what I mean, I can say:
- I eat
- I will eat
- I will have been eating
- I will have been going to eat

But I don't think we can say "I will have been going to have eaten".

in reply to Daniel Filan

One possibility: basically it goes as far as it makes sense to add extra timing information. But this only works if you disagree about your last positive example, which I personally don't actually think I've ever heard used.

Like, imagine a timeline. "I eat" describes a period of time encompassing now. "I will eat" describes a period of time in the future. "I will have eaten" describes two times; one in the future and one in the past of that future. "I will have been going to eat" describes a time in the future, a time in the past of that future, and a time in the future of that past of the first future. But in some sense this collapses back to the semantic content of "I will eat", and so my guess is that it's basically never used.

in reply to Ben Weinstein-Raun

Or, maybe I think your last positive example is sometimes acceptable, but only if the "going to" is actually describing an intention rather than tense information.
in reply to Ben Weinstein-Raun

I guess I don't get why it makes sense to talk about two times but not three.
in reply to Daniel Filan

I think what I mean is that additional times around the loop aren't really adding any extra information, because they introduce new reference points along the timeline that typically don't connect to anything else.

Like, there's some implicit time T that I'm trying to locate with a given statement, and there's an additional time Now that I get from just being in the present.

It makes sense to be like "Some time between Now and [implicitly / contextually defined] T, X will happen", and this is ~ the two-level wrapping. But if you say "Some time between Now and [newly introduced / 'bound' / 'scoped-to-this-statement'] T1, it will be the case that X happened after [implicit / 'free' / contextual] T2", T1 is kind of irrelevant, since it's introduced and used only within the statement.

In principle I guess you could have extra context that disambiguates, but I think it's also kinda relevant that verbs tend to have a subject, a direct object, and up to one indirect object, and typically not more than that.

in reply to Ben Weinstein-Raun

idk, I'm not sure this actually makes sense; the real answer might just be "ultrafinite induction"
in reply to Ben Weinstein-Raun

Yeah I guess I'm stuck on "well why can't there be a bunch of relevant times".
in reply to Daniel Filan

Also FWIW I'm still stuck on the fact that however natural it is, I have a strong intuition that "I will have been going to eat" is grammatical in a way that "I will have been going to have eaten" is not.
in reply to Daniel Filan

my take is that arbitrary nesting is in some sense grammatical, but when interpreting things like this in the wild, I have to weigh up "they really mean the complicated thing" vs "they mean a simpler thing, but have said it incorrectly", and as the things become more complicated the latter explanation becomes more and more likely


MOAR AXRP


This time with Erik Jenner, on a paper he's presenting at NeurIPS tomorrow - check it out if you're there!

Lots of people in the AI safety space worry about models being able to make deliberate, multi-step plans. But can we already see this in existing neural nets? In this episode, I talk with Erik Jenner about his work looking at internal look-ahead within chess-playing neural networks.

Video
Transcript



Am now up to knowing five words for types of slave in Latin.


Jeroen Henneman, The Long Way Home
From: https://x.com/opancaro/status/186529216161008481


Wilhelm Kranz
From: https://x.com/0zmnds/status/1865291905249980735

#art






Gustave Doré
From: https://x.com/0zmnds/status/1863475184344174739
#art




Misc notes on Latin learning


in reply to Daniel Filan

also it's kinda wild that in chapter 20 of the companion book the teacher is complaining to his slave how much his right arm hurts from beating his students. his solution to the pain? day drinking.
in reply to Daniel Filan

then he has a conversation about how he sucks at teaching and should just give up and live on enough money to buy bread and books

in reply to Daniel Filan

Also, if you have a "national parks passport", bring it! You can get it stamped at the end, which is Land's End, part of the Golden Gate National Recreation Area.


Victo Ngai
From: https://x.com/opancaro/status/1863111407962599592
#art




New AXRP! With Evan Hubinger!


This time I won't retract it, I swear!

The 'model organisms of misalignment' line of research creates AI models that exhibit various types of misalignment, and studies them to try to understand how the misalignment occurs and whether it can be somehow removed. In this episode, Evan Hubinger talks about two papers he's worked on at Anthropic under this agenda: "Sleeper Agents" and "Sycophancy to Subterfuge".

Video
Transcript

in reply to Daniel Filan

I like how it looks like the AXRP logo is the sun in this thumbnail.


I actually like it when YouTube waits a while to start processing the video I just uploaded. It strengthens my character.



Messed up that Latin became the language of the intelligentsia in the Middle Ages and therefore has more pedagogical materials available now, when Greek has classical authors you obviously should care more about. Like, it has the philosophers! Not to mention the New Testament (and the version of the Hebrew Bible that the authors of the New Testament were familiar with), the Iliad and the Odyssey, and Greek myths (let's be real, nobody cares more about Roman myths than Greek myths). Yes, it's cool that Latin has De Rerum Natura, Apuleius, and Cato, and the tradition of scholarship is a nice bonus. But c'mon!
in reply to Daniel Filan

I further guess this is a cautionary tale about the tradeoff between writing and conquering.
in reply to Daniel Filan

Tbf I think the Romans owned this, they were like 'the Greeks theorize, we get shit done'
in reply to Amber Dawn

It's like the LessWrongers and the EAs. As a LWer myself, I know where my sympathies lie...
in reply to Daniel Filan

LW: democratic and philosophical but also factious and discourse-ridden
EA: run by 1 or 2 extremely powerful guys who sometimes turn out to be deranged and corrupt. A woman called Julia is also involved.
in reply to Daniel Filan

Counter-argument: the point of learning an ancient language is to read the poetry not the prose (since prose is easily translated) and Latin poetry is plausibly better than Greek poetry.


Latin practice day 7


These aren't very inspired but:

I. Cūr quaeque littera Graeca pulchrior est quam quaeque littera Latīna?
II. Sī linguam Latīnam scīre vult, quotiēs quamque litteram Latīnam scrībere necesse est?
III. Vōlōne ā magistrō laudārī?
IV. In Capitulō XVI, quia Dominus Iēsus tempestātem facit apud navem Lydiae? Lydia ā Deō dīligiturne?
V. Num medicus labōrans vērē sanat hominēs aegrōs?
VI. Num parēntēs laudant magister discipulōs verberāntem?
VII. Suntne bēstiolae industriorēs quam apēs? Quid facit illae?
VIII. Quia dea est pulcherrima?
IX. Hōdiē, quae bonae rēs daminī ā deī?

#latinpractice

in reply to Daniel Filan

Hōdiē sum in domō parentum matris mea, in Arizonā. In hāc domō, saepe dormō in lectō parvō in cubiculō parvō, sed hōdiē habeō magnum cubiculum ac magnum lectum. Cēnābam cum parentibus matris meus, et cum amīcīs suīs. Aliī hominēs ēdēbant magnam avem, sed ego edēbam botulōs quī ex holeribus fīunt, nam Pythagoricus sum. Cōnspiciēbāmus pēs-pilam (harpastum? calcifollem? I guess Vicipaedia uses "Harpastum") - Leōnēs Detroitī, quī amantur ā parentēs matris meus, vincēbant contra Ursōs Sicāgoensis!

(I only know the imperfect past tense, forgive me)



Every country in the world belongs to America


Shouldn't the US buy the Vatican?
- they're rapidly going bankrupt and could use the money
- Trump would go for it
- the US is the new Rome
- would bring the US tons of geopolitical power
- new place to station US troops without any restrictions
- probably will ensure all Americans go to heaven
- zero downsides

Am I missing something?????

in reply to Daniel Filan

Cheaper than it seems because likely individual Americans are going to bail them out anyway.
in reply to Daniel Filan

Currently listening to a podcast episode floating the idea of the Pope issuing a tax on all Catholics. America can fix this.


New episode with Jesse Hoogland!


Another short one, I'm afraid.

You may have heard of singular learning theory, and its "local learning coefficient", or LLC - but have you heard of the refined LLC? In this episode, I chat with Jesse Hoogland about his work on SLT, and using the refined LLC to find a new circuit in language models.

YouTube
Transcript



Lieke van der Vorst
From: https://x.com/marysia_cc/status/1861148591479288294/photo/1

-

Elena and Anna Balbusso
for Little Knife by Leigh Bardugo
From: https://x.com/marysia_cc/status/1861127999581528531/photo/1

#art



Franz Karl Leopold von Klenze
From: https://x.com/0zmnds/status/1861121676735586756/photo/1

-

Chesley Knight Bonestell, Jr.
From: https://x.com/0zmnds/status/1861297334195495170/photo/1

#art



IMO it's kind of weird that there aren't more blog posts in the rationality-sphere about how to do group house living well. There are a bunch of tricky problems that need solving and opportunities for clever solutions that make people better off, so you'd think there would be much fodder. Possibilities:

  • Maybe people just don't think about it very much?
  • "Group house living" isn't as culturally salient a category as "parenting", so we're not used to writing about it?
  • Most of the problems involve being kind of annoyed at specific people, and so are inherently awkward to talk about?
in reply to Daniel Filan

in reply to David Mears

Yeah reading this I was like 'wow a lot of our lore is about chores'. I guess because this came up as an issue with us, whereas 'there are conflicts/annoyances with the other people' hasn't come up as much, possibly because two of the relationships were selected specifically for not being mutually annoying :p (and luckily you and Ben seem to not annoy each other that much)

Maybe the main tip is 'try to select people you really vibe with/share living preferences with', and if you manage that you will be well-placed to either not have problems (because your preferences don't clash), or to solve them?



I think a cool religious injunction / OCD symptom would be not being allowed to pass thru doors that other people open. You'd have to have fun rules about automatic doors that would result in you learning much more about them than most of us do. It also has cool symbolism.


This seems like a pretty thin market for a pretty important question!


Hideo Takeda
From: https://x.com/opancaro/status/1859473265149776148


Kinda sad that the easiest genuine Latin for intermediate learners to read is literal dictatorial propaganda glorifying aggressive war.


Not loving that YouTube is congratulating me on becoming an agent of addiction


Franz Caucig
From: https://x.com/0zmnds/status/1858558034307674338


Made a brief podcast episode about my experience learning Latin: youtu.be/owF5Fo43-qU