

things it's the anniversary of this weekend:
- covid lockdowns
- the final alphago match
- the end of HPMOR
- the killing of Julius Caesar


Johann Friedrich Morgenstern
From https://x.com/0zmnds/status/1890834305208696987
#art


Yet another short AXRP episode!


With Anthony Aguirre!

The Future of Life Institute is one of the oldest and most prominent organizations in the AI existential safety space, working on such topics as the AI pause open letter and how the EU AI Act can be improved. Metaculus is one of the premier forecasting sites on the internet. Behind both of them lies one man: Anthony Aguirre, who I talk with in this episode.

Video
Transcript



Are fans actually white noise machines? If so, why? It seems like they're the sort of thing that has an obvious frequency that would matter so I'm not sure what's going on. Maybe that the air gets routed thru grates and stuff and that creates the white noise?
in reply to Daniel Filan

I guess they sort of obviously don't have the high pitch components that real white noise machines do.


Does anyone know of a laptop price index where I can check if it spiked today?


Anyone have recommendations for TV I should watch tonight? I'm most interested in strategy shows, e.g. if The Traitors were on Netflix that would be my top pick.
in reply to Daniel Filan

UK Traitors with VPN? (unless you've already watched it)


At its best, the local YMCA steam room is better than the one at Archimedes Banya - admittedly no eucalyptus scent, but bigger and hotter.
in reply to Daniel Filan

"at its best" because sometimes you go in and it's just not that steamy for some reason. Today I went and the lights didn't work but the steam was on full blast 👌


Back in the day people used to argue whether effective altruism was an opportunity or an obligation. It's just occurred to me that the opportunity side has links to theodicies that I find pretty implausible - "oh we're so lucky that there's so much pointless suffering in the world, so that we have the opportunity to do something about it".


More AXRP! Joel Lehman!


Typically this podcast talks about how to avert destruction from AI. But what would it take to ensure AI promotes human flourishing as well as it can? Is alignment to individuals enough, and if not, where do we go from here? In this episode, I talk with Joel Lehman about these questions.

Video
Transcript



Misty morning at Lanhydrock. Cornwall, England. NMP
From: https://x.com/HoganSOG/status/1882211656283111582/photo/1

#art


Junichiro Sekino 1914-1988
Night in Kyoto
From: https://x.com/marysia_cc/status/1882215670282166390

#art


The daytime moon is pretty cool IMO. Somehow a more vivid reminder that we're floating in (hurtling thru?) space.
in reply to Daniel Filan

sometimes I look at it and I'm like, man, it would be REALLY bad if that fell down, but it doesn't, somehow


Tanaka Ryōhei (1933-2019)
Crow and Persimmon in the Snow

From: https://x.com/marysia_cc/status/1881097630148907230/photo/1

#art


Miscellaneous life updates




Adria on AXRP!


Yet another new episode!

Suppose we're worried about AIs engaging in long-term plans that they don't tell us about. If we were to peek inside their brains, what should we look for to check whether this was happening? In this episode Adrià Garriga-Alonso talks about his work trying to answer this question.

Transcript
Video

in reply to Daniel Filan

Yesss got a lot of views on this one - I think I successfully managed to create a clickbait thumbnail.


OK this is probably a dumb question but why did we all decide that bio risk was the most scary thing AIs could do? Did someone write up a justification of that somewhere?
in reply to Daniel Filan

I think it's maybe the scariest thing if you only believe in misuse risk, and lots of people seem to only believe in misuse risk for some reason.
in reply to Daniel Filan

I think what we decided was more like: it might be the *first* way we get an actual catastrophe

in reply to Daniel Filan

sounds like it was intended as a joke; like, linking to "pleonastically" is an illustration of pleonasticism.


Happy New AXRP!


Yet another in the Alignment Workshop series.

AI researchers often complain about the poor coverage of their work in the news media. But why is this happening, and how can it be fixed? In this episode, I speak with Shakeel Hashim about the resource constraints facing AI journalism, the disconnect between journalists' and AI researchers' views on transformative AI, and efforts to improve the state of AI journalism, such as Tarbell and Shakeel's newsletter, Transformer.

Transcript
Video



The worst thing about studying Latin is that it's invaded my mind to the degree that I now almost feel like having five cases is reasonable.
in reply to Daniel Filan

"why not more? Why not make a dedicated instrumental case, or a locative that isn't a sewn-together monstrosity comprised by other cases?" - the sounds of a mind deranged by synthetic languages
in reply to Daniel Filan

Russian has six! Also the rules for declension in Russian depend on, among other things, whether something is animate. Are dolls animate? (Yes.) Are corpses? (Depends which word you're using.) Are bacteria? (Yes if you're a biologist, probably not otherwise.)


I suspect that "epistemic and instrumental rationality" is better branded and lived as "nobility in thought and deed". But maybe I just have an unusual set of associations with the word "noble"? It's certainly more goal-laden than the word "rational" typically is.
in reply to Daniel Filan

The thing I mean is less altruistic than what David Chapman describes on this page but shares the feature of being valuable and possible.


So here's a dumb question about Jason Gross-style work on compact proofs that I don't want to ask totally publicly - what's the point? I see the value in making the case for interp as being for stuff like compact proofs. But I feel like we know that we aren't going to be able to find literal proofs of relevant safety properties of GPT-4, and we don't even know what those properties should be. So relevant next steps should look like "figure out heuristic arguments" and "figure out WTF AI safety even is" right? So why do more work getting compact proofs of various model properties?
in reply to Daniel Filan

I don't think it's obvious that we can't get proofs of any relevant safety properties. Like, yeah we're not going to get proofs of anything that references human preferences or whatever, but there might be relevant limited subquestions, e.g. about information capacity or something?
in reply to Ben Weinstein-Raun

I guess I just mean that it's really hard to prove anything about big NN behaviour - my understanding is that if you try really hard you can do interval propagation in a smart way, but that's about it.
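For anyone who hasn't seen it: interval propagation means pushing a whole box of possible inputs through the network layer by layer, keeping elementwise lower and upper bounds at each step. Here's a minimal sketch in numpy with toy weights - not any real model, and the function names are just mine:

```python
import numpy as np

def interval_linear(lo, hi, W, b):
    """Propagate an elementwise interval [lo, hi] through x -> Wx + b.

    Splitting W into positive and negative parts gives the tightest
    elementwise bounds for a single linear layer.
    """
    W_pos, W_neg = np.maximum(W, 0), np.minimum(W, 0)
    new_lo = W_pos @ lo + W_neg @ hi + b
    new_hi = W_pos @ hi + W_neg @ lo + b
    return new_lo, new_hi

def interval_relu(lo, hi):
    """ReLU is monotone, so it maps intervals to intervals directly."""
    return np.maximum(lo, 0), np.maximum(hi, 0)

# Toy example: bound the outputs of a 2-layer net over an input box.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

lo, hi = np.full(3, -0.1), np.full(3, 0.1)  # input box around 0
lo, hi = interval_relu(*interval_linear(lo, hi, W1, b1))
lo, hi = interval_linear(lo, hi, W2, b2)
print(lo, hi)  # every input in the box maps to an output in [lo, hi]
```

The bounds this gives you are sound but loosen with every layer, which is part of why it's so hard to prove anything interesting about big networks this way.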


A question bopping around my mind: are there things I could do instead of making AXRP or being a MATS RM that would be more valuable? Possible answers:
- just do research that matters
- project manager at a place that does research that matters
- be more directly a competitor to Zvi
- team up with Lawrence Chan and write stuff about various alignment schemes

I think a bottleneck I feel is being unsure about what things are valuable in the info environment, and where I think I'm best placed to do stuff.




So like.... what's so good about trains? Why would someone think they are so much cooler than cars / trucks / aeroplanes?
in reply to Daniel Filan

  • Bigger / heavier
  • Stronger / move more stuff
  • Make way better sounds
in reply to Daniel Filan

The infrastructure is somehow really appealing (rails, railroad switches, signals). And there's something great about the way they glide along the track.


Thing I just learned: Paul: A Very Short Introduction, one of my favourite entries in the Very Short Introduction series and one I frequently recommend, was written by E. P. Sanders - one of the most prominent 20th-century scholars on the apostle Paul and his thought. Self-recommending!
in reply to Daniel Filan

It really says something about where I'm at today, that it took multiple seconds before I realized you weren't talking about Paul Christiano.
in reply to Ben Weinstein-Raun

RIP I included 'apostle' or something in an earlier draft of this explicitly to counteract this, but randomly left it out of the final version. Fixing it.


the perfect of "incipere" is "coepisse". wtf.
in reply to Daniel Filan

In general on one hand I'm like "I'm so grateful English has so much grammar and vocabulary to make it so expressive" but when I see Latin I'm like "Japanese copes with just having past and non-past plus some participles, why can't you" (not even getting to the whole thing of having different genders and different declensions for nouns and adjectives).
in reply to Daniel Filan

Ironically "coep-" is now the perfect stem I am perhaps least likely to forget.


Solstice notes


  • I like that the celebration took place on (or adjacent to) the actual solstice
  • I broadly thought this year's was worse than last year's, altho it had its charms
  • I liked "Humankind as a sailor" - tricky to pick up but rewarding once you did
  • Just because a song takes place in Australia, I don't think it thereby glorifies the negative aspects of colonialism.
  • The darkness speech was touching this year
  • I feel like a lot of the time the speaker would say something I straightforwardly agreed with, in the way I would say it, and everyone would laugh.
  • It was funny when Ozy said her favourite website was Our World in Data and Scott sang the praises of Dustin Moskowitz while I was sitting next to Oli
  • I think "the world is awful" is wrong, and not established by there being awful things in the world.
in reply to Daniel Filan

Also 'Humankind as a Sailor' is now on my non-core solstice music playlist and so popped up while I was on the rowing machine - total disaster, induced complete muscle confusion.



I really like how smooth and clean this retention curve is - this is for my episode with Evan Hubinger; the height of the line is the fraction of viewers still watching at any given time.
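For the curious, a retention curve like this is simple to compute if you have per-viewer watch durations. A small sketch of the idea - the data here is made up for illustration, since I only see YouTube's aggregated dashboard:

```python
import numpy as np

def retention_curve(watch_seconds, video_length, n_points=200):
    """Fraction of viewers still watching at each point in the video."""
    ts = np.linspace(0, video_length, n_points)
    watch_seconds = np.asarray(watch_seconds)
    # A viewer counts as "still watching" at time t if they watched past t.
    frac = (watch_seconds[:, None] >= ts[None, :]).mean(axis=0)
    return ts, frac

# Made-up data: 1,000 viewers of a 90-minute episode, with roughly
# exponentially distributed watch times (a common shape for retention).
rng = np.random.default_rng(0)
durations = rng.exponential(scale=1200, size=1000).clip(0, 5400)
ts, frac = retention_curve(durations, video_length=5400)
print(frac[0], frac[-1])  # starts at ~1.0, decays toward 0
```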


How much nesting can we do in English verb tenses, and what controls that? For an example of what I mean, I can say:
- I eat
- I will eat
- I will have been eating
- I will have been going to eat

But I don't think we can say "I will have been going to have eaten".

in reply to Daniel Filan

One possibility: basically it goes as far as it makes sense to add extra timing information. But this only works if you disagree about your last positive example, which I personally don't actually think I've ever heard used.

Like, imagine a timeline. "I eat" describes a period of time encompassing now. "I will eat" describes a period of time in the future. "I will have eaten" describes two times; one in the future and one in the past of that future. "I will have been going to eat" describes a time in the future, a time in the past of that future, and a time in the future of that past of the first future. But in some sense this collapses back to the semantic content of "I will eat", and so my guess is that it's basically never used.

in reply to Ben Weinstein-Raun

Or, maybe I think your last positive example is sometimes acceptable, but only if the "going to" is actually describing an intention rather than tense information.
in reply to Ben Weinstein-Raun

I guess I don't get why it makes sense to talk about two times but not three.
in reply to Daniel Filan

I think what I mean is that additional times around the loop aren't really adding any extra information, because they introduce new reference points along the timeline that typically don't connect to anything else.

Like, there's some implicit time T that I'm trying to locate with a given statement, and there's an additional time Now that I get from just being in the present.

It makes sense to be like "Some time between Now and [implicitly / contextually defined] T, X will happen", and this is ~ the two-level wrapping. But if you say "Some time between Now and [newly introduced / 'bound' / 'scoped-to-this-statement'] T1, it will be the case that X happened after [implicit / 'free' / contextual] T2", T1 is kind of irrelevant, since it's introduced and used only within the statement.

In principle I guess you could have extra context that disambiguates, but I think it's also kinda relevant that verbs tend to have a subject, a direct object, and up to one indirect object, and typically not more than that.

in reply to Ben Weinstein-Raun

idk, I'm not sure this actually makes sense; the real answer might just be "ultrafinite induction"
in reply to Ben Weinstein-Raun

Yeah I guess I'm stuck on "well why can't there be a bunch of relevant times".
in reply to Daniel Filan

Also FWIW I'm still stuck on the fact that however natural it is, I have a strong intuition that "I will have been going to eat" is grammatical in a way that "I will have been going to have eaten" is not.
in reply to Daniel Filan

my take is that arbitrary nesting is in some sense grammatical, but when interpreting things like this in the wild, I have to weigh up "they really mean the complicated thing" vs "they mean a simpler thing, but have said it incorrectly", and as the things become more complicated the latter explanation becomes more and more likely


MOAR AXRP


This time with Erik Jenner, on a paper he's presenting at NeurIPS tomorrow - check it out if you're there!

Lots of people in the AI safety space worry about models being able to make deliberate, multi-step plans. But can we already see this in existing neural nets? In this episode, I talk with Erik Jenner about his work looking at internal look-ahead within chess-playing neural networks.

Video
Transcript



Am now up to knowing five words for types of slave in Latin.


Jeroen Henneman, The Long Way Home
From: https://x.com/opancaro/status/186529216161008481


Wilhelm Kranz
From: https://x.com/0zmnds/status/1865291905249980735

#art



Gustave Doré
From: https://x.com/0zmnds/status/1863475184344174739
#art




Misc notes on Latin learning


in reply to Daniel Filan

also it's kinda wild that in chapter 20 of the companion book the teacher is complaining to his slave about how much his right arm hurts from beating his students. his solution to the pain? day drinking.
in reply to Daniel Filan

then he has a conversation about how he sucks at teaching and should just give up and live on enough money to buy bread and books

in reply to Daniel Filan

Also, if you have a "national parks passport", bring it! You can get it stamped at the end, which is Land's End, part of the Golden Gate National Recreation Area.


Victo Ngai
From: https://x.com/opancaro/status/1863111407962599592
#art


New AXRP! With Evan Hubinger!


This time I won't retract it, I swear!

The 'model organisms of misalignment' line of research creates AI models that exhibit various types of misalignment, and studies them to try to understand how the misalignment occurs and whether it can be somehow removed. In this episode, Evan Hubinger talks about two papers he's worked on at Anthropic under this agenda: "Sleeper Agents" and "Sycophancy to Subterfuge".

Video
Transcript

in reply to Daniel Filan

I like how it looks like the AXRP logo is the sun in this thumbnail.


I actually like it when YouTube waits a while to start processing the video I just uploaded. It strengthens my character.