

things it's the anniversary of this weekend:
- covid lockdowns
- the final alphago match
- the end of HPMOR
- the killing of Julius Caesar


Johann Friedrich Morgenstern
From https://x.com/0zmnds/status/1890834305208696987
#art


Yet another short AXRP episode!


With Anthony Aguirre!

The Future of Life Institute is one of the oldest and most prominent organizations in the AI existential safety space, working on such topics as the AI pause open letter and how the EU AI Act can be improved. Metaculus is one of the premier forecasting sites on the internet. Behind both of them lies one man: Anthony Aguirre, who I talk with in this episode.

Video
Transcript



Are fans actually white noise machines? If so, why? It seems like they're the sort of thing that has an obvious frequency that would matter so I'm not sure what's going on. Maybe that the air gets routed thru grates and stuff and that creates the white noise?
in reply to Daniel Filan

I guess they sort of obviously don't have the high pitch components that real white noise machines do.


Does anyone know of a laptop price index where I can check if it spiked today?


Anyone have recommendations for TV I should watch tonight? I'm most interested in strategy shows, e.g. if The Traitors were on Netflix that would be my top pick.
in reply to Daniel Filan

UK Traitors with VPN? (unless you've already watched it)


At its best, the local YMCA steam room is better than the one at Archimedes Banya - admittedly no eucalyptus scent, but bigger and hotter.
in reply to Daniel Filan

"at its best" because sometimes you go in and it's just not that steamy for some reason. Today I went and the lights didn't work but the steam was on full blast 👌


Back in the day people used to argue whether effective altruism was an opportunity or an obligation. It's just occurred to me that the opportunity side has links to theodicies that I find pretty implausible - "oh we're so lucky that there's so much pointless suffering in the world, so that we have the opportunity to do something about it".


More AXRP! Joel Lehman!


Typically this podcast talks about how to avert destruction from AI. But what would it take to ensure AI promotes human flourishing as well as it can? Is alignment to individuals enough, and if not, where do we go from here? In this episode, I talk with Joel Lehman about these questions.

Video
Transcript



Misty morning at Lanhydrock. Cornwall, England. NMP
From: https://x.com/HoganSOG/status/1882211656283111582/photo/1

#art


Junichiro Sekino 1914-1988
Night in Kyoto
From: https://x.com/marysia_cc/status/1882215670282166390

#art


The daytime moon is pretty cool IMO. Somehow a more vivid reminder that we're floating in (hurtling thru?) space.
in reply to Daniel Filan

sometimes I look at it and I'm like, man, it would be REALLY bad if that fell down, but it doesn't, somehow


Tanaka Ryōhei (1933-2019)
Crow and Persimmon in the Snow

From: https://x.com/marysia_cc/status/1881097630148907230/photo/1

#art


Miscellaneous life updates




Adria on AXRP!


Yet another new episode!

Suppose we're worried about AIs engaging in long-term plans that they don't tell us about. If we were to peek inside their brains, what should we look for to check whether this was happening? In this episode Adrià Garriga-Alonso talks about his work trying to answer this question.

Transcript
Video

in reply to Daniel Filan

Yesss got a lot of views on this one - I think I successfully managed to create a clickbait thumbnail.


OK this is probably a dumb question but why did we all decide that bio risk was the most scary thing AIs could do? Did someone write up a justification of that somewhere?
in reply to Daniel Filan

I think it's maybe the scariest thing if you only believe in misuse risk, and lots of people seem to only believe in misuse risk for some reason.
in reply to Daniel Filan

I think what we decided was more like: it might be the *first* way we get an actual catastrophe

in reply to Daniel Filan

sounds like it was intended as a joke; like, linking to "pleonastically" is an illustration of pleonasticism.


Happy New AXRP!


Yet another in the Alignment Workshop series.

AI researchers often complain about the poor coverage of their work in the news media. But why is this happening, and how can it be fixed? In this episode, I speak with Shakeel Hashim about the resource constraints facing AI journalism, the disconnect between journalists' and AI researchers' views on transformative AI, and efforts to improve the state of AI journalism, such as Tarbell and Shakeel's newsletter, Transformer.

Transcript
Video



The worst thing about studying Latin is that it's invaded my mind to the degree that I now almost feel like having five cases is reasonable.
in reply to Daniel Filan

"why not more? Why not make a dedicated instrumental case, or a locative that isn't a sewn-together monstrosity comprised by other cases?" - the sounds of a mind deranged by synthetic languages
in reply to Daniel Filan

Russian has six! Also the rules for declension in Russian depend on, among other things, whether something is animate. Are dolls animate? (Yes.) Are corpses? (Depends which word you're using.) Are bacteria? (Yes if you're a biologist, probably not otherwise.)


I suspect that "epistemic and instrumental rationality" is better branded and lived as "nobility in thought and deed". But maybe I just have an unusual set of associations with the word "noble"? It's certainly more goal-laden than the word "rational" typically is.
in reply to Daniel Filan

The thing I mean is less altruistic than what David Chapman describes on this page but shares the feature of being valuable and possible.


So here's a dumb question about Jason Gross-style work on compact proofs that I don't want to ask totally publicly - what's the point? I see the value in making the case for interp as being for stuff like compact proofs. But I feel like we know that we aren't going to be able to find literal proofs of relevant safety properties of GPT-4, and we don't even know what those properties should be. So relevant next steps should look like "figure out heuristic arguments" and "figure out WTF AI safety even is" right? So why do more work getting compact proofs of various model properties?
in reply to Daniel Filan

I don't think it's obvious that we can't get proofs of any relevant safety properties. Like, yeah we're not going to get proofs of anything that references human preferences or whatever, but there might be relevant limited subquestions, e.g. about information capacity or something?
in reply to Ben Weinstein-Raun

I guess I just mean that it's really hard to prove anything about big NN behaviour - my understanding is that if you try really hard you can do interval propagation in a smart way, but that's about it.
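For anyone who hasn't seen it: interval propagation means pushing a whole box of possible inputs through the network layer by layer, keeping elementwise lower and upper bounds at each step. Here's a minimal sketch in numpy with toy weights - not any real model, and the function names are just mine:

```python
import numpy as np

def interval_linear(lo, hi, W, b):
    """Propagate an elementwise interval [lo, hi] through x -> Wx + b.

    Splitting W into positive and negative parts gives the tightest
    elementwise bounds for a single linear layer.
    """
    W_pos, W_neg = np.maximum(W, 0), np.minimum(W, 0)
    new_lo = W_pos @ lo + W_neg @ hi + b
    new_hi = W_pos @ hi + W_neg @ lo + b
    return new_lo, new_hi

def interval_relu(lo, hi):
    """ReLU is monotone, so it maps intervals to intervals directly."""
    return np.maximum(lo, 0), np.maximum(hi, 0)

# Toy example: bound the outputs of a 2-layer net over an input box.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

lo, hi = np.full(3, -0.1), np.full(3, 0.1)  # input box around 0
lo, hi = interval_relu(*interval_linear(lo, hi, W1, b1))
lo, hi = interval_linear(lo, hi, W2, b2)
print(lo, hi)  # every input in the box maps to an output in [lo, hi]
```

The bounds this gives you are sound but loosen with every layer, which is part of why it's so hard to prove anything interesting about big networks this way.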


A question bopping around my mind: are there things I could do instead of making AXRP or being a MATS RM that would be more valuable? Possible answers:
- just do research that matters
- project manager at a place that does research that matters
- be more directly a competitor to Zvi
- team up with Lawrence Chan and write stuff about various alignment schemes

I think a bottleneck I feel is being unsure about what things are valuable in the info environment, and where I think I'm best placed to do stuff.




So like.... what's so good about trains? Why would someone think they are so much cooler than cars / trucks / aeroplanes?
in reply to Daniel Filan

  • Bigger / heavier
  • Stronger / move more stuff
  • Make way better sounds
in reply to Daniel Filan

The infrastructure is somehow really appealing (rails, railroad switches, signals). And there's something great about the way they glide along the track.


Thing I just learned: Paul: A Very Short Introduction, one of my favourite entries in the Very Short Introduction series and one I frequently recommend, was written by E. P. Sanders - one of the most prominent 20th-century scholars on the apostle Paul and his thought. Self-recommending!
in reply to Daniel Filan

It really says something about where I'm at today, that it took multiple seconds before I realized you weren't talking about Paul Christiano.
in reply to Ben Weinstein-Raun

RIP I included 'apostle' or something in an earlier draft of this explicitly to counteract this, but randomly left it out of the final version. Fixing it.


the perfect of "incipere" is "coepisse". wtf.
in reply to Daniel Filan

In general on one hand I'm like "I'm so grateful English has so much grammar and vocabulary to make it so expressive" but when I see Latin I'm like "Japanese copes with just having past and non-past plus some participles, why can't you" (not even getting to the whole thing of having different genders and different declensions for nouns and adjectives).
in reply to Daniel Filan

Ironically "coep-" is now the perfect stem I am perhaps least likely to forget.


Solstice notes


  • I like that the celebration took place on (or adjacent to) the actual solstice
  • I broadly thought this year's was worse than last year's, altho it had its charms
  • I liked "Humankind as a sailor" - tricky to pick up but rewarding once you did
  • Just because a song takes place in Australia, I don't think it thereby glorifies the negative aspects of colonialism.
  • The darkness speech was touching this year
  • I feel like a lot of the time the speaker would say something I straightforwardly agreed with, in the way I would say it, and everyone would laugh.
  • It was funny when Ozy said her favourite website was Our World in Data and Scott sang the praises of Dustin Moskowitz while I was sitting next to Oli
  • I think "the world is awful" is wrong, and not established by there being awful things in the world.
in reply to Daniel Filan

Also 'Humankind as a Sailor' is now on my non-core solstice music playlist and so popped up while I was on the rowing machine - total disaster, induced complete muscle confusion.



I really like how smooth and clean this retention curve is - this is for my episode with Evan Hubinger; the height of the line is the fraction of viewers still watching at any given time.
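For the curious, a retention curve like this is simple to compute if you have per-viewer watch durations. A small sketch of the idea - the data here is made up for illustration, since I only see YouTube's aggregated dashboard:

```python
import numpy as np

def retention_curve(watch_seconds, video_length, n_points=200):
    """Fraction of viewers still watching at each point in the video."""
    ts = np.linspace(0, video_length, n_points)
    watch_seconds = np.asarray(watch_seconds)
    # A viewer counts as "still watching" at time t if they watched past t.
    frac = (watch_seconds[:, None] >= ts[None, :]).mean(axis=0)
    return ts, frac

# Made-up data: 1,000 viewers of a 90-minute episode, with roughly
# exponentially distributed watch times (a common shape for retention).
rng = np.random.default_rng(0)
durations = rng.exponential(scale=1200, size=1000).clip(0, 5400)
ts, frac = retention_curve(durations, video_length=5400)
print(frac[0], frac[-1])  # starts at ~1.0, decays toward 0
```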


How much nesting can we do in English verb tenses, and what controls that? For an example of what I mean, I can say:
- I eat
- I will eat
- I will have been eating
- I will have been going to eat

But I don't think we can say "I will have been going to have eaten".

in reply to Daniel Filan

One possibility: basically it goes as far as it makes sense to add extra timing information. But this only works if you disagree about your last positive example, which I personally don't actually think I've ever heard used.

Like, imagine a timeline. "I eat" describes a period of time encompassing now. "I will eat" describes a period of time in the future. "I will have eaten" describes two times; one in the future and one in the past of that future. "I will have been going to eat" describes a time in the future, a time in the past of that future, and a time in the future of that past of the first future. But in some sense this collapses back to the semantic content of "I will eat", and so my guess is that it's basically never used.

in reply to Ben Weinstein-Raun

Or, maybe I think your last positive example is sometimes acceptable, but only if the "going to" is actually describing an intention rather than tense information.
in reply to Ben Weinstein-Raun

I guess I don't get why it makes sense to talk about two times but not three.
in reply to Daniel Filan

I think what I mean is that additional times around the loop aren't really adding any extra information, because they introduce new reference points along the timeline that typically don't connect to anything else.

Like, there's some implicit time T that I'm trying to locate with a given statement, and there's an additional time Now that I get from just being in the present.

It makes sense to be like "Some time between Now and [implicitly / contextually defined] T, X will happen", and this is ~ the two-level wrapping. But if you say "Some time between Now and [newly introduced / 'bound' / 'scoped-to-this-statement'] T1, it will be the case that X happened after [implicit / 'free' / contextual] T2", T1 is kind of irrelevant, since it's introduced and used only within the statement.

In principle I guess you could have extra context that disambiguates, but I think it's also kinda relevant that verbs tend to have a subject, a direct object, and up to one indirect object, and typically not more than that.

in reply to Ben Weinstein-Raun

idk, I'm not sure this actually makes sense; the real answer might just be "ultrafinite induction"
in reply to Ben Weinstein-Raun

Yeah I guess I'm stuck on "well why can't there be a bunch of relevant times".
in reply to Daniel Filan

Also FWIW I'm still stuck on the fact that however natural it is, I have a strong intuition that "I will have been going to eat" is grammatical in a way that "I will have been going to have eaten" is not.
in reply to Daniel Filan

my take is that arbitrary nesting is in some sense grammatical, but when interpreting things like this in the wild, I have to weigh up "they really mean the complicated thing" vs "they mean a simpler thing, but have said it incorrectly", and as the things become more complicated the latter explanation becomes more and more likely


MOAR AXRP


This time with Erik Jenner, on a paper he's presenting at NeurIPS tomorrow - check it out if you're there!

Lots of people in the AI safety space worry about models being able to make deliberate, multi-step plans. But can we already see this in existing neural nets? In this episode, I talk with Erik Jenner about his work looking at internal look-ahead within chess-playing neural networks.

Video
Transcript



Am now up to knowing five words for types of slave in Latin.


Jeroen Henneman, The Long Way Home
From: https://x.com/opancaro/status/186529216161008481


Wilhelm Kranz
From: https://x.com/0zmnds/status/1865291905249980735

#art



Gustave Doré
From: https://x.com/0zmnds/status/1863475184344174739
#art




Misc notes on Latin learning


in reply to Daniel Filan

also it's kinda wild that in chapter 20 of the companion book the teacher is complaining to his slave about how much his right arm hurts from beating his students. his solution to the pain? day drinking.
in reply to Daniel Filan

then he has a conversation about how he sucks at teaching and should just give up and live on enough money to buy bread and books

in reply to Daniel Filan

Also, if you have a "national parks passport", bring it! You can get it stamped at the end, which is Land's End, part of the Golden Gate National Recreation Area.


Victo Ngai
From: https://x.com/opancaro/status/1863111407962599592
#art


New AXRP! With Evan Hubinger!


This time I won't retract it, I swear!

The 'model organisms of misalignment' line of research creates AI models that exhibit various types of misalignment, and studies them to try to understand how the misalignment occurs and whether it can be somehow removed. In this episode, Evan Hubinger talks about two papers he's worked on at Anthropic under this agenda: "Sleeper Agents" and "Sycophancy to Subterfuge".

Video
Transcript

in reply to Daniel Filan

I like how it looks like the AXRP logo is the sun in this thumbnail.


I actually like it when YouTube waits a while to start processing the video I just uploaded. It strengthens my character.