


Adrià on AXRP!


Yet another new episode!

Suppose we're worried about AIs engaging in long-term plans that they don't tell us about. If we were to peek inside their brains, what should we look for to check whether this was happening? In this episode Adrià Garriga-Alonso talks about his work trying to answer this question.

Transcript
Video

in reply to Daniel Filan

Yesss got a lot of views on this one - I think I successfully managed to create a clickbait thumbnail.


6 days!


I'm probably starting my sixth day of feeling a lot more normal, health-wise. (Don't know for sure til later in the day.)

I might have returned to the level of health I was at in mid Dec — which was concerningly-bad at the time, but things got significantly worse after that.

While I'm feeling better, I'm trying to un-decondition my body a bit. Strategy: Walk a few minutes, get tired, and then put myself in a quiet/dark/controlled/alone environment to conduct Optimal Rest

in reply to kip

long may it continue! (or, dare we hope, improve still further :) )


I've been using a Kinesis Advantage 2 keyboard for a while, and I'm v fond of it, but one of the big drawbacks is that it's relatively bulky and difficult to transport. The newer model, the 360, seems like it would be easier, but it's kind of a lot of money to drop on something that only seems like it might solve a problem that I have. Curious if I know anyone who has one and wants to vouch for it. (Also curious for your views on whether I should get the Bluetooth one or cabled one -- my instinct would be the latter)
in reply to David Mears

the desire for portability is so that I can easily take it to and from work

I think for now I will try doing this with my existing keyboard but I suspect it will be annoying

in reply to Ben Millwood

Maybe I could give you my big keyboard that is the same as your big keyboard, so then you wouldn’t need to portable it around town (have one at work and one at home). I don’t use it. So you could have it for some token amount.


Meta has changed its policies, in some right-wing-y ways. Lots of people are disappointed. Could be a good time to try to get more people on Superstimulus?
in reply to kip

Maybe - I'm not currently using fb, so I'm not planning to do anything broad like that myself; also I'm not really following the things you're talking about beyond that they're maybe going to switch to a community notes style moderation scheme. I guess I saw some Zuckerberg quote about "masculine energy" or something?
in reply to Ben Weinstein-Raun

I'm not paying much attention to the changes either. But apparently one of the changes is that you're allowed to call people mentally ill if it's because they're trans? And someone I know is taking time off Facebook because of this. (I haven't fully factchecked if this is the right interpretation of the policy, but it does seem to be what the literal text is saying, and there are some articles reporting on it)


I've never heard of a financial advisor that will draw a rough curve of the client's utility as a function of money and then make investment recommendations on that basis -- which is pretty insane given how basic that should be. (I'm sure they sorta kinda do it implicitly but it's probably not very good)


A partial list of people whose art I've loved, and who I might have liked to be friends with, but who I think would not like me very much (all for different reasons):

  • Ursula K. Le Guin (I'm not MtG Green enough)
  • Ayn Rand (I'm too MtG Green)
  • Ezra Koenig (I'm too MtG Blue)

I'm not really sure how or why I generated this list. It feels related to the thing about wanting to get stronger, and deleting my facebook last month. It's kind of an "edge-y" question: I don't know how to emotionally deal with the existence of people in this category, but they go on existing.



Okay, where is the few-shot "reads literally all the article summaries from the whole internet and predicts how much I'd like them" service?
in reply to Ben Weinstein-Raun

I was starting to wonder where it was when GPT 3.5 came out, and now I'm really feeling like it's suspicious
in reply to Ben Weinstein-Raun

I also am really annoyed the "read all my notifications, alert me about the few important ones, and batch summarize the rest" service hasn't been built


An unfortunate thing for me is that I just viscerally really like LLMs, and would like them even more if they were way smarter.


OK this is probably a dumb question but why did we all decide that bio risk was the most scary thing AIs could do? Did someone write up a justification of that somewhere?
in reply to Daniel Filan

I think it's maybe the scariest thing if you only believe in misuse risk, and lots of people seem to only believe in misuse risk for some reason.
in reply to Daniel Filan

I think what we decided was more like: it might be the *first* way we get an actual catastrophe


in reply to Ben Weinstein-Raun

I feel like I'm living on the internet these days, since my health is too bad for in-person stuff. So it sucks that the internet is so much more aggro.

> My strategy so far in life has just been to avoid being the kind of person who attracts "sharp" / "angry" critics, and also to filter my social bubble to exclude them. But this doesn't scale if you're trying to do the things I'm trying to do.

What are you doing that is incompatible with filtering your social bubble?

in reply to kip

Mainly: Change what happens in the world on a pretty large scale, while not being Carl Shulman.
in reply to Ben Weinstein-Raun

Re: method "c": I'm wondering if you could intentionally give yourself exposure to critics in a way that's less vulnerable. The most obvious ways to do this might be too insincere for your taste, but idk, maybe there's still something you can do?

Like if I were trying to do this, I might create an anonymous account and intentionally share my most controversial (yet unimportant) thoughts/opinions, in places where some people will probably get mad at me. (Hopefully not in a way that, like, antagonizes people? I'd want it to be net good.) Then I'd try to lean into a mindset that getting criticism is a necessary/normal part of getting noticed.


in reply to Daniel Filan

sounds like it was intended as a joke; like, linking to "pleonastically" is an illustration of pleonasticism.


The raddest palindrome I have ever written


Rad! A ray, an otter, a man, a plan, a llama, snipes, a bat, a devil arisen (oh pox!); a sarong, a lug, a one-ton tub of unaksed-for pâté, bros, a POC prawn, a lioniser (ah, so ozone!); nets, ill Szymon (ox at an ER); wasps; a madam, a hard-won kiss (a 'snog'), a war, a call, a fret, a war (again!), a Niagara Waterfall, a car, a wagon's ass (I know, Dr.); a ham, a dam; asps, a wren, a taxonomy, ZSL (listen, Eno!) zoo; shares in oil, an warp cop, a sorbet; a Prof-desk, an UFO; but not, Eno, a gulag, nor a saxophone, sir; a live database; pins; a mall, an Alp, an amaretto - nay, a radar!


A mana pool, a ball... uh... a star, rats, a hullabaloo, panama


current progress on Bluesky OAuth login for my web app:

  • you give me your Bluesky handle
  • I have two methods of turning this into your DID, one via DNS and one via HTTP. I try both and use whichever works.
  • Now I have your DID, I need to go get your DID document. There's more than one type of DID, but so far I've only bothered to support one, which I just fetch from the directory.
  • Now I have your DID document, I can look up what your PDS is.
  • Now I have your PDS, I can ask it where the authorization servers are.
  • Now I've got an authorization server, I can ask it for the authorization server endpoints.
  • Now I think I can start the OAuth process?
  • It's not like the OAuth process is simple either
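The lookup chain above can be sketched as a handful of pure functions that only build the URLs and pick apart the DID document, with no network calls. This is a rough sketch rather than my actual app code: the function names are made up, and the specific locations (the `_atproto.` DNS name, the `/.well-known/` paths, `plc.directory`, the `#atproto_pds` service id) are assumptions based on my reading of the atproto docs, so check them against the spec before relying on them.

```python
def handle_resolution_targets(handle: str) -> dict:
    """The two places to look up a handle's DID (assumed per atproto docs)."""
    return {
        "dns_txt_name": f"_atproto.{handle}",  # TXT record, value like "did=did:plc:..."
        "https_url": f"https://{handle}/.well-known/atproto-did",
    }


def did_document_url(did: str) -> str:
    """Only did:plc is supported here, fetched from the PLC directory."""
    if not did.startswith("did:plc:"):
        raise ValueError("this sketch only handles did:plc")
    return f"https://plc.directory/{did}"


def pds_endpoint(did_doc: dict) -> str:
    """Find the PDS URL in a DID document (service id assumed to end in #atproto_pds)."""
    for service in did_doc.get("service", []):
        if service.get("id", "").endswith("#atproto_pds"):
            return service["serviceEndpoint"]
    raise LookupError("no PDS service entry in DID document")


def protected_resource_metadata_url(pds: str) -> str:
    """Where the PDS is assumed to list its authorization servers."""
    return pds.rstrip("/") + "/.well-known/oauth-protected-resource"


def authorization_server_metadata_url(issuer: str) -> str:
    """Where an authorization server is assumed to publish its endpoints."""
    return issuer.rstrip("/") + "/.well-known/oauth-authorization-server"
```

Each function corresponds to one bullet, so the real thing is "call these in order, with an HTTP or DNS fetch between each pair".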

😵‍💫



I never want to go viral — well, unless I figure out how to become comfortable with getting a ton of mean comments from strangers. It seems like this happens basically no matter what you go viral for.

Like, the mob will be much more aggro if you're viral for something controversial. But even the most innocuous things will attract lots of mean comments, if enough people see it.

And people are biased to weight negative comments more highly, so I worry this has a rough psychological impact even if there's a lot more support than there is hate. (Unsure about this though. Maybe that's not the case. And maybe people who go viral are more ok hearing insults than average?)

in reply to kip

My childhood dream was to be an evangelical conservation biologist like Jeff Corwin or Steve Irwin, but the cost of fame seemed too high even to 11-year-old Tim


So I've been listening to Hadestown 2010. One of my favorites is Hey, Little Songbird. It's just such a pushy, patient, practical, sinister vibe, and pushes the narrative forward at the same time.

Lyrically:
The extended bird metaphor is really fun. Especially with all these phrases that are flipped from their typical positive connotation.
"fly south for the winter" [south = the underworld]
"I could use a canary" [He wants a songbird for music, but this line comes right after a reference to "down in the mine"]

Structurally:
Whenever Hades comes back in after Eurydice's part, he overlaps on her last word, which adds to the pushy feel of the song. (Eurydice doesn't start singing till he's fully finished.) Also, Hades' part has this lovely AABBC structure. The extra C line on each stanza makes it feel like he's taking his time.



Happy New AXRP!


Yet another in the Alignment Workshop series.

AI researchers often complain about the poor coverage of their work in the news media. But why is this happening, and how can it be fixed? In this episode, I speak with Shakeel Hashim about the resource constraints facing AI journalism, the disconnect between journalists' and AI researchers' views on transformative AI, and efforts to improve the state of AI journalism, such as Tarbell and Shakeel's newsletter, Transformer.

Transcript
Video



Someone linked me to the article Against SQL recently and it resonates with me a lot. I have a temptation to write a new SQL-like relational query language that tries to fix as many of these problems as I can, but this seems unreasonably ambitious for someone whose background is not databases (and who already has like 3 personal projects ongoing...)

(To be clear, I think unreasonable ambition is sometimes commendable. But I want projects that I'll actually finish.)



Idle thought: I wonder if we'll start seeing "training@home" training runs for open-source LLMs. Anyone care to run some numbers or sanity checks on whether this is possible in principle?

The folding@home project has been hugely successful, reaching at least 1 exaFLOPS of compute.

"Training@home" would have to efficiently do partial gradient updates on extremely heterogeneous hardware with widely varying network properties; I'm not sure if this has any chance of producing base models competitive with e.g. Llama. In terms of ops alone, a 1 exaFLOPS network would have taken 10^7 seconds = ~4 months to train Llama 70b, and I imagine the costs of distributing jobs to such a network and coordinating on weight updates would make this much more expensive. So, probably not going to be competitive?
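A rough sanity check of the ops-only estimate above. It assumes the common rule of thumb of about 6 FLOPs per parameter per training token and a training run of roughly 15 trillion tokens (both are assumptions, not figures from this post), plus perfect utilisation of the network, which a volunteer network would not get close to.

```python
def training_seconds(params: float, tokens: float, flops_per_s: float,
                     flops_per_param_token: float = 6.0) -> float:
    """Ops-only training time: total FLOPs divided by sustained throughput."""
    return flops_per_param_token * params * tokens / flops_per_s


params = 70e9               # 70B parameters
tokens = 15e12              # assumed token count for the training run
network_flops_per_s = 1e18  # 1 exaFLOPS, assuming 100% utilisation

seconds = training_seconds(params, tokens, network_flops_per_s)
days = seconds / 86_400
print(f"{seconds:.2e} seconds, about {days:.0f} days")
```

Under those assumptions this lands within a factor of two of 10^7 seconds, so the order of magnitude looks right before accounting for any coordination overhead.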

in reply to Ben Weinstein-Raun

Just this month there was a proof of concept doing distributed training of a 15B parameter model using a new technique to reduce the amount of data that needs to be shared between GPUs, so that it's actually feasible for them to not be co-located. Which is neat! Buuuut they still were using H100s (80GB of memory) as their basic unit of compute. I don't think their technique lets you train models larger than would fit in memory on each GPU, which means any training@home project is going to be limited to single- or low-double-digit billions of parameters. Small models are neat and serve some purposes but we already have a lot of pretty good ones (Llama, Phi, Gemma, NeMo, etc) and it's not clear what the niche would be for a community-trained one. (I mean, porn, I guess, but there's already a lot of NSFW fine-tunes of those models.)
in reply to Kevin Gibbons

I would guess that there will be reasons to at least want an LLM trained on an open corpus, whether it's community-trained or not.

Example reasons include ensuring that the model isn't secretly trying to get you to buy McDonalds, and the possibility that companies start releasing un-fine-tunable models.



Happy new year superstims!


Sparklers are illegal in Alameda county apparently, so I guess I'm off to commit some crimes.

Ben Weinstein-Raun reshared this.



Maya likes to bake. I got her a couple of kid cookbooks for Christmas and now I find myself baking these ridiculous, overly sweet objects.


Man, I miss my huge-tree-antenna. Yesterday I set up a big loop antenna along my house's wall. It transmits fine, but the noise it picks up makes it almost useless.
in reply to Ben Weinstein-Raun

Is this a problem that can be solved with money, like by just going ahead and getting an arborist preemptively?
in reply to Gretta Duleba

It definitely can't be solved with only money; it also requires at least coordinating with the landlord, who is a very reasonable person as far as Berkeley rationalist house landlords seem to go, but overall my guess is that it's not worth bothering him about it
in reply to Ben Weinstein-Raun

I am probably being too problem-solvey right now and I hereby resolve to stop after this round, but in my experience, arborists are willing to produce documentation of their findings that can later be shown to landlords!

You just sound sad about your antenna and I wanna fix it.



One of the most unique experiences I have currently is when I act like a massage chair for my cat: I'll repeat some specific movement with my hand (usually near her ear) as long as she's pressing into it, and then change it up to other movements she likes.


I've been meaning to start donating blood and/or plasma for a few years now, partly because it's a good thing to do, but also as a way to shed accumulating substances (PFASs have been studied, but also background heavy metals in the case of whole blood donation), but I use topical finasteride for hair loss, which I'd have to stop for a month before donating.

So, say I took a month off from finasteride, and then spent a month donating: whole blood once, and plasma 7 times. If my math is right, I'd have donated / regenerated 1 - 0.92^8 = ~half my blood volume; and ~10% of my body weight. Then maybe back to finasteride for two months, another month of no finasteride, and another donation month?
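The blood-volume arithmetic above, written out. This is only a sketch of the model implied by the 0.92 figure: each donation removes 8% of whatever original blood is left, and the body replaces the lost volume with fresh blood before the next donation. Treating plasma donations the same as whole blood is the simplification made in the post, not a medical claim.

```python
def fraction_regenerated(per_donation_fraction: float, donations: int) -> float:
    """Share of the original blood that has been replaced after repeated donations."""
    remaining_original = (1 - per_donation_fraction) ** donations
    return 1 - remaining_original


# 1 whole-blood donation plus 7 plasma donations, 8% each (assumed)
share = fraction_regenerated(0.08, 8)
print(f"{share:.1%} of the original blood replaced")
```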

in reply to Daniel Filan

Maybe precisely in order to incentivize people to donate blood????
in reply to Ben Weinstein-Raun

This doesn't address your musings . . . but I found plasma donation prohibitively unpleasant. It was painful and time-consuming. By comparison, whole blood donation is a simple and easy way to help.


I'm finding it really hard to make #hamradio contacts in Delaware. Weirdly hard, given that the five states with smaller populations than Delaware were all much easier, even though some of them are further from me, and I've had no trouble making contacts in its neighboring states.

A few days ago I decided to try to be more strategic about contacting every US state since I was really close, and I've now spent probably twice as much time trying to contact Delaware as trying to contact all four of the other stragglers combined.

in reply to Ben Weinstein-Raun

I just did the math, and it seems like Delaware is the state with the #2 lowest non-urban population. Only Rhode Island should be more difficult
in reply to Ben Weinstein-Raun

Ok, now I looked at the ARRL license counts by state. Going by General+Extra only, modified for non-urban population percentage, Delaware comes out as the worst state.


I made something very silly on a whim and put it on YouTube; it will probably not make sense to you unless you know the song Die Young by Kesha youtu.be/O51SdESXu_A


Lately I’ve been enjoying listening to the album Inside by Mother Mother, which is very much about pandemic isolation. Makes me think: wow I sure do love a concept album!

Common features in concept albums that I really enjoy:

- Explorations of the same ideas from different angles.
- Connections between songs — a song about infatuation hits different after you hear it referenced later in a heartbreak song.
- Figuring out the gestalt ideas and the way they’ve changed in the artist’s head over time.
- Taking the time to explore the little details and nuances that fit between the radio singles and peak experiences.
- Intros, outros, interludes. Having structural dynamics like this makes listening to the whole thing a satisfying longform experience.

Happy to hear any recommendations for other compelling concept albums, or other music that hits the above features. (I mostly listen to indie rock, folk, pop, psychedelic, etc, but happy to try new things!)

in reply to Sam FM

Not sure if this quite counts but I think the original 2010 Hadestown album is pretty great.
in reply to Sam FM

ooh, yes thanks for the reminder I've been meaning to listen to this!


Today I was inspired to ask ChatGPT for help with my health issues for the first time since o1 was released. It suggested that I might have Cushing's Syndrome, which actually makes a lot of sense. I don't think any doctors ever suggested this directly, but I do have a recollection of a doctor asking me if I was extremely thirsty or urinating a lot (I wasn't), which might have been a question for a relevant differential.

So hopefully tomorrow I'm going to wake up and go get a cortisol test.

in reply to Ben Weinstein-Raun

Hm, cortisol levels are on the high end of normal. I wonder if I did have Cushing's Syndrome but am now managing it using ashwagandha and antidepressants.


The worst thing about studying Latin is that it's invaded my mind to the degree that I now almost feel like having five cases is reasonable.
in reply to Daniel Filan

"why not more? Why not make a dedicated instrumental case, or a locative that isn't a sewn-together monstrosity comprised by other cases?" - the sounds of a mind deranged by synthetic languages
in reply to Daniel Filan

Russian has six! Also the rules for declension in Russian depend on, among other things, whether something is animate. Are dolls animate? (Yes.) Are corpses? (Depends which word you're using.) Are bacteria? (Yes if you're a biologist, probably not otherwise.)





I suspect that "epistemic and instrumental rationality" is better branded and lived as "nobility in thought and deed". But maybe I just have an unusual set of associations with the word "noble"? It's certainly more goal-laden than the word "rational" typically is.
in reply to Daniel Filan

The thing I mean is less altruistic than what David Chapman describes on this page but shares the feature of being valuable and possible.


So here's a dumb question about Jason Gross-style work on compact proofs that I don't want to ask totally publicly - what's the point? I see the value in making the case for interp as being for stuff like compact proofs. But I feel like we know that we aren't going to be able to find literal proofs of relevant safety properties of GPT-4, and we don't even know what those properties should be. So relevant next steps should look like "figure out heuristic arguments" and "figure out WTF AI safety even is" right? So why do more work getting compact proofs of various model properties?
in reply to Daniel Filan

I don't think it's obvious that we can't get proofs of any relevant safety properties. Like, yeah we're not going to get proofs of anything that references human preferences or whatever, but there might be relevant limited subquestions, e.g. about information capacity or something?
in reply to Ben Weinstein-Raun

I guess I just mean that it's really hard to prove anything about big NN behaviour - my understanding is if you try really hard you can do interval propagation in a smart way but that's about it.


A question bopping around my mind: are there things like making AXRP or being a MATS RM that I could do instead of those things that would be more valuable? Possible answers:
- just do research that matters
- project manager at a place that does research that matters
- be more directly a competitor to Zvi
- team up with Lawrence Chan and write stuff about various alignment schemes

I think a bottleneck I feel is being unsure about what things are valuable in the info environment, where I think I'm best placed to do stuff.






So like.... what's so good about trains? Why would someone think they are so much cooler than cars / trucks / aeroplanes?
in reply to Daniel Filan

  • Bigger / heavier
  • Stronger / move more stuff
  • Make way better sounds
in reply to Daniel Filan

The infrastructure is somehow really appealing (rails, railroad switches, signals). And there's something great about the way they glide along the track.



in reply to kip

(I think my mind is actually like 80-90% back to normal now that it's been almost a week! so, on the faster end of my estimate)
in reply to kip

Ok random update on this. I now have a suspicion that I had physical trauma-effects that were delayed (and longer-lasting) compared to the psychological effects

The noticeable psychological impacts started after a day or so, and lasted maybe 4-5 days?

And I didn't have post-exertional malaise (PEM) right after the incident. But I started getting PEM really easily from other stuff

Here are charts from one of my health trackers. The incident was on the 17th. The top chart is my physical exertion per day (measured with HR data). (Ignore the final entry -- it's only so low because the day just started.) The bottom chart is my morning HRV readings. As you can see, they trended lower for a while after the incident.

Perhaps this decline will persist, but in the last few days, I started getting the feeling that I'm returning to a somewhat-less-severe baseline



Thing I just learned: Paul: a Very Short Introduction, one of my favourite entries in the Very Short Introduction series and one I frequently recommend, was written by E. P. Sanders - one of the most prominent 20th century scholars on the apostle Paul and his thought. Self-recommending!
in reply to Daniel Filan

It really says something about where I'm at today, that it took multiple seconds before I realized you weren't talking about Paul Christiano.
in reply to Ben Weinstein-Raun

RIP I included 'apostle' or something in an earlier draft of this explicitly to counteract this, but randomly left it out of the final version. Fixing it.

in reply to Daniel Filan

In general on one hand I'm like "I'm so grateful English has so much grammar and vocabulary to make it so expressive" but when I see Latin I'm like "Japanese copes with just having past and non-past plus some participles, why can't you" (not even getting to the whole thing of having different genders and different declensions for nouns and adjectives).
in reply to Daniel Filan

Ironically "coep-" is now the perfect stem I am perhaps least likely to forget.


Solstice notes


  • I like that the celebration took place on (or adjacent to) the actual solstice
  • I broadly thought this year's was worse than last year's, altho it had its charms
  • I liked "Humankind as a sailor" - tricky to pick up but rewarding once you did
  • Just because a song takes place in Australia, I don't think it thereby glorifies the negative aspects of colonialism.
  • The darkness speech was touching this year
  • I feel like a lot of the time the speaker would say something I straightforwardly agreed with in the way I would say it and everyone would laugh.
  • It was funny when Ozy said her favourite website was Our World in Data and Scott sang the praises of Dustin Moskovitz while I was sitting next to Oli
  • I think "the world is awful" is wrong, and not established by there being awful things in the world.
in reply to Daniel Filan

Also 'Humankind as a Sailor' is now on my non-core solstice music playlist and so popped up while I was on the rowing machine - total disaster, induced complete muscle confusion.