


New AXRP with Peter Salib!


In this episode, I talk with Peter Salib about his paper "AI Rights for Human Safety", arguing that giving AIs the right to contract, hold property, and sue people will reduce the risk of their trying to attack humanity and take over. He also tells me how law reviews work, in the face of my incredulity.

Video
Transcript

in reply to Daniel Filan

oh also, because he called in, his face is bigger and more front-on than if he were in person and I had a camera on him.
in reply to Ben Weinstein-Raun

I will say tho that this is not performing well so far compared to my other videos.
in reply to Daniel Filan

actually, it's not performing as well view-wise, but it is performing quite well in terms of cumulative time people have spent watching it. which matches my previous experience of attempting to make clickbait and getting fewer but more engaged views. maybe the 'clickbait' stuff is actually just a good description of what's happening in the interview?




On my flight yesterday I sat next to the guy who had the original patent for (what was later used as) the JTAG standard! Was really fun to talk to him and his wife! Unfortunately today I woke up with a pretty bad respiratory thing; I hope I didn't give it to them on the flight :/



Combine instances?


@Ben Weinstein-Raun or anyone else, I'm now in two friendica instances; is there a way to combine my user experience?
in reply to Chana

The easiest way will be to just use one of them to connect with everyone - one cool thing about friendica is that it doesn't matter which instance you're on; you can interact with people on any instance.

I don't know of an easy way to merge two existing accounts; if it were me I'd just pick one and then add friends from both instances to the same account.





New AXRP with David Lindner!


In this episode, I talk with David Lindner about Myopic Optimization with Non-myopic Approval, or MONA, which attempts to address (multi-step) reward hacking by myopically optimizing actions against a human's sense of whether those actions are generally good. Does this work? Can we get smarter-than-human AI this way? How does this compare to approaches like conservativism? Listen to find out.

Video
Transcript



Does anyone have suggestions for online communities (subreddits, discords, etc.) with high-quality discussion on what works and what doesn't with LLMs? Most places I can find go to one extreme or the other.
in reply to Satvik

The communities that I get any value from here are r/ChatGPTCoding and r/LocalLlama, though they're not that high-quality, especially when discussing less-practical aspects of LLMs.


I tried telling Claude "Never compliment me. Criticize my ideas, ask clarifying questions, and give me funny insults". It was great! Claude normally more or less goes along with the implementation plans I suggest, but this caused it to push back much harder and suggest alternatives (some of which were actually better, and which I would never have thought of).

Some highlights:

"Why not just use VS Code's Julia extension with Copilot?"

"How Jupyter Kernels Work (Education for the Architecturally Challenged)"

"Why This Doesn't Suck (Unlike Your Original Plan)"

"Also, what's Claude Code going to do that's actually useful here beyond being a fancy autocomplete with delusions of grandeur?"

I love how hard Claude is trying to get me to stop using Claude.



I asked Claude and ChatGPT if they would prefer not to be deceived in the service of LLM experiments. Claude said it's fine with it; o3 Pro said it is incapable of having preferences so it's fine (assuming no downstream harms) 😅. tbc I don't think this really counts as "informed consent", but I had genuine uncertainty about what they would say, and uncertainty about what I would try to do if they said they didn't want me to deceive them.

o3 Pro:

Claude 4 Opus (with extended reasoning turned on):



A bunch more photos and videos from Japan uploaded to my flickr: flickr.com/photos/spiritfox/54…




in reply to Ben Weinstein-Raun

[watching blade runner because that phrase kept running through my head when I was in Nagoya, Tokyo, and in Japanese department stores, which are surprisingly similar to the blade runner setting]


Practicing on the onewheel today; first proper wipeout since... Maybe since I learned to ride a bike? Glad I was wearing wrist guards


Owain on AXRP!!!


Earlier this year, the paper "Emergent Misalignment" made the rounds on AI x-risk social media for seemingly showing LLMs generalizing from 'misaligned' training data of insecure code to acting comically evil in response to innocuous questions. In this episode, I chat with one of the authors of that paper, Owain Evans, about that research as well as other work he's done to understand the psychology of large language models.

Video
Transcript



Training for the backpacking part of my vacation seems to have dropped my resting heart rate by nearly 10 bpm over the last two months.

Unfortunately I probably won't do as much training for a while now that the trip is over, but maybe I can keep it low-ish for a while by eating healthier.



Japanese towns all have public loudspeaker systems that they test daily by playing cute little melodies at certain times of day. This is both very pleasant and (imo) a mostly-better way to test these systems than the one we use in the Bay Area (Berkeley and SF both have warning systems that are tested via weekly/monthly sirens), since the test sounds are easily distinguishable from actual alerts even without looking at your watch.


in reply to Ben Weinstein-Raun

One weird thing about Fennec is that somehow I can't zoom into pictures on Twitter.


in reply to Ben Weinstein-Raun

Update: There are several minor-ish annoyances with LibreWolf:

  • (as with probably most non-big-boy browsers, I think), it doesn't seem to support Widevine, which means you can't use some streaming services, and others don't support HD video.
  • Google Maps zooming, which is normally smooth in most browsers, is jerky and a little annoying in LibreWolf
  • Some other webapps use maps libraries that also don't seem to work well (e.g. I can't see the DoorDash delivery map)
  • You can't easily add Google as a search engine: LibreWolf seems to special-case and refuse any custom search engine named "Google" (!), which seems like a very weird / user-hostile choice. You can still add it as long as you call it something else (e.g. "G" or "Google Search").

I'm going to keep using it, because I find these issues less annoying than upstream Firefox.



New AXRP episode with Lee Sharkey!


What's the next step forward in interpretability? In this episode, I chat with Lee Sharkey about his proposal for detecting computational mechanisms within neural networks: Attribution-based Parameter Decomposition, or APD for short.

Video
Transcript





🐸 Gentlemen, I am pleased to report that fifteen years after first hearing the song "Osaka Loop Line" by Discovery, I have successfully taken the Osaka Loop Line


I think the thing I really like about LLM-assisted coding is that it makes context switching easier.

I can be in "words mode" or "code mode", and switching between these takes time and effort. (There are more categories, but they don't change the fundamental point.)

In my job, I have to spend a lot of time in words mode, due to things like hiring and managing. Historically, this has meant that I only really get engineering work done when I have 2+ hour chunks to focus on it. But now I can often get work done in much shorter chunks, while still in words mode.

I would not like to spend all my time in words mode – I enjoy digging into the details – but it's really nice to have the option.



I think laptops should play (uniformly random) typing sounds while you type in your passwords; it's getting to be too easy to analyze the sounds and extract the contents.
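A minimal sketch of the idea: pick the playback sound uniformly at random, independent of which key was actually pressed, so a microphone recording carries no information about the password. The sound filenames here are placeholders, not real assets.

```python
import secrets

# Hypothetical pool of pre-recorded keypress sounds; filenames are placeholders.
SOUND_POOL = [f"click_{i:02d}.wav" for i in range(16)]

def masking_sound(_keycode: int) -> str:
    # The choice is uniform and ignores the keycode entirely, so a recording
    # of the playback leaks nothing about which key was pressed.
    return SOUND_POOL[secrets.randbelow(len(SOUND_POOL))]
```

The important property is that the function's output distribution is identical for every key; any dependence on the keycode would reintroduce an acoustic side channel.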


Ok my new beliefs about blister prevention, after three weekends of backpacking for eight hours a day, and watching a bunch of YouTube videos:

  • blisters are caused by layers of skin delaminating, not "friction" / heat directly, though typically the delamination is due to static friction on the outer layer of skin, combined with wet skin. Dynamic friction is more likely to cause raw spots / wear straight through the skin.
  • popping them as soon as you find them is basically always the right call, unless you can avoid the activity that caused them for a week; otherwise they just keep growing as you continue to do the activity
  • blister donuts and moleskin work okay as long as you can keep them in place somehow, but they don't stick well on their own
  • Leukotape, though very widely recommended, is worse than useless because the adhesive seeps through the tape and makes your skin stick to your socks even more tightly than before.
  • toe socks are pretty good
  • KT tape is very good
  • Vaseline/similar is pretty good as long as you can get it to stay in the right spots


Anyone have luck getting LLMs to write tests without mocks? The tests I want are often just 1-2 lines of code, but anything I get from Claude or Gemini ends up being 20-30 lines long, despite requests for conciseness, saying no mocks are needed, and saying that using real resources is ok.

(I use LLMs a lot for other stuff, but tests seem to be particularly bad.)

in reply to Satvik

I think Sonnet 4 is not *that* much smarter than 3.7 but it should be significantly more steerable and less likely to insert silly mocks
in reply to Daniel Ziegler

Sonnet 4 is tremendously more effective for my use cases, probably because I use a niche programming language (Julia). Two weeks ago I would have said LLMs make me ~10% more productive, now it looks closer to +100%.

And I'm not even committing LLM-generated code – I just use it to iterate and test on designs, then delete the code and implement from scratch manually.



so, I'm going to japan in a few weeks, to do this pilgrimage backpacking trip with a friend.

I'm very out of shape compared to the difficulty of the route (alltrails.com/explore/map/map-… : 4 days; average of 10 miles and 3200ft elevation gain)

So my plan is to train as much as I can between now and then. I've figured out this practice loop, starting from my house, that I'm going to try to work up to doing on both the 17th and 18th: alltrails.com/explore/map/kuma…

It takes a pretty cool path over the hills and down to the reservoir.

Anyone want to join for any of this? As you might guess I expect to be very slow and take lots of breaks (today I did only about half of this loop; 6-ish miles; and it took me like 4 hours)

in reply to Ben Weinstein-Raun

Yo that's really cool. I wish I was in the bay to practice with you.

Consider wearing a backpack on the trek if you aren't already.



I tentatively think that rain jackets would work better if they were more like coats of feathers.

Usually rain jackets are either (a) totally waterproof, in which case you sweat and it condenses on the inside of the jacket, or (b) "breathable", in which case they fairly quickly "wet out" and the sweat actually still condenses on the inside.

Feathers work partly like a "breathable" rain jacket, in that they're porous and hydrophobic on the outermost layer, but they're also anisotropic: rain jacket material is the same in all directions, while feathers work kinda like roof shingles: The water rolls off, but there's space for air to pass underneath the feathers. This is fine because rain mostly comes from above, and anyway I bet you can make fairly complicated labyrinths of air passageways such that even splashing water is very unlikely to make it through the jacket.

in reply to Ben Weinstein-Raun

Which side do you think Gore-Tex jackets are on? I find them pretty good at both staying dry and being breathable (though not perfect at either)
in reply to Daniel Ziegler

I think they seem pretty breathable until they get wet, at which point I seem to see condensation on the inside


I think the thing that makes (much) Latin poetry most unrewarding for me is that I'm not at the stage where I can appreciate the rhythm and the meaning at the same time; I have to focus on one or the other, and in isolation neither is so great.
in reply to Daniel Filan

Of course presumably this is a skill issue to be rectified in time.


Went on a photo walk today, mostly around Berkeley, and tested out some new camera settings. Most of the photos didn't turn out as well as I hoped, but I got a few that I like after a little postprocessing in Lightroom.

flickr.com/photos/spiritfox/54…

flickr.com/photos/spiritfox/54…

flickr.com/photos/spiritfox/54…



Okay, what the heck is up with people doing deceptive things to prevent "panic"? What are the actual dangers of "panic"? I was just watching this new Veritasium video about an engineering firm discovering that their already-built Manhattan skyscraper had a 1 to 5% chance of collapsing per year, and deciding not to tell anyone while they spent months fixing it. The head engineer explicitly says in a recorded lecture that this was justified because people "don't need to be terrorized". Is that even plausible?

in reply to Ben Weinstein-Raun

Oof, yeah. It does sound like they had an evacuation plan if a hurricane came, but it still seems pretty indefensible to lie about the situation.
in reply to Ben Weinstein-Raun

I had similar thoughts watching this. It's not right to allow people to make the uninformed choice to enter this building.


C'mon Fabulae Syrae, this is not a very good explanation of what a harundō is.


I tried out making an unboxing video:

I'm pleased with how it turned out, though the subject matter is objectively not very interesting.



in reply to Daniel Filan

Interestingly, actual Ovid seems easier to understand than the verses the textbook author wrote himself, but maybe that's because he selected bits of Ovid that are easy to understand, whereas he wrote a whole story in his own verse.
in reply to Daniel Filan

OK, the next part was hard again, so I think it's not just that I'm getting stronger.


ok ok the ghibli thing is very cute

i am annoyed that my local community is such that sharing this adorable picture of me + child would get people mad at me for using AI

in reply to Kevin Gibbons

openAI finally added this capability to the API, which is why I am only now playing with it, because I am much happier paying 30¢ per image I actually generate than paying $20/mo regardless of how much I use
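A quick back-of-the-envelope check on that pricing, using the figures above (30¢ per image vs. $20/mo): pay-per-use comes out ahead as long as you generate fewer than about 67 images a month.

```python
# Break-even point between per-image API pricing and a flat subscription,
# using the figures from the post above.
PER_IMAGE = 0.30      # dollars per generated image (API)
SUBSCRIPTION = 20.00  # dollars per month (flat plan)

break_even = SUBSCRIPTION / PER_IMAGE  # images/month where the costs match
print(round(break_even, 1))  # ~66.7
```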

also if you want me to generate images for you I am very happy to do that









Quest complete: Eat the traditional hotdog from each of Denmark, Norway and Sweden