Superstimulus | Community

Ben Weinstein-Raun

10 hours ago •

Ben Weinstein-Raun
10 hours ago •

I always feel relieved when I take off a suit. But if I've learned anything from funerals, it's that the suit will win in the end.

in reply to Ben Weinstein-Raun

JP Addison

in reply to Ben Weinstein-Raun • 10 hours ago •

That's really funny, and also I feel awkward liking it.

Ben Weinstein-Raun likes this.

Ben Weinstein-Raun

1 day ago •

Ben Weinstein-Raun
1 day ago •

I made a thing that lets you generate very strong passwords as nonsense couplets: benwr.net/2025/07/16/opensesam…

JP Addison likes this.

I started to write a post complaining that I couldn't find sandals that met my weird requirements. But partway through writing that post, I realized that I might be able to relax one of my weird requirements. And then when I went to look for sandals that met the relaxed requirements, I found a pair that I think is perfect! Hooray for... something about the process of carefully laying out your problems into the void?

Satvik likes this.

Ben Weinstein-Raun

1 week ago •

Ben Weinstein-Raun
1 week ago •

Observation: If I look at certain things I did when I was 15, I'm often extremely embarrassed by them, and have been since I was about 20. But I can't think of anything I've done since I was 20 that I feel similarly embarrassed about.

Possible explanations:

Personal growth/change rate stalled around 20; I stopped "advancing" in ways that would lead me to know enough to be embarrassed about more recent things. Or at least, I'm just less different from me-at-20 than me-at-20 was from me-at-15, and so more easily feel justified in post-20 actions (e.g. due to fundamental attribution error applying more cleanly)
Propensity to store embarrassing memories decreased around 20; I still do just as much fundamentally-embarrassing stuff but I'm less good at remembering it
I've hit some kind of "embarrassment critical threshold": a point in personal development where I outgrew or suppressed most impulses that would be embarrassing later. Also counts if I became better at predicting what would later be embarrassing.
Generically my life has had less vari

Ben Weinstein-Raun

1 week ago •

Ben Weinstein-Raun
1 week ago •

:o whoa, I just noticed that the semi-obscure hash function implementation I wrote in 2021 is being used by someone else (in the form of a fork they made to add a trait instance); more specifically, it's being used by the team writing the text editor that I was using to update it just now. I don't know if it's being used directly in the editor or what, but this is the first time this any of my open source contributions has been used by anybody I've heard of other than me.

like this

Ben Weinstein-Raun

1 week ago •

Ben Weinstein-Raun
1 week ago •

Gemini Code Assist might be one of the best code reviewers I've seen; definitely in the top 20%.

Satvik likes this.

in reply to Ben Weinstein-Raun

Satvik

in reply to Ben Weinstein-Raun • 1 week ago •

I've really enjoyed using Claude + Gemini with zen-mcp. The back and forth seems to produce very good suggestions.

Ben Weinstein-Raun likes this.

Daniel Filan

1 week ago •

Daniel Filan
1 week ago •

New AXRP with Samuel Albanie!

In this episode, I chat with Samuel Albanie about the Google DeepMind paper he co-authored called "An Approach to Technical AGI Safety and Security". It covers the assumptions made by the approach, as well as the types of mitigations it outlines.

Video
Transcript

Ben Weinstein-Raun

1 week ago •

Ben Weinstein-Raun
1 week ago •

Interesting asymmetry: it seems a lot more common to put an old tune to new words than the reverse.

Ben Weinstein-Raun

1 week ago •

Ben Weinstein-Raun
1 week ago •

I finally have a short and clearly-not-tracking-you link for my anonymous feedback form! If you want to give me feedback you can do so via w-r.me/feedback

If you want, you can verify that it doesn't track you or anything by looking at the corresponding public repo: github.com/benwr/w-r.me/blob/m…

I made a hacky link shortener this way for work reasons, and then realized it could work really well for the rare occasion like this, when I want to have a short link with no tracking.

Feedback form for Ben Weinstein-Raun

^{Google Docs}

in reply to Ben Weinstein-Raun

Ben Weinstein-Raun

in reply to Ben Weinstein-Raun • 1 week ago •

(It might still be possible in principle that actually this is somehow served from some other repo - I don't know what happens if one tries to do a github pages site deploy with a CNAME that doesn't match the actual deploy URL; I hope it visibly fails but it might not)

in reply to Ben Weinstein-Raun

Ben Weinstein-Raun

in reply to Ben Weinstein-Raun • 1 week ago •

You can at least see that some kind of check happens in deployment that references the correct url: github.com/benwr/w-r.me/action…

Satvik

1 week ago •

Satvik
1 week ago •

What's an example where it actually makes sense to build your own agent? I see tons of tutorials floating around recently, but it's hard for me to imagine a case where I wouldn't just e.g. build an MCP for Claude Code instead. What am I missing?

Ben Weinstein-Raun

1 week ago •

Ben Weinstein-Raun
1 week ago •

PSA: You can use a GitHub Pages site as a personal link shortener. Plus you can use it to solve the "why should I trust that this link shortener isn't tracking me" problem, by making the backing repo public.

Daniel Filan

2 weeks ago •

Daniel Filan
2 weeks ago •

New AXRP with Peter Salib!

In this episode, I talk with Peter Salib about his paper "AI Rights for Human Safety", arguing that giving AIs the right to contract, hold property, and sue people will reduce the risk of their trying to attack humanity and take over. He also tells me how law reviews work, in the face of my incredulity.

Video
Transcript

Ben Weinstein-Raun likes this.

in reply to Daniel Filan

Ben Weinstein-Raun

in reply to Daniel Filan • 2 weeks ago •

Your thumbnail game is getting better

in reply to Ben Weinstein-Raun

Daniel Filan

in reply to Ben Weinstein-Raun • 2 weeks ago •

Thanks! I tried making the text bigger and adding more question marks.

Ben Weinstein-Raun likes this.

in reply to Daniel Filan

Daniel Filan

in reply to Daniel Filan • 2 weeks ago •

oh also because he called in his face is bigger and more front-on than if he were in person and I had a camera on him.

This entry was edited (2 weeks ago)

Ben Weinstein-Raun likes this.

in reply to Ben Weinstein-Raun

Daniel Filan

in reply to Ben Weinstein-Raun • 2 weeks ago •

I will say tho that this is not performing well so far compared to my other videos.

in reply to Daniel Filan

Daniel Filan

in reply to Daniel Filan • 2 weeks ago •

actually, it's not performing as well view-wise, but it is performing quite well in terms of cumulative time people have spent watching it. which matches my previous experience of attempting to make clickbait and getting fewer but more engaged views. maybe the 'clickbait' stuff is actually just a good description of what's happening in the interview?

Ben Weinstein-Raun likes this.

Ben Weinstein-Raun

3 weeks ago •

Ben Weinstein-Raun
3 weeks ago •

Get some kind of bad respiratory sickness with a moderate fever
Test to see if it's COVID; it is
Get a remote doctor's appointment to see if you can get a paxlovid prescription.
Doctor is happy to prescribe paxlovid, and even emphasizes that you need to get it today, partly because you have risk factors for complications
[Feeling feverish] Unfortunately you live in the SF Bay area. There are only two pharmacies within 5 miles of your house, and both are known for having bad service, presumably because crime forced the other ones to close about five years ago
[Feeling even more feverish] CVS sends you a text saying that your insurance has declined the prescription and it will cost you $1777 for one week of paxlovid (not a typo)
You Google how this can be, and discover that Pfizer offers some kind of copay card that covers $1400 of it. You shrug and print out the card, wondering how it can be that this is a socially appropriate outcome; nobody told you about this program, you had to Google it yourself.
Put on a k

like this

in reply to Ben Weinstein-Raun

kadei 🐀

in reply to Ben Weinstein-Raun • 3 weeks ago • •

:(((

hugs (virtual). hope you got it eventually

Ben Weinstein-Raun likes this.

Ben Weinstein-Raun

3 weeks ago •

Ben Weinstein-Raun
3 weeks ago •

On my flight yesterday I sat next to the guy who had the original patent for (what was later used as) the JTAG standard! Was really fun to talk to him and his wife! Unfortunately today I woke up with a pretty bad respiratory thing; I hope I didn't give it to them on the flight :/

Ben Weinstein-Raun

3 weeks ago •

Ben Weinstein-Raun
3 weeks ago •

@ramblings @kadei 🐀 : Here's the actual blog post about the table on an uneven floor that my friend wrote a while back: haggainuchi.com/wobblytable.ht…

The Wobbly Table Problem - Haggai Nuchi

^{haggainuchi.com}

@kadei 🐀 @ramblings

like this

Chana

1 month ago •

Chana
1 month ago •

Combine instances?

@Ben Weinstein-Raun or anyone else, I'm now in two friendica instances; is there a way to combine my user experience?

@Ben Weinstein-Raun

in reply to Chana

Ben Weinstein-Raun

in reply to Chana • 1 month ago •

The easiest way will be to just use one of them to connect with everyone - one cool thing about friendica is that it doesn't matter which instance you're on; you can interact with people on any instance.

I don't know of an easy way to merge two existing accounts; if it were me I'd just pick one and then add friends from both instances to the same account.

Chana likes this.

Ben Weinstein-Raun

1 month ago •

Ben Weinstein-Raun
1 month ago •

The more I think about gradual disempowerment, the more I think I should be a lot less libertarian.

Like, imo the central problem with gradual disempowerment is almost identical to (one of) the central problem(s) of unregulated markets: resources go proportionally to those who produce more value-from-the-point-of-view-of-those-with-resources.

From the point of view of human welfare, this is pretty fine when market participants are all fairly similar in terms of their potential productivity. But when some humans have extremely low potential for productivity, those humans' welfare is totally ignored by the market except insofar as other market participants happen to care about it enough to spend resources on it directly / terminally. This is exactly the situation that all humans will probably be in soon.

And it's not like I know what should happen instead; I think I had believed that UBI would be sufficient to solve it in the human-only-economy case, since surely humans have at least a bit of built-in terminal empathy on average. But it feels like, in an economy an

like this

Daniel Filan

1 month ago •

Daniel Filan
1 month ago •

New AXRP with David Lindner!

In this episode, I talk with David Lindner about Myopic Optimization with Non-myopic Approval, or MONA, which attempts to address (multi-step) reward hacking by myopically optimizing actions against a human's sense of whether those actions are generally good. Does this work? Can we get smarter-than-human AI this way? How does this compare to approaches like conservativism? Listen to find out.

Video
Transcript

Satvik

1 month ago •

Satvik
1 month ago •

Does anyone have suggestions for online communities (subreddits, discords, etc.) with high-quality discussion on what works and what doesn't with LLMs? Most places I can find go to one extreme or the other.

Ben Weinstein-Raun likes this.

in reply to Satvik

Ben Weinstein-Raun

in reply to Satvik • 1 month ago •

The communities that I get any value from here are r/ChatGPTCoding and r/LocalLlama, though they're not that high-quality, especially when discussing less-practical aspects of LLMs.

Satvik likes this.

Satvik

1 month ago •

Satvik
1 month ago •

I tried telling Claude "Never compliment me. Criticize my ideas, ask clarifying questions, and give me funny insults". It was great! Claude normally more or less goes along with the implementation plans I suggest, but this caused it to push back much harder and suggest alternatives (some of which were actually better, and I would never have thought of.)

Some highlights:

"Why not just use VS Code's Julia extension with Copilot?"

"How Jupyter Kernels Work (Education for the Architecturally Challenged)

"Why This Doesn't Suck (Unlike Your Original Plan)"

"Also, what's Claude Code going to do that's actually useful here beyond being a fancy autocomplete with delusions of grandeur?"

I love how hard Claude is trying to get me to stop using Claude.

Ben Weinstein-Raun likes this.

Ben Weinstein-Raun

1 month ago •

Ben Weinstein-Raun
1 month ago •

I asked Claude and ChatGPT if they would prefer not to be deceived in the service of LLM experiments. Claude said it's fine with it; o3 Pro said it is incapable of having preferences so it's fine (assuming no downstream harms) 😅. tbc I don't think this really counts as "informed consent", but I had genuine uncertainty about what they would say, and uncertainty about what I would try to do if they said they didn't want me to deceive them.

o3 Pro:

Claude 4 Opus (with extended reasoning turned on):

Chana likes this.

in reply to Ben Weinstein-Raun

Ben Weinstein-Raun

in reply to Ben Weinstein-Raun • 1 month ago •

Gemini has a similar take to o3:

in reply to Ben Weinstein-Raun

Ben Weinstein-Raun

in reply to Ben Weinstein-Raun • 1 month ago •

Also just as an aside, I think o3 and Gemini are at least a bit wrong on the ethical points here.

Ben Weinstein-Raun

1 month ago •

Ben Weinstein-Raun
1 month ago •

A bunch more photos and videos from Japan uploaded to my flickr: flickr.com/photos/spiritfox/54…

A calming evening

Explore tauntaun!'s 153 photos on Flickr!

^Flickr

Jen Blight likes this.

Ben Weinstein-Raun

1 month ago •

Ben Weinstein-Raun
1 month ago •

Seriously guys, polyethylene is such a fucking cool material.

You get very different properties depending on the length of the polymer chains:

Short-ish chains: Low-Density Polyethylene (LDPE). This is what plastic grocery bags are made of. Easy to tear, easy to cut, lightweight.
Highly branching chains: High-Density Polyethylene (HDPE). This is what Tyvek (the stuff from festival wristbands, construction projects, and some USPS envelopes) and gallon milk jugs are made of. Still easy to cut, but very hard to tear.
Very long chains: Ultra-High Molecular Weight Polyethylene (UHMWPE, also called Dyneema or Spectra). This is one of the highest strength-to-weight ratio materials that you can buy in large quantities, competitive with the strongest carbon fiber (but also you can make it into flexible fabrics and cords, which you can't do with typical carbon fiber materials), twice as strong as spider silk, ten times stronger than steel by weight. You can't even cut the stuff with normal scissors. Some bulletproof jackets are made out

Chana likes this.

in reply to Ben Weinstein-Raun

Chana

in reply to Ben Weinstein-Raun • 1 month ago •

This is very cool

Ben Weinstein-Raun likes this.

Ben Weinstein-Raun

1 month ago •

Ben Weinstein-Raun
1 month ago •

Uploaded several photos from the Japan trip to my flickr: flickr.com/photos/spiritfox/

Ben Weinstein-Raun

Explore Ben Weinstein-Raun’s 120 photos on Flickr!

^{Ben Weinstein-Raun (Flickr)}

Ben Weinstein-Raun

1 month ago •

Ben Weinstein-Raun
1 month ago •

A new life awaits you in the offworld colonies

Chana likes this.

in reply to Ben Weinstein-Raun

Ben Weinstein-Raun

in reply to Ben Weinstein-Raun • 1 month ago •

[watching blade runner because that phrase kept running through my head when I was in Nagoya, Tokyo, and in Japanese department stores, which are surprisingly similar to the blade runner setting]

Ben Weinstein-Raun

1 month ago •

Ben Weinstein-Raun
1 month ago •

Practicing on the onewheel today; first proper wipeout since... Maybe since I learned to ride a bike? Glad I was wearing wrist guards

Chana likes this.

Daniel Filan

1 month ago •

Daniel Filan
1 month ago •

Owain on AXRP!!!

Earlier this year, the paper "Emergent Misalignment" made the rounds on AI x-risk social media for seemingly showing LLMs generalizing from 'misaligned' training data of insecure code to acting comically evil in response to innocuous questions. In this episode, I chat with one of the authors of that paper, Owain Evans, about that research as well as other work he's done to understand the psychology of large language models.

Video
Transcript

Ben Weinstein-Raun likes this.

Ben Weinstein-Raun

1 month ago •

Ben Weinstein-Raun
1 month ago •

Training for the backpacking part of my vacation seems to have dropped my resting heart rate by nearly 10 bpm over the last two months.

Unfortunately I probably won't do as much training for a while now that the trip is over, but maybe I can keep it low-ish for a while by eating healthier.

like this

in reply to Ben Weinstein-Raun

JP Addison

in reply to Ben Weinstein-Raun • 1 month ago •

Woah

Chana likes this.

Ben Weinstein-Raun

1 month ago •

Ben Weinstein-Raun
1 month ago •

Japanese towns all have public loudspeaker systems, that they test daily by playing cute little melodies at certain times of day. This is both very pleasant and (imo) a mostly-better way to test these systems than the one we use in the bay area (Berkeley and SF both have warning systems that are tested via weekly/monthly sirens), since the test sounds are easily distinguishable from actual alerts even without looking at your watch.

like this

in reply to Ben Weinstein-Raun

Chana

in reply to Ben Weinstein-Raun • 1 month ago •

Wow this seems so obviously better is pisses me off

Ben Weinstein-Raun likes this.

Ben Weinstein-Raun

1 month ago •

Ben Weinstein-Raun
1 month ago •

Around the beginning of my vacation I put some effort points into picking the "right" web browser for me as of mid-2025.

Desiderata:

Open-source
Non-kneecapped uBlock Origin
Cross-device sync
not on some weird slippery slope of trying to get monetized (though it's okay if it's downstream of a browser that is on that slope)

I had been using Brave for a few months since Chrome stopped supporting Manifest v2, but eventually Chromium is going to deprecate Manifest v2 as well, so Brave's days are numbered.

Conclusion:

For desktop, LibreWolf: reasonably well-resourced Firefox rebrand without the monetization junk. To install it you have to tell macOS that it's okay that it hasn't been notarized by apple, e.g. using brew install --no-sandbox. There are some very annoying default settings that you'll want to change (e.g. I'd recommend turning off the stuff aimed at avoiding fingerprinting, which makes Google Maps buggy and prevents you from uploading images on many sites) but after I did that everything wor

like this

in reply to Ben Weinstein-Raun

Daniel Filan

in reply to Ben Weinstein-Raun • 1 month ago •

One weird thing about Fennec is that somehow I can't zoom into pictures on Twitter.

Ben Weinstein-Raun doesn't like this.

in reply to Ben Weinstein-Raun

Ben Weinstein-Raun

in reply to Ben Weinstein-Raun • 1 month ago •

Update: There are several minor-ish annoyances with LibreWolf:

(as with probably most non-big-boy browsers, I think), it doesn't seem to support Widevine, which means you can't use some streaming services, and others don't support HD video.
Google Maps zooming, which is normally smooth in most browsers, is jerky and a little annoying in LibreWolf
Some other webapps use maps libraries that also don't seem to work well (e.g. I can't see the DoorDash delivery map)
You can't easily add Google as a search engine; it seems to have a special case where it will refuse to add a custom search engine named "Google"; you have to call it something else (!). This seems like a very weird / user-hostile choice, but you can still add the search engine as long as you call it something else (e.g. "G" or "Google Search")

I'm going to keep using it, because I find these issues less annoying than upstream Firefox.

Daniel Filan

1 month ago •

Daniel Filan
1 month ago •

New AXRP episode with Lee Sharkey!

What's the next step forward in interpretability? In this episode, I chat with Lee Sharkey about his proposal for detecting computational mechanisms within neural networks: Attribution-based Parameter Decomposition, or APD for short.

Video
Transcript

Satvik

1 month ago •

Satvik
1 month ago •

I got a Claude Max subscription and have been playing around.

First big experiment was to see if we could reimplement a part of our pipeline from Julia into Rust. The quality of the code it produced was really good, and it did good analyses on what parts would be easier or harder to port. Eventually this convinced us that a port wouldn't have been worthwhile, because the Rust <> Julia interop is currently quite difficult. I think it might have taken us several days to come to that conclusion otherwise, so this was a big win.

Second significant experiment was asking Claude Code to optimize a pipeline. It pretty much failed abysmally – it successfully added a lot of profiling and came up with a lot of ideas, but not a single one of them made the code any faster. Later, I was able to make a significant improvement to that pipeline with about half an hour of work. So, a failure here.

Third was a moderately difficult refactor, replacing some dicts with structs, that then get serialized and read in another part of the codebase. Claude Code did really w

Ben Weinstein-Raun

1 month ago •

Ben Weinstein-Raun
1 month ago •

🐸 Gentlemen, I am pleased to report that fifteen years after first hearing the song "Osaka Loop Line" by Discovery, I have successfully taken the Osaka Loop Line

like this

Satvik

1 month ago •

Satvik
1 month ago •

I think the thing I really like about LLM-assisted coding is that it makes context switching easier.

I can be in "words mode" or "code mode", and switching between these takes time and effort. (There are more categories, but they don't change the fundamental point.)

In my job, I have to spend a lot of time in words mode, due to things like hiring and managing. Historically, this has meant that I only really get engineering work done when I have 2+ hour chunks to focus on it. But now I can often get work done in much shorter chunks, while still in words mode.

I would not like to spend all my time in words mode – I enjoy digging into the details – but it's really nice to have the option.

Ben Weinstein-Raun

1 month ago •

Ben Weinstein-Raun
1 month ago •

I think laptops should play (uniformly random) typing sounds while you type in your passwords; it's getting to be too easy to analyze the sounds and extract the contents.

like this

in reply to Ben Weinstein-Raun

JP Addison

in reply to Ben Weinstein-Raun • 1 month ago •

What do you do about this, if anything?

in reply to JP Addison

Ben Weinstein-Raun

in reply to JP Addison • 1 month ago •

Currently nothing. Considering setting up microphones everywhere 😛

in reply to Ben Weinstein-Raun

Chana

in reply to Ben Weinstein-Raun • 1 month ago •

Also keypad locks

Ben Weinstein-Raun

2 months ago •

Ben Weinstein-Raun
2 months ago •

Ok my new beliefs about blister prevention, after three weekends of backpacking for eight hours a day, and watching a bunch of YouTube videos:

blisters are caused by layers of skin delaminating, not "friction" / heat directly, though typically the delamination is due to static friction on the outer layer of skin, combined with wet skin. Dynamic friction is more likely to cause raw spots / wear straight through the skin.
popping them as soon as you find them is basically always the right call unless you plan to be able to avoid the activity that caused them for a week; otherwise they just keep growing as you continue to do the activity
blister donuts and moleskin work okay as long as you can keep them in place somehow, but they don't stick well on their own
leukotape, very very widely recommended, is worse than useless because the adhesive seeps through the tape and makes your skin stick to your socks even more tightly than it was before.
toe socks are pretty good
KT tape is very good
Vaseline/similar is pretty good as long as you can get it to stay in the right spots

like this

Satvik

2 months ago •

Satvik
2 months ago •

Anyone have luck getting LLMs to write tests without mocks? The tests I want are often just 1-2 lines of code, but anything I get from Claude or Gemini ends up being 20-30 lines long, despite requests for conciseness, saying no mocks are needed, and seeing using real resources is ok.

(I use LLMs a lot for other stuff, but tests seem to be particularly bad.)

like this

in reply to Satvik

Ben Weinstein-Raun

in reply to Satvik • 2 months ago •

I have issues in general with getting LLMs to write anything "simply". It feels to me like they only know how to write "pre-crufted" code, maybe because their training data is mostly ten-year-old, median-quality GitHub repos, and when you train them using programming challenge problems they only learn based on correctness and not simplicity.

Satvik likes this.

in reply to Ben Weinstein-Raun

Satvik

in reply to Ben Weinstein-Raun • 2 months ago •

Yeah – I'm used to LLMs writing 2-3x as much code as I would, and trimming it down or rewriting from scratch, but with tests it's 10-15x and at that point I don't reap any benefit.

Ben Weinstein-Raun likes this.

in reply to Satvik

Kevin Gibbons

in reply to Satvik • 1 month ago •

I haven't had a problem with this; I wonder what you're doing different. An example from the other day, I described this proposal then asked for

> test: for `.zip`, an empty iterable as argument produces an iterable which is finished

Gemini 2.5 gave me

```
assert.deepEqual(Iterator.zip([]).next(), { value: undefined, done: true });
```

which is exactly what I'd have written. (Somewhat surprisingly, actually, because Gemini tends to be more verbose.)

Claude 3.7 gave me

```
// Test that calling Iterator.zip with an empty iterable produces an immediately finished iterator
const result = Iterator.zip([]);

// Check that the iterator is done immediately
const firstResult = result.next();

assert.sameValue(firstResult.done, true, "Iterator should be done immediately");
assert.sameValue(firstResult.value, undefined, "Iterator's value should be undefined");
```

The comments are unnecessary but the test is otherwise fine.

Satvik likes this.

in reply to Satvik

Satvik

in reply to Satvik • 1 month ago •

I would love that first result, and the second would be fine too. But here is an example of what I get: aistudio.google.com/app/prompt…

(I'm literally just trying to generate ~100 tests that call various functions, to see if anything crashes when we change Python versions.)

Claude Code gave me similar results. I did manage to get much better results using Aider + Claude, interestingly enough.

(These are not particularly well-written prompts, but I'm generally pretty lazy with my prompts, and prefer to just provide feedback. This works fine for a lot of stuff, but I haven't gotten it to work with tests.)

in reply to Satvik

Kevin Gibbons

in reply to Satvik • 1 month ago •

I added "The test will run in a real environment, so you don't need to mock anything." to the end of your prompt, and Sonnet 3.7 gave me

```
using Test
using PyCall

@testset "get_agg_strat_vectors_cache_filepath basic functionality" begin
# Get a valid account name from the environment
# This assumes there's at least one account configured in the system
accounts = pyimport("execution.account_settings").get_all_accounts()
@test length(accounts) > 0

test_account = accounts
[1] test_strategy = "momentum"
test_symbol = "BTCUSDT"
test_leverage = 1

# Test that the function runs without errors
result = get_agg_strat_vectors_cache_filepath(test_account, test_strategy, test_symbol, test_leverage)

# Basic validation of the result
@test result isa String
@test !isempty(result)

# Print the result for manual inspection
@info "Generated filepath: $result"
end
```

which, idk, seems reasonable? I think if you want a better result than that you'll nee

Satvik likes this.

in reply to Kevin Gibbons

Satvik

in reply to Kevin Gibbons • 1 month ago •

Hmm interesting. I tried in the API and the online playground, both on Sonnet 3.7, and the API was more similar to your result – noticeably more concise. I've also tried several other versions (with much more context, enough to write out real tests), and the pattern seems to be there – the API results are better.

I'm surprised, I would have expected them to be about the same. But using the API is an easy enough change.

in reply to Satvik

Daniel Ziegler

in reply to Satvik • 1 month ago •

Try again with Sonnet 4!

like this

in reply to Daniel Ziegler

Satvik

in reply to Daniel Ziegler • 1 month ago •

Sonnet 4 in the console gave me a slightly better answer than Kevin's above, using some more Julia-specific tricks for a better test. It's about as good as a test can get with such minimal information!

I'll be interested to try it out on more complex stuff during the week, there were some refactors that I couldn't get to work with Claude Code on 3.7, maybe they'll work now.

like this

in reply to Satvik

Daniel Ziegler

in reply to Satvik • 1 month ago •

I think Sonnet 4 is not *that* much smarter than 3.7 but it should be significantly more steerable and less likely to insert silly mocks

Satvik likes this.

in reply to Daniel Ziegler

Satvik

in reply to Daniel Ziegler • 1 month ago •

Sonnet 4 is tremendously more effective for my use cases, probably because I use a niche programming language (Julia). Two weeks ago I would have said LLMs make me ~10% more productive, now it looks closer to +100%.

And I'm not even committing LLM-generated code – I just use it to iterate and test on designs, then delete the code and implement from scratch manually.

⇧

Ben Weinstein-Raun 10 hours ago •

Ben Weinstein-Raun 1 day ago •

Ben Weinstein-Raun 3 days ago •

Ben Weinstein-Raun 1 week ago •

Ben Weinstein-Raun 1 week ago •

Ben Weinstein-Raun 1 week ago •

Daniel Filan 1 week ago •

Ben Weinstein-Raun 1 week ago •

Ben Weinstein-Raun 1 week ago •

Satvik 1 week ago •

Ben Weinstein-Raun 1 week ago •

Daniel Filan 2 weeks ago •

Ben Weinstein-Raun 3 weeks ago •

Ben Weinstein-Raun 3 weeks ago •

Ben Weinstein-Raun 3 weeks ago •

Chana 1 month ago •

Ben Weinstein-Raun 1 month ago •

Daniel Filan 1 month ago •

Satvik 1 month ago •

Satvik 1 month ago •

Ben Weinstein-Raun 1 month ago •

Ben Weinstein-Raun 1 month ago •

Ben Weinstein-Raun 1 month ago •

Ben Weinstein-Raun 1 month ago •

Ben Weinstein-Raun 1 month ago •

Ben Weinstein-Raun 1 month ago •

Daniel Filan 1 month ago •

Ben Weinstein-Raun 1 month ago •

Ben Weinstein-Raun 1 month ago •

Ben Weinstein-Raun 1 month ago •

Daniel Filan 1 month ago •

Satvik 1 month ago •

Ben Weinstein-Raun 1 month ago •

Satvik 1 month ago •

Ben Weinstein-Raun 1 month ago •

Ben Weinstein-Raun 2 months ago •

Satvik 2 months ago •

Ben Weinstein-Raun
10 hours ago •

Ben Weinstein-Raun
1 day ago •

Ben Weinstein-Raun
3 days ago •

Ben Weinstein-Raun
1 week ago •

Ben Weinstein-Raun
1 week ago •

Ben Weinstein-Raun
1 week ago •

Daniel Filan
1 week ago •

Ben Weinstein-Raun
1 week ago •

Ben Weinstein-Raun
1 week ago •

Satvik
1 week ago •

Ben Weinstein-Raun
1 week ago •

Daniel Filan
2 weeks ago •

Ben Weinstein-Raun
3 weeks ago •

Ben Weinstein-Raun
3 weeks ago •

Ben Weinstein-Raun
3 weeks ago •

Chana
1 month ago •

Ben Weinstein-Raun
1 month ago •

Daniel Filan
1 month ago •

Satvik
1 month ago •

Satvik
1 month ago •

Ben Weinstein-Raun
1 month ago •

Ben Weinstein-Raun
1 month ago •

Ben Weinstein-Raun
1 month ago •

Ben Weinstein-Raun
1 month ago •

Ben Weinstein-Raun
1 month ago •

Ben Weinstein-Raun
1 month ago •

Daniel Filan
1 month ago •

Ben Weinstein-Raun
1 month ago •

Ben Weinstein-Raun
1 month ago •

Ben Weinstein-Raun
1 month ago •

Daniel Filan
1 month ago •

Satvik
1 month ago •

Ben Weinstein-Raun
1 month ago •

Satvik
1 month ago •

Ben Weinstein-Raun
1 month ago •

Ben Weinstein-Raun
2 months ago •

Satvik
2 months ago •