I think the thing I really like about LLM-assisted coding is that it makes context switching easier.
I can be in "words mode" or "code mode", and switching between these takes time and effort. (There are more categories, but they don't change the fundamental point.)
In my job, I have to spend a lot of time in words mode, due to things like hiring and managing. Historically, this has meant that I only really get engineering work done when I have 2+ hour chunks to focus on it. But now I can often get work done in much shorter chunks, while still in words mode.
I would not like to spend all my time in words mode – I enjoy digging into the details – but it's really nice to have the option.
Anyone have luck getting LLMs to write tests without mocks? The tests I want are often just 1-2 lines of code, but anything I get from Claude or Gemini ends up being 20-30 lines long, despite requests for conciseness, saying no mocks are needed, and saying that using real resources is ok.
(I use LLMs a lot for other stuff, but tests seem to be particularly bad.)
Run-time type checking is way more useful than I expected. I've been using it in Julia for 4 years now, and I expected it to provide ~25% of the value of static type checking, but it's actually been closer to 90%.
I guess it's because when I'm developing, I'm constantly running code anyway, either through a notebook or tests. And the change -> run loop in Julia is not noticeably slower than the change -> compile loop in Scala.
The big exception is when I have code that can only reasonably be run on a remote machine and takes 5+ minutes to set up/execute. Then I'd really like more static analysis.
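As a toy illustration (the types here are made up for the example, not from a real codebase), this is the kind of thing the run-time checks catch the moment the code runs:
```
struct Trade
    symbol::String
    qty::Int
end

# The ::Float64 return annotation is checked when the method runs,
# not ahead of time.
notional(t::Trade, px)::Float64 = px * t.qty

notional(Trade("BTCUSDT", 2), 50_000.0)  # 100000.0
Trade("BTCUSDT", "two")  # throws a MethodError as soon as this line runs
```
Since I'm re-running code constantly anyway, that error shows up about as fast as a compiler would have flagged it.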
Cracking Eggs
The best way to crack eggs is the highlander method: beat two eggs against each other. This method is absurdly easy, rarely makes a mess, and is tolerant of a wide range of force.
But don't just look on the sunny side: the highlander method has a major flaw. What do you do with the last egg? If you haven't hatched a plan, you may scramble to one of the inferior methods: counter or bowl.
The counter method is the safe option: it consistently produces a small mess, even if your strike is eggsceptional. But if you're ready to leave your shell, the bowl is for gamblers and dreamers: it can produce a mess-free egg, but you'll end up with shell everywhere unless you crack it eggsactly right.
One of the main questions I ask in interviews is basically "we have a data pipeline with goal X and constraints A, B, and C. How would you design it?" Depending on how they do, we'll discuss various tradeoffs, other possible goals/constraints, and so on.
This is based on a real system I designed and have been maintaining for ~5 years, and is also very similar to other systems I've run at previous jobs.
About half the candidates complain that it's not a realistic question.
I've asked for more specific feedback, and the complaints often come down to "nothing I've done has been like this" and "most of development is web development." That might be true, but we don't have a website/web app, and we're pretty specific about the work involved in both the job description and the phone interview.
(We have had other feedback that's been helpful.)
Generally, everyone who's done well on this question and joined has been a strong hire, though we've also hired some people who didn't do well on that specific question. So I'm pretty sure it's a good question. I'm just a little amused/dismayed at how many people seem to think "realistic" means "web development."
Kevin Gibbons
in reply to Satvik

I haven't had a problem with this; I wonder what you're doing differently. An example from the other day: I described this proposal, then asked for
> test: for `.zip`, an empty iterable as argument produces an iterable which is finished
Gemini 2.5 gave me
```
assert.deepEqual(Iterator.zip([]).next(), { value: undefined, done: true });
```
which is exactly what I'd have written. (Somewhat surprisingly, actually, because Gemini tends to be more verbose.)
Claude 3.7 gave me
```
// Test that calling Iterator.zip with an empty iterable produces an immediately finished iterator
const result = Iterator.zip([]);
// Check that the iterator is done immediately
const firstResult = result.next();
assert.sameValue(firstResult.done, true, "Iterator should be done immediately");
assert.sameValue(firstResult.value, undefined, "Iterator's value should be undefined");
```
The comments are unnecessary but the test is otherwise fine.
Satvik
in reply to Satvik

I would love that first result, and the second would be fine too. But here is an example of what I get: aistudio.google.com/app/prompt…
(I'm literally just trying to generate ~100 tests that call various functions, to see if anything crashes when we change Python versions.)
Claude Code gave me similar results. I did manage to get much better results using Aider + Claude, interestingly enough.
(These are not particularly well-written prompts, but I'm generally pretty lazy with my prompts, and prefer to just provide feedback. This works fine for a lot of stuff, but I haven't gotten it to work with tests.)
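For reference, the shape of test I'm after is basically a smoke test like this (argument values here are made up):
```
using Test

@testset "get_agg_strat_vectors_cache_filepath smoke test" begin
    # Call the function with plausible arguments; the point is just to
    # catch crashes across Python versions, not to validate the output.
    @test get_agg_strat_vectors_cache_filepath("acct1", "momentum", "BTCUSDT", 1) isa String
end
```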
Kevin Gibbons
in reply to Satvik

I added "The test will run in a real environment, so you don't need to mock anything." to the end of your prompt, and Sonnet 3.7 gave me
```
using Test
using PyCall

@testset "get_agg_strat_vectors_cache_filepath basic functionality" begin
    # Get a valid account name from the environment
    # This assumes there's at least one account configured in the system
    accounts = pyimport("execution.account_settings").get_all_accounts()
    @test length(accounts) > 0
    test_account = accounts[1]

    test_strategy = "momentum"
    test_symbol = "BTCUSDT"
    test_leverage = 1

    # Test that the function runs without errors
    result = get_agg_strat_vectors_cache_filepath(test_account, test_strategy, test_symbol, test_leverage)

    # Basic validation of the result
    @test result isa String
    @test !isempty(result)

    # Print the result for manual inspection
    @info "Generated filepath: $result"
end
```
which, idk, seems reasonable? I think if you want a better result than that you'll need to give it more context. I mean, _I_ couldn't write a test for that function without having more context.
I'm using the API, fwiw; it's possible that the online playgrounds are more verbose by default.
Satvik
in reply to Kevin Gibbons

Hmm, interesting. I tried the API and the online playground, both on Sonnet 3.7, and the API result was closer to yours – noticeably more concise. I've also tried several other versions (with much more context, enough to write out real tests), and the pattern holds – the API results are better.
I'm surprised, I would have expected them to be about the same. But using the API is an easy enough change.
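In case anyone wants to reproduce the comparison, here's a minimal sketch of what "using the API" looks like from Julia (model name and prompt are placeholders, and it assumes ANTHROPIC_API_KEY is set in the environment):
```
using HTTP, JSON3

prompt = "Write a minimal Julia smoke test for foo()."  # placeholder

resp = HTTP.post(
    "https://api.anthropic.com/v1/messages",
    ["x-api-key" => ENV["ANTHROPIC_API_KEY"],
     "anthropic-version" => "2023-06-01",
     "content-type" => "application/json"],
    JSON3.write(Dict(
        "model" => "claude-3-7-sonnet-latest",
        "max_tokens" => 1024,
        "messages" => [Dict("role" => "user", "content" => prompt)],
    )),
)

# The reply text lives in the first content block of the response.
println(JSON3.read(resp.body).content[1].text)
```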
Daniel Ziegler
in reply to Satvik
Satvik
in reply to Daniel Ziegler

Sonnet 4 in the console gave me a slightly better answer than Kevin's above, using some Julia-specific tricks to improve the test. It's about as good as a test can get with such minimal information!
I'll be interested to try it out on more complex stuff during the week; there were some refactors I couldn't get to work with Claude Code on 3.7, and maybe they'll work now.