Anyone have luck getting LLMs to write tests without mocks? The tests I want are often just 1-2 lines of code, but anything I get from Claude or Gemini ends up 20-30 lines long, despite my asking for conciseness, saying no mocks are needed, and saying that using real resources is fine.
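For concreteness, here's the shape I'm after (the function name is made up; the point is that it really hits the live resource, no mocks):
```
using Test
@test !isempty(load_account_settings("prod"))  # hypothetical function; talks to the real settings store
```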
(I use LLMs a lot for other stuff, but tests seem to be particularly bad.)
Ben Weinstein-Raun
in reply to Satvik
Satvik
in reply to Ben Weinstein-Raun
Kevin Gibbons
in reply to Satvik
I haven't had a problem with this; I wonder what you're doing differently. An example from the other day: I described this proposal, then asked for
> test: for `.zip`, an empty iterable as argument produces an iterable which is finished
Gemini 2.5 gave me
```
assert.deepEqual(Iterator.zip([]).next(), { value: undefined, done: true });
```
which is exactly what I'd have written. (Somewhat surprisingly, actually, because Gemini tends to be more verbose.)
Claude 3.7 gave me
```
// Test that calling Iterator.zip with an empty iterable produces an immediately finished iterator
const result = Iterator.zip([]);
// Check that the iterator is done immediately
const firstResult = result.next();
assert.sameValue(firstResult.done, true, "Iterator should be done immediately");
assert.sameValue(firstResult.value, undefined, "Iterator's value should be undefined");
```
The comments are unnecessary but the test is otherwise fine.
Satvik
in reply to Satvik
I would love that first result, and the second would be fine too. But here is an example of what I get: aistudio.google.com/app/prompt…
(I'm literally just trying to generate ~100 tests that call various functions, to see if anything crashes when we change Python versions.)
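(Roughly the shape I want, a hundred times over; the module and function names here are made up:)
```
using Test
using PyCall

# Smoke test: call the function with plausible arguments and pass if nothing throws.
# "execution.orders" and parse_orders are hypothetical stand-ins.
@test (pyimport("execution.orders").parse_orders("BTCUSDT"); true)
```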
Claude Code gave me similar results. I did manage to get much better results using Aider + Claude, interestingly enough.
(These are not particularly well-written prompts, but I'm generally pretty lazy with my prompts, and prefer to just provide feedback. This works fine for a lot of stuff, but I haven't gotten it to work with tests.)
Kevin Gibbons
in reply to Satvik
I added "The test will run in a real environment, so you don't need to mock anything." to the end of your prompt, and Sonnet 3.7 gave me
```
using Test
using PyCall

@testset "get_agg_strat_vectors_cache_filepath basic functionality" begin
    # Get a valid account name from the environment
    # This assumes there's at least one account configured in the system
    accounts = pyimport("execution.account_settings").get_all_accounts()
    @test length(accounts) > 0
    test_account = accounts[1]

    test_strategy = "momentum"
    test_symbol = "BTCUSDT"
    test_leverage = 1

    # Test that the function runs without errors
    result = get_agg_strat_vectors_cache_filepath(test_account, test_strategy, test_symbol, test_leverage)

    # Basic validation of the result
    @test result isa String
    @test !isempty(result)

    # Print the result for manual inspection
    @info "Generated filepath: $result"
end
```
which, idk, seems reasonable? I think if you want a better result than that you'll need to give it more context. I mean, _I_ couldn't write a test for that function without having more context.
I'm using the API, fwiw; it's possible that the online playgrounds are more verbose by default.
Satvik
in reply to Kevin Gibbons
Hmm, interesting. I tried both the API and the online playground, both on Sonnet 3.7, and the API result was closer to yours – noticeably more concise. I've also tried several other variations (with much more context, enough to write out real tests), and the pattern holds – the API results are better.
I'm surprised, I would have expected them to be about the same. But using the API is an easy enough change.
Daniel Ziegler
in reply to Satvik
Satvik
in reply to Daniel Ziegler
Sonnet 4 in the console gave me a slightly better answer than Kevin's above, using some more Julia-specific tricks for a better test. It's about as good as a test can get with such minimal information!
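(Not Sonnet's verbatim output, but the flavor of thing I mean, e.g. reaching for @test_nowarn instead of the call-then-check-true pattern:)
```
# assumes the same bindings as Kevin's snippet above;
# @test_nowarn passes when the call runs without emitting warnings, and errors if it throws
@test_nowarn get_agg_strat_vectors_cache_filepath(test_account, test_strategy, test_symbol, test_leverage)
```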
I'll be interested to try it out on more complex stuff during the week; there were some refactors I couldn't get to work with Claude Code on 3.7, and maybe they'll work now.