I tried telling Claude "Never compliment me. Criticize my ideas, ask clarifying questions, and give me funny insults." It was great! Claude normally more or less goes along with the implementation plans I suggest, but this made it push back much harder and suggest alternatives (some of which were actually better, and which I would never have thought of).
Some highlights:
"Why not just use VS Code's Julia extension with Copilot?"
"How Jupyter Kernels Work (Education for the Architecturally Challenged)"
"Why This Doesn't Suck (Unlike Your Original Plan)"
"Also, what's Claude Code going to do that's actually useful here beyond being a fancy autocomplete with delusions of grandeur?"
I love how hard Claude is trying to get me to stop using Claude.
I think the thing I really like about LLM-assisted coding is that it makes context switching easier.
I can be in "words mode" or "code mode", and switching between these takes time and effort. (There are more categories, but they don't change the fundamental point.)
In my job, I have to spend a lot of time in words mode, due to things like hiring and managing. Historically, this has meant that I only really get engineering work done when I have 2+ hour chunks to focus on it. But now I can often get work done in much shorter chunks, while still in words mode.
I would not like to spend all my time in words mode – I enjoy digging into the details – but it's really nice to have the option.
Anyone have luck getting LLMs to write tests without mocks? The tests I want are often just 1-2 lines of code, but anything I get from Claude or Gemini ends up being 20-30 lines long, despite my asking for conciseness, saying no mocks are needed, and saying that using real resources is OK.
(I use LLMs a lot for other stuff, but tests seem to be particularly bad.)
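For what it's worth, the style I'm after looks something like this (a Python sketch; `parse_config` is a made-up stand-in for whatever's under test, not a real function from the codebase):

```python
import json
import os
import tempfile

def parse_config(path):
    """Hypothetical function under test: load a JSON config file."""
    with open(path) as f:
        return json.load(f)

def test_parse_config_reads_real_file():
    # No mocks: just write a real file to a temp dir and read it back.
    path = os.path.join(tempfile.mkdtemp(), "cfg.json")
    with open(path, "w") as f:
        f.write('{"retries": 3}')
    assert parse_config(path)["retries"] == 3
```

The LLM version of the same test tends to patch `open` with a `MagicMock`, stub out the filesystem, and come back ten times longer.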
Sonnet 4 is tremendously more effective for my use cases, probably because I use a niche programming language (Julia). Two weeks ago I would have said LLMs make me ~10% more productive, now it looks closer to +100%.
And I'm not even committing LLM-generated code – I just use it to iterate and test on designs, then delete the code and implement from scratch manually.
Run-time type checking is way more useful than I expected. I've been using it in Julia for 4 years now, and I expected it to provide ~25% of the value of static type checking, but it's actually been closer to 90%.
I guess it's because when I'm developing, I'm constantly running code anyway, either through a notebook or tests. And the change -> run loop in Julia is not noticeably slower than the change -> compile loop in Scala.
The big exception is when I have code that can only reasonably be run on a remote machine and takes 5+ minutes to set up/execute. Then I'd really like more static analysis.
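The mechanic is roughly this, sketched in Python for illustration (in Julia you get the check for free from method dispatch on annotated signatures; `mean_ratio` is a made-up example):

```python
def mean_ratio(xs, ys):
    # Julia equivalent: mean_ratio(xs::Vector{Float64}, ys::Vector{Float64})
    # would throw a MethodError at call time if the argument types don't match.
    # Python needs an explicit check to fail equally early:
    if not all(isinstance(v, float) for v in xs + ys):
        raise TypeError("expected floats")
    return sum(x / y for x, y in zip(xs, ys)) / len(xs)

mean_ratio([1.0, 2.0], [2.0, 4.0])      # fine
# mean_ratio([1.0, "2"], [2.0, 4.0])    # fails immediately, not deep in the math
```

Because I'm constantly re-running code in a notebook or tests anyway, these checks fire almost as soon as static analysis would have.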
Cracking Eggs
The best way to crack eggs is the highlander method: beat two eggs against each other. This method is delightfully easy, rarely makes a mess, and is tolerant to a wide range of force.
But don't just look on the sunny side: the highlander method has a major flaw. What do you do with the last egg? If you haven't hatched a plan, you may scramble to one of the inferior methods: counter or bowl.
The counter method is the safe option: it consistently produces a small mess, even if your strike is eggsceptional. But if you're ready to leave your shell, the bowl is for gamblers and dreamers: it can produce a mess-free egg if you aim things perfectly, but you'll end up with shell everywhere unless you crack it eggsactly right.
One of the main questions I ask in interviews is basically "we have a data pipeline with goal X and constraints A, B, and C. How would you design it?" Depending on how they do, we'll discuss various tradeoffs, other possible goals/constraints, and so on.
This is based on a real system I designed and have been maintaining for ~5 years, and is also very similar to other systems I've run at previous jobs.
About half the candidates complain that it's not a realistic question.
I've asked for more specific feedback, and the complaints often come down to "nothing I've done has been like this" and "most of development is web development." That might be true, but we don't have a website/web app, and we're pretty specific about the work involved in both the job description and the phone interview.
(We have had other feedback that's helpful.)
Generally, everyone who's done well on this question and joined has been a strong hire, though we've also hired some people who didn't do well on that specific question. So I'm pretty sure it's a good question. I'm just a little amused/dismayed at how many people seem to think "realistic" means "web development."