> Most programmers are bad at detecting UB and memory ownership and lifetime errors.
And this is why we have languages and tooling that take care of it.
There's only a handful of people who can one-shot perfect code every time in a language that doesn't guard against memory ownership or lifetime errors.
But even the crappiest programmer has to actively work against the tooling in a language like Rust to introduce ownership issues. Add linters, formatters and unit tests on top of that and it becomes nigh-impossible.
Now put an LLM in the same position: it's also unable to create shitty code when the tooling prevents it from doing so.
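To make the point concrete, here's a minimal sketch of the kind of error Rust's borrow checker refuses to compile. The `longest` function and the values are made up for illustration; the commented-out block is the variant that rustc rejects, whether a human or an LLM wrote it:

```rust
// The borrow checker enforces that the returned reference cannot
// outlive either input.
fn longest<'a>(a: &'a str, b: &'a str) -> &'a str {
    if a.len() > b.len() { a } else { b }
}

fn main() {
    let s = String::from("hello");
    let r = longest(&s, "hi");
    println!("{}", r);

    // This variant would not compile: `t` is dropped at the end of the
    // inner block while `dangling` still borrows it.
    //
    // let dangling;
    // {
    //     let t = String::from("temporary");
    //     dangling = longest(&t, "hi");
    // }
    // println!("{}", dangling); // error[E0597]: `t` does not live long enough
}
```

In C the second variant would compile fine and blow up (or silently corrupt memory) at runtime; here the toolchain simply won't produce a binary.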
But how do you know it's cut to spec if you don't measure it?
Maybe someone bumped the fence while you were on a break, or its vibration caused the jig to get a bit out of alignment.
The basic point is that whether it's a human or some kind of automated process, probabilistic or not, producing something, you still need to check the result. And for code specifically, we've had deterministic ways of doing that for 20 years or so.
> And for code specifically, we've had deterministic ways of doing that for 20 years or so.
And those ways all suck!
It's extremely difficult to verify your way to high quality code. At lower amounts of verification it's not good enough. At higher amounts the verification takes so much longer than writing the code that you'll probably get better results cutting off part of the verification time and using it to write the code you're now an expert on.
I guess the point being made by GP is that most software is a high-dimensional model of a solution to some problem. With traditional coding, you gradually verify it while writing the code, going from simple to complex without losing the plot. That's the theory Naur describes in "Programming as Theory Building"; someone new to a project may take months to internalize that knowledge (if they ever do).
Most LLM practices throw you into the role of that newbie. Verifying the solution in a short time is impossible, because the human mind is not capable of grappling with that many factors at once. And if you want to do an in-depth review, you will basically be doing traditional coding, but without the typing, plus a lot of consternation when divergences arise.
> And for code specifically, we've had deterministic ways of doing that for 20 years or so.
And none of them are complete. Because all of them are based on hypotheses taken as axioms. Computation theory is very permissive and hardware is noisy and prone to interference.
With normal artisanal coding you take your time getting from A to B and you might find out alternate routes while you slowly make your way to the destination. There's also a clear cost in backtracking and trying an alternate route - you already wrote the "wrong" code and now it's useless. But you also gained more knowledge and maybe in a future trip from A to C or C to D you know that a side route like that is a bad idea.
Also because it's you, a human with experience, you know not to walk down ravines or hit walls at full speed.
With LLMs there's very little cost in backtracking. You're pretty much sending robots from A to B and checking if any of them make it every now and then.
The robots will jump down ravines and take useless side routes because they lack the lived experience and "common sense" of a human.
BUT what makes the route easier for both is linters, tests and other syntactic checks. If you manage to dig a full-on Elmo-style tunnel from A to B, it's impossible to miss no matter what kind of single-digit-IQ bot you send down the tube at breakneck speed. Or just add a few "don't walk down here, stay on the road" signs along the way.
Coincidentally the same process also makes the same route easier for inexperienced humans.
tl;dr If you have good specs and tests and force the LLM to never stop until the result matches both, you'll get a lot better results. And even if you don't use an AI, the very same tooling will make it easier for humans to create good quality code.
That would be great if you were a research lab with unlimited funding. But most businesses need to grapple with real user data, data they've been hired to process or to provide an easier way to process. Trying stuff until something sticks is not a real solution.
Having tests and specs is no guarantee that something will work. The only truth is the code. One analogy I always use is the linear equation y = ax + b. You cannot write tests that fully prove this equation is implemented without replicating the formula in the tests. Instead you check a finite set of tuples (x, y). Those will help if you chose the wrong value of a or flipped the sign of b, but someone who knows the tests can come up with a switch case that returns the correct y for the x values in the tests and garbage otherwise. That is why puzzles like LeetCode don't show you all the tests.
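A quick sketch of that argument, with a = 2 and b = 1 picked arbitrarily: a finite set of (x, y) checks cannot distinguish the honest implementation from one that hard-codes exactly the tested inputs.

```rust
// Honest implementation of y = 2x + 1.
fn linear(x: i64) -> i64 {
    2 * x + 1
}

// Gamed implementation: passes the same finite test set,
// returns garbage everywhere else.
fn gamed(x: i64) -> i64 {
    match x {
        0 => 1,
        1 => 3,
        2 => 5,
        _ => -9999,
    }
}

fn main() {
    // Both functions satisfy every tuple in the test set...
    for x in [0, 1, 2] {
        assert_eq!(linear(x), gamed(x));
    }
    // ...but diverge on any input the tests never probed.
    assert_ne!(linear(10), gamed(10));
    println!("finite tests satisfied by both implementations");
}
```

The tests pin down three points; a line needs only two, but nothing forces the implementation to be a line at all.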
Of course tests can't be perfect, but even a flimsy guardrail and a warning sign before a ravine is better than nothing.
The optimal solution would be to encase the whole thing in blast-proof transparent polymer, but nobody has the money to do that :)
Trying stuff until something sticks was not a solution when a human had to do the trying and every line of code cost money.
Now you can launch 20 agents to do slightly different things to see if something sticks - and still do the manual work yourself for the 21st path. The cost for those extra 20 attempts is next to nothing compared to the price of an actual programmer.