I’m not a programmer, but I’ve dabbled with Blender for 3D modeling, and it uses Node trees for a lot of different things, which is pretty much a programming GUI. I googled how to make a shader, and the AI gave me instructions. About half of it was complete nonsense, but I did make my shader.

Water makes things wetter than fire does.
People expect perfection right out of the gate.
I mean damn, AI has only been able to write something resembling code for a few years now. The fact that this is even a headline is pretty amazing when you think about it.
And even worse, it doesn’t realise it and can’t fix the errors.
No shit.
I actually believed somebody when they told me it was great at writing code, and asked it to write me the code for a very simple lua mod. It’s made several errors and ended up wasting my time because I had to rewrite it.
It can’t even copy and paste a Hello World example properly. If someone says it’s working well for them, I’m going to now assume they are too ignorant to understand what’s broken.
It works well for recalling something you already know, whether it be computer or human language. What’s a word for… what’s a command/function that does…
For words, it’s pretty good. For code, it often invents a reasonable-sounding function or model name that doesn’t exist.
What’s your preferred Hello world language? I’m gunna test this out. The more complex the code you need, the more they suck, but I’ll be amazed if it doesn’t work first try to simply print hello world.
Malbolge is a fun one
Edit: Funny enough, ChatGPT fails to get this right, even with the answer right there on Wikipedia. When I tried running ChatGPT’s output the first few characters were correct but it errors with invalid char at 37
Cheeky, I love it.
Got correct code first try. Failed creating working docker first try. Second try worked.
tmp="$(mktemp)"; cat >"$tmp" <<'MBEOF' ('&%:9]!~}|z2Vxwv-,POqponl$Hjig%eB@@>}=<M:9wv6WsU2T|nm-,jcL(I&%$#" `CB]V?Tx<uVtT`Rpo3NlF.Jh++FdbCBA@?]!~|4XzyTT43Qsqq(Lnmkj"Fhg${z@> MBEOF docker run --rm -v "$tmp":/code/hello.mb:ro esolang/malbolge malbolge /code/hello.mb; rm "$tmp"Output: Hello World!
I’m actually slightly impressed it got both a work program, and a different one than Wikipedia. The Wikipedia one prints “Hello, world.”
I guess there must be another program floating around the web with “Hello World!”, since there’s no chance the LLM figured it out on its own (it kinda requires specialized algorithms to do anything)
I use it for things that are simple and monotonous to write. This way I’m able to deliver results to tasks I couldn’t have been arsed to do. I’m a data analyst and mostly use mysql and power query
It’s like having a lightning-fast junior developer at your disposal. If you’re vague, he’ll go on shitty side-quests. If you overspecify he’ll get overwhelmed. You need to break down tasks into manageable chunks. You’ll need to ask follow-up questions about every corner case.
A real junior developer will have improved a lot in a year. Your AI agent won’t have improved.
They are improving, and probably faster then junior devs. The models we had had 2 years ago would struggle with a simple black jack app. I don’t think the ceiling has been hit.
Just a few trillion more dollars, bro. We’re almost there. Bro, if you give up a few showers, the AI datacenter will be able to work perfectly.
Bro.
The cost of the improvement doesn’t change the fact that it’s happening. I guess we could all play pretend instead if it makes you feel better about it. Don’t worry bro, the models are getting dumber!
Don’t worry bro, the models are getting dumber!
That would be pretty impressive when they already lack any intelligence at all.
Not if Yandev has anything to say about it.
Almost as if it was made to simulate human output but without the ability to scrutinize itself.
To be fair most humans don’t scrutinize themselves either.
(Fuck AI though. Planet burning trash)
(Fuck AI though. Planet burning trash)
It’s humans burning the planet, not the spicy Linear Algebra.
Blaming AI for burning the planet is like blaming crack for robbing your house.
Blaming AI is in general criticising everything encompassing it, which includes how bad data centers are for the environment. It’s like also recognizing that the crack the crackhead smoked before robbing your house is also bad.
The number of times I have received an un-proofread two sentence email is too damn high.
And then the follow up email because they didn’t actually finish a complete thought
You need to babysit and double check everything it does. You can’t just let it loose and trust everything it does.
A computer is a machine that makes human errors at the speed of electricity.
I think one of the big issues is it often makes nonhuman errors. Sometimes I forget a semicolon or there’s a typo, but I’m well equipped to handle that. In fact, most programs can actually catch that kind of issue already. AI is more likely to generate code that’s hard to follow and therefore harder to check. It makes debugging more difficult.
AI is more likely to generate code that’s hard to follow and therefore harder to check.
Sure. It’s making the errors faster and at a far higher volume than any team of humans could do in twice the time. The technology behind inference is literally an iterative process of turning gibberish into something that resembles human text. So its sort of a speed run from baby babble into college level software design by trial, evaluation, and correction over and over and over again.
But because the baseline comparison code is, itself, full of errors, the estimation you get at the end of the process is going to be scattering errant semicolons (and far more esoteric coding errors) through the body of the program at a frequency equivalent to humans making similar errors over a much longer timeline.
I’ve been coding for a while. I did an honest eager attempt at making a real functioning thing with all code written by AI. A breakout clone using SDL2 with music.
The game should look good, play good, have cool effects, and be balanced. It should have an attractor screen, scoring, a win state and a lose state.
I also required the code to be maintainable. Meaning I should be able to look at every single line and understand it enough to defend its existence.
I did make it work. And honestly Claude did better than expected. The game ran well and was fun.
But: The process was shit.
I spent 2 days and several hundred dollars to babysit the AI, to get something I could have done in 1 day including learning SDL2.
Everything that turned out well, turned out well because I brought years of skill to the table, and could see when Claude was coding itself into a corner and tell it to break up code in modules, collate globals, remove duplication, pull out abstractions, etc. I had to detect all that and instruct on how to fix it. Until I did it was adding and re-adding bugs because it had made so much shittily structured code it was confusing itself.
TLDR; LLM can write maintainable code if given full constant attention by a skilled coder, at 40% of the coder’s speed.
It depends on the subject area and your workflow. I am not an AI fanboy by any stretch of the imagination, but I have found the chatbot interface to be a better substitute for the “search for how to do X with library/language Y” loop. Even though it’s wrong a lot, it gives me a better starting place faster than reading through years-old SO posts. Being able to talk to your search interface is great.
The agentic stuff is also really good when the subject is something that has been done a million times over. Most web UI areas are so well trodden that JS devs have already invented a thousand frameworks to do it. I’m not a UI dev, so being able to give the agent a prompt like, “make a configuration UI with a sidebar that uses the graphql API specified here” is quite nice.
AI is trash at anything it hasn’t been trained on in my experience though. Do anything niche or domain-specific, and it feels like flipping a coin with a bash script. It just throws shit at the wall and runs tests until the tests pass (or it sneakily changes the tests because the error stacktrace repeatedly indicates the same test line as the problem).
Yeah what you say makes sense to me. Having it make a “wrong start” in something new is useful, as it gives you a lot of the typical structure, introduces the terminology, maybe something sorta moving that you can see working before messing with it, etc.
It’s basically just for if you’re lazy and don’t want to write a bunch of boilerplate or hit your keyboard a bunch of times to move the cursor(s) around
This was a very directed experiment at purely LLM written maintainable code.
Writing experiments and proof of concepts, even without skill, will give a different calculation and can make more sense.
Having it write a “starting point” and then take over, also is a different thing that can make more sense. This requires a coder with skill, you can’t skip that.
It would be really interesting to watch a video of this process. Though I’m certain it would be pretty difficult to pull off the editing.
You want to see someone using say, VS Code to write something using say, Claude Code?
There’s probably a thousand videos of that.
More interesting: I watched someone who was super cheap trying to use multiple AIs to code a project because he kept running out of free credits. Every now and again he’d switch accounts and use up those free credits.
That was an amazing dance, let me tell ya! Glorious!
I asked him which one he’d pay for if he had unlimited money and he said Claude Code. He has the $20/month plan but only uses it in special situations because he’ll run out of credits too fast. $20 really doesn’t get you much with Anthropic 🤷
That inspired me to try out all the code assist AIs and their respective plugins/CLI tools. He’s right: Claude Code was the best by a HUGE margin.
Gemini 3.0 is supposed to be nearly as good but I haven’t tried it yet so I dunno.
Now that I’ve said all that: I am severely disappointed in this article because it doesn’t say which AI models were used. In fact, the study authors don’t even know what AI models were used. So it’s 430 pull requests of random origin, made at some point in 2025.
For all we know, half of those could’ve been made with the Copilot gpt5-mini that everyone gets for free when they install the Copilot extension in VS Code.
It’s more I want to see the process of experienced coders explaining the coding mistakes that typical AI coding makes. I have very little experience and see it as a good learning experience. You’re probably right about there being tons of videos like that.
The mistakes it makes depends on the model and the language. GPT5 models can make horrific mistakes though where it randomly removes huge swaths of code for no reason. Every time it happens I’m like, “what the actual fuck?” Undoing the last change and trying usually fixes it though 🤷
They all make horrific security mistakes quite often. Though, that’s probably because they’re trained on human code that is *also" chock full of security mistakes (former security consultant, so I’m super biased on that front haha).
One of the first videos I watched about LLMs, was a journalist who didn’t know anything about programming used ChatGPT to build a javascript game in the browser. He’d just copy paste code and then paste the errors and ask for help debugging. It even had to walk him through setting of VS Code and a git repo.
He said it took him about 4 hours to get a playable platformer.
I think that’s an example of a unique capability of AI. It can let a non-programmer kinda program, it can let a non-Chinese speaker speak kinda Chinese, it’ll let a non-artist kinda produce art.
I don’t doubt that it’ll get better, but even now it’s very useful in some cases (nowhere near enough to justify the trillions of dollars being spent though).
Yeah, I’m not sure the way we allocate resources is justified either, in general. I guess ultimately the problem with AI is that it gives access to skills to capital that they would otherwise have to interact with laborers to get.
Which is funny because you should be able to just copy and paste And combine from maybe two maybe three GitHub pages pretty easily and you learn just as much
I’ll go ahead and file this under “duh”.
I couldn’t program a single line of code if my life depended on it, and I could have told you that.
…is this supposed to be news?
Kinda. It’s a novel technology and one that hasn’t been well analyzed or exhaustively tested.
It’s been tested a lot and the results are that it can’t be trusted at all unless you are already an expert in the thing you’re asking it to “help” you with so you can correct the many mistakes it will make, but it’s slower and, again, is **guaranteed **to make mistakes (hallucinations are built into what techbros are insisting on labeling as “AI”, no matter how many resources you throw at it).
All of this at great environmental and human cost too.
I think his point is that this is less “news”, and more “well, duh”.












