Why LLMs can't really build software

MarcellusDrum@lemmy.ml · 3 months ago

Why LLMs can't really build software

black_flag@lemmy.dbzer0.com · 3 months ago

I think it’s going to require a change in how models are built and optimized. Software engineering requires models that can do more than just generate code.

You mean to tell me that language models aren’t intelligent? But that would mean all these people cramming LLMs in places where intelligence is needed are wasting their time?? Who knew?

Me.

Eager Eagle@lemmy.world · 3 months ago

I have a solution for that, I just need a small loan of a billion dollars and 5 years. #trustmebro

black_flag@lemmy.dbzer0.com · 3 months ago

Only one billion?? What a deal! Where’s my checkbook!?

TuffNutzes@lemmy.world · 3 months ago

The LLM worship has to stop.

It’s like saying a hammer can build a house. No, it can’t.

It’s useful to pound in nails and automate a lot of repetitive and boring tasks but it’s not going to build the house for you - architect it, plan it, validate it.

It’s similar to the whole 3D printing hype. You can 3D print a house! No you can’t.

You can 3D print a wall, maybe a window.

Then have a skilled Craftsman put it all together for you, ensure fit and finish and essentially build the final product.

Nate Cox@programming.dev · edit-2 8 days ago

[This comment has been deleted by an automated system]

TuffNutzes@lemmy.world · 3 months ago

Yeah I’ve seen that before and it’s basically what I’m talking about. Again, that’s not “printing a 3D house” as hype would lead one to believe. Is it extruding cement to build the walls around very carefully placed framing and heavily managed and coordinated by people and finished with plumbing, electrical, etc.

It’s cool that they can bring this huge piece of equipment to extrude cement to form some kind of wall. It’s a neat proof of concept. I personally wouldn’t want to live in a house that looked anything like or was constructed that way. Would you?

Nate Cox@programming.dev · edit-2 8 days ago

deleted by creator

DireTech@sh.itjust.works · 3 months ago

Did you see another video about this? The one linked only showed the walls and still showed them doing interior framing. Nothing about windows, electrical, plumbing, insulation, etc.

What they showed could speed up construction but there are tons of other steps involved.

I do wonder how sturdy it is since it doesn’t look like rebar or anything else is added.

Nate Cox@programming.dev · edit-2 8 days ago

[This comment has been deleted by an automated system]

scarabic@lemmy.world · 3 months ago

I’m with you on this. We can’t just causally brush aside a machine that can create the frame of a house unattended - just because it can’t also do wiring. It was a bad choice of image to use to attack AI. In fact it’s a perfect metaphor for what AI is actually good for: automating certain parts of the work. Yes you still need an electrician to come in, just like you also need a software engineer to wire up the UI code their LLM generated to the back end, etc.

zalgotext@sh.itjust.works · 3 months ago

You circled all the way back to the original point lol. The whole thrust of this conversation is “AI can be used to automate parts of the work, but you still need knowledgeable people to finish it”. Just like “a concrete 3d printer can be used to automate parts of building a house, but you still need knowledgeable people to finish it.”

poopkins@lemmy.world · 3 months ago

Spoken like a person who has never been involved in the construction of a home. It’s effectively doing the job of (poorly) pouring concrete which isn’t the difficult or time consuming part.

Nate Cox@programming.dev · edit-2 8 days ago

deleted by creator

poopkins@lemmy.world · 3 months ago

Ah, my apologies. I had interpreted your message to suggest that pouring cement from a robotic arm fully replaced all of the construction work of framing and finishing all of the walls of the house, interior and exterior, plus attaching them and insulating them, with a single step.

Amju Wolf@pawb.social · 3 months ago

Huh? They just made the walls. Out of cement.

Making the walls of a house is one of the easiest steps, if not the easiest. And these would still need insulation, electrical, etc. And they look like shit.

scarabic@lemmy.world · edit-2 3 months ago

it’s basically what I’m talking about

Well, a minute ago you were saying that AI worship is akin to saying

a hammer can build a house

Now you’re saying that a hammer is basically the same thing as a machine that can create a building frame unattended? Come on. You have a point to be made here but you’re leaning on the stick a bit too hard.

frog_brawler@lemmy.world · 3 months ago

You’re making a great analogy with the 3D printing of a house.

However, if we consider the 3D printed house scenario; that skilled craftsman is now able to do things on his own that he would have needed a team for in the past. Most, if not all, of the less skilled members of that team are not getting any experience within the craft at that point. They’re no longer necessary when one skilled person can now do things on their own.

What happens when the skilled and highly experienced craftsmen that use AI as a supplemental tool (and subsequently earn all the work) eventually retire, and there’s been no juniors or mid-levels for a while? No one is really going to be qualified without having had exposure to the trade for several years.

TuffNutzes@lemmy.world · 3 months ago

Absolutely. This is a huge problem and I’ve read about this very problem from a number of sources. This will have a huge impact on engineering and information work.

Interestingly enough, A similar shortage occurred in the trades when information work was up and coming and the trades were shunned as a career path for many. Now we don’t have enough plumbers and electricians. Trades are now finding their the skills in high demand and charging very high rates.

ChokingHazard@lemmy.world · 3 months ago

The trades problem is a typical small business problem with toxic work environments. I knew plenty that washed out of the trades because of that. The “nobody wants to work anymore” tradesmen but really it’s “nobody wants to work with me for what I’m willing to pay”

TuffNutzes@lemmy.world · edit-2 3 months ago

I don’t doubt that that’s a problem either in some of those small businesses.

I have a great electrician that I call all the time. He’s probably in his late 60s. It’s definitely more of a rough and tumble work environment than IT work, for sure, but he’s a good guy and he pays his people well and he charges me an arm and a leg.

But we talk about it and he tells me about how the same work he would have charged a quarter the price just 10 years ago. And honestly, he’s one of the more affordable ones.

So it definitely seems like the trades is the place to be these days with so few good ones around. But yeah you have to pick and choose who’s mentoring you.

dreadbeef@lemmy.dbzer0.com · edit-2 3 months ago

3d printed concrete houses exist. Why can’t you 3d print a house? Not the best metaphor lol

toddestan@lemmy.world · edit-2 3 months ago

You can certainly 3D print a building, but can you really 3D print a house? Can it 3d print doors and windows that can open and close and be locked? Can it 3D print the plumbing and wiring and have it be safe and functional? Can it 3D print the foundation? What about bathroom fixtures, kitchen cabinets, and things like carpet?

It’s actually not a bad metaphor. You can use a 3D printer to help with building a house, and to 3D print some of fixtures and bits and pieces that go into the house. Using a 3D printer would automate a fair amount of the manual labor that goes into building a house today (at least how it is done in the US). But you’re still going to need people who know what they are doing put it all together to transform the building to a functional home. We’re still a fair ways away from just being able to 3D print a house, just like we’re fair ways away from having a LLM write a large, complex piece of software.

TuffNutzes@lemmy.world · 3 months ago

Exactly this.

Nalivai@lemmy.world · 3 months ago

No they aren’t. With enough setup and very unique and expensive equipment, you can pour shitty concrete walls that will be way more expensive and worse than if you did it normally. That will give you 20% of the house, at best. 20% of not very good of a house.

surewhynotlem@lemmy.world · 3 months ago

You don’t like glass windows? Air conditioning? A door?

Frezik@lemmy.blahaj.zone · 3 months ago

To those who have played around with LLM code generation more than me, how are they at debugging?

I’m thinking of Kernighan’s Law: “Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.” If vibe coding reduces the complexity of writing code by 10x, but debugging remains just as difficult as before, then Kernighan’s Law needs to be updated to say debugging is 20x as hard as vibe coding. Vibe coders have no hope of bridging that gap.

Ledivin@lemmy.world · 3 months ago

They’re not good at debugging. The article is pretty spot on, IMO - they’re great at doing the work; but you are still the brain. You’re still deciding what to do, and maybe 50% of the time how to do it, you’re just not executing the lowest level anymore. Similar for debugging - this is not an exercise at the lowest level, and needs you to run it.

hisao@ani.social · 3 months ago

deciding what to do, and maybe 50% of the time how to do it, you’re just not executing the lowest level anymore

And that’s exactly what I want. And I don’t get it why people want more. Having more means you have less and less control or influence on the result. What I want is that in other fields it becomes like it is in programming now, so that you micromanage every step and have great control over the result.

very_well_lost@lemmy.world · edit-2 3 months ago

The company I work for has recently mandated that we must start using AI tools in our workflow and is tracking our usage, so I’ve been experimenting with it a lot lately.

In my experience, it’s worse than useless when it comes to debugging code. The class of errors that it can solve is generally simple stuff like typos and syntax errors — the sort of thing that a human would solve in 30 seconds by looking at a stack trace. The much more important class of problem, errors in the business logic, it really really sucks at solving.

For those problems, it very confidently identifies the wrong answer about 95% of the time. And if you’re a dev who’s desperate enough to ask AI for help debugging something, you probably don’t know what’s wrong either, so it won’t be immediately clear if the AI just gave you garbage or if its suggestion has any real merit. So you go check and manually confirm that the LLM is full of shit which costs you time… then you go back to the LLM with more context and ask it to try again. It’s second suggestion will sound even more confident than the first, (“Aha! I see the real cause of the issue now!”) but it will still be nonsense. You go waste more time to rule out the second suggestion, then go back to the AI to scold it for being wrong again.

Rinse and repeat this cycle enough times until your manager is happy you’ve hit the desired usage metrics, then go open your debugging tool of choice and do the actual work.

HubertManne@piefed.social · 3 months ago

maybe its just me but I find typos to be the most difficult because my brain and easily see it as correct so the whole code looks correct. Its like the way you can take the vowels out of sentences and people can still immediately read it.

ganryuu@lemmy.ca · 3 months ago

Probably why they talked about looking at a stack trace, you’ll see immediately that you made a typo in a variable’s name or language keyword when compiling or executing.

wols@lemmy.zip · 3 months ago

The nastiest typos are autocompleted similarly named (and correctly typed) variables, functions, or types. Which is why it’s a good idea to avoid such name clashes in the first place. If this is impossible or not practical, at least put the part that differs at the start of the name.

HubertManne@piefed.social · 3 months ago

Thing is that having the differ part at the end is nicer for sorting.

wols@lemmy.zip · 3 months ago

What do you mean? For what purpose would you sort variables or functions?

HubertManne@piefed.social · 3 months ago

Sorry. I was thinking hostnames or other endpoints and was thinking that way back with typos. dev78usc03 instead of dev78usc02 or such.

HarkMahlberg@kbin.earth · 3 months ago

we must start using AI tools in our workflow and is tracking our usage

Reads to me as “Please help us justify the very expensive license we just purchased and all the talented engineers we just laid off.”

I know the pain. Leadership’s desperation is so thick you can smell it. They got FOMO’d, now they’re humiliated, so they start lashing out.

frog_brawler@lemmy.world · edit-2 3 months ago

Funny enough, the AI shift is really just covering for the over-hiring mistakes in 2021. They can’t admit they fucked up in hiring too many people during Covid, so they’re using AI as the scapegoat. We all know it’s not able to actually replace people yet; but that’s happening anyway.

There won’t be any immediate ramifications, we’ll start to see that in probably 12-18 months or so. It’s just another form of kicking the can down the road.

TrooBloo@lemmy.dbzer0.com · 3 months ago

As it seems to be the case in all of these situations, AI fails hard at tasks when compared to tools specifically designed for that task. I use Ruff in all my Python projects because it formats my code and finds (and often fixes) the kind of low complexity/high probability problems that are likely to pop up as a result of human imperfection. It does it with great accuracy, incredible speed, using very little computing resources, and provides levels of safety in automating fixes. I can run it as an automation step when someone proposes code changes, adding all of 3 or 4 seconds to the runtime. I can run it on my local machine to instantly resolve my ID10T errors. If AI can’t solve these problems as quickly, and if it can’t solve anything more complicated reliably, I don’t understand why it would be a tool I would use.

Pechente@feddit.org · 3 months ago

Definitely not good. Sometimes they can solve issues but you gotta point them in the direction of the issue. Other times they write hacky workarounds that do the job for the moment but crash catastrophically with the next major dependency update.

HarkMahlberg@kbin.earth · 3 months ago

I saw an LLM override the casting operator in C#. An evangelist would say “genius! what a novel solution!” I said “nobody at this company is going to know what this code is doing 6 months from now.”

It didn’t even solve our problem.

hisao@ani.social · 3 months ago

I saw an LLM override the casting operator in C#. An evangelist would say “genius! what a novel solution!” I said “nobody at this company is going to know what this code is doing 6 months from now.”

Before LLMs people were often saying this about people smarter than the rest of the group. “Yeah he was too smart and overengineered solutions that no one could understand after he left,”. This is btw one of the reasons why I increasingly dislike programming as a field over the years and happily delegate the coding part to AI nowadays. This field celebrates conformism and that’s why humans shouldn’t write code manually. Perfect field to automate away via LLMs.

very_well_lost@lemmy.world · 3 months ago

Before LLMs people were often saying this about people smarter than the rest of the group.

Smarter by whose metric? If you can’t write software that meets the bare minimum of comprehensibility, you’re probably not as smart as you think you are.

Software engineering is an engineering discipline, and conformity is exactly what you want in engineering — because in engineering you don’t call it ‘conformity’, you call it ‘standardization’. Nobody wants to hire a maverick bridge-builder, they wanna hire the guy who follows standards and best practices because that’s how you build a bridge that doesn’t fall down. The engineers who don’t follow standards and who deride others as being too stupid or too conservative to understand their vision are the ones who end up crushed to death by their imploding carbon fiber submarine at the bottom of the Atlantic.

AI has exactly the same “maverick” tendencies as human developers (because, surprise surprise, it’s trained on human output), and until that gets ironed out, it’s not suitable for writing anything more than the most basic boilerplate — which is stuff you can usually just copy-paste together in five minutes anyway.

hisao@ani.social · 3 months ago

You’re right of course and engineering as a whole is a first-line subject to AI. Everything that has strict specs, standards, invariants will benefit massively from it, and conforming is what AI inherently excels at, as opposed to humans. Those complaints like the one this subthread started with are usually people being bad at writing requirements rather than AI being bad at following them. If you approach requirements like in actual engineering fields, you will get corresponding results, while humans will struggle to fully conform or even try to find tricks and loopholes in your requirements to sidestep them and assert their will while technically still remaining in “barely legal” territory.

TechLich@lemmy.world · 3 months ago

I feel like this isn’t quite true and is something I hear a lot of people say about ai. That it’s good at following requirements and confirming and being a mechanical and logical robot because that’s what computers are like and that’s how it is in sci fi.

In reality, it seems like that’s what they’re worst at. They’re great at seeing patterns and creating ideas but terrible at following instructions or staying on task. As soon as something is a bit bigger than they can track context for, they’ll get “creative” and if they see a pattern that they can complete, they will, even if it’s not correct. I’ve had copilot start writing poetry in my code because there was a string it could complete.

Get it to make a pretty looking static web page with fancy css where it gets to make all the decisions? It does it fast.

Give it an actual, specific programming task in a full sized application with multiple interconnected pieces and strict requirements? It confidently breaks most of the requirements, and spits out garbage. If it can’t hold the entire thing in its context, or if there’s a lot of strict rules to follow, it’ll struggle and forget what it’s doing or why. Like a particularly bad human programmer would.

This is why AI is automating art and music and writing and not more mundane/logical/engineering tasks. Great at being creative and balls at following instructions for more than a few steps.

hisao@ani.social · 3 months ago

That it’s good at following requirements and confirming and being a mechanical and logical robot because that’s what computers are like and that’s how it is in sci fi.

They’re good at that because they are ANNs.

In reality, it seems like that’s what they’re worst at. They’re great at seeing patterns and creating ideas but terrible at following instructions or staying on task. As soon as something is a bit bigger than they can track context for, they’ll get “creative” and if they see a pattern that they can complete, they will, even if it’s not correct. I’ve had copilot start writing poetry in my code because there was a string it could complete.

Get it to make a pretty looking static web page with fancy css where it gets to make all the decisions? It does it fast.

Give it an actual, specific programming task in a full sized application with multiple interconnected pieces and strict requirements? It confidently breaks most of the requirements, and spits out garbage. If it can’t hold the entire thing in its context, or if there’s a lot of strict rules to follow, it’ll struggle and forget what it’s doing or why. Like a particularly bad human programmer would.

This is why AI is automating art and music and writing and not more mundane/logical/engineering tasks. Great at being creative and balls at following instructions for more than a few steps.

My experience is opposite.

Feyd@programming.dev · 3 months ago

Wow you just completely destroyed any credibility about your software development opinions.

hisao@ani.social · 3 months ago

Why though? I think hating and maybe even disrespecting programming and wanting your job to be as much redundant and replaced as possible is actually the best mindset for a programmer. Maybe in the past it was a nice mindset to become a teamlead or a project manager, but nowadays with AI it’s a mindset for programmers.

Feyd@programming.dev · 3 months ago

Before LLMs people were often saying this about people smarter than the rest of the group. “Yeah he was too smart and overengineered solutions that no one could understand after he left,”.

This part.

Ledivin@lemmy.world · 3 months ago

deleted by creator

hisao@ani.social · 3 months ago

The fact that I dislike it that it turned out that software engineering is not a good place for self-expression or for demonstrating your power level or the beauty and depth of your intricate thought patterns through advanced constructs and structures you come up with, doesn’t mean that I disagree that this is true.

hisao@ani.social · 3 months ago

My first level of debugging is logging things to console. LLMs here do a decent job at “reading your mind” and autocompleting “pri” into something like “println!(“i = {}, x = {}, y = {}”, i, x, y);” with very good context awareness of what and how exactly it makes most sense to debug print in the current location in code.

0x01@lemmy.ml · 3 months ago

I use it extensively daily.

It cannot step through code right now, so true debugging is not something you use it for. Most of the time the llm will take the junior engineer approach of “guess and check” unless you explicitly give it better guidance.

My process is generally to start with unit tests and type definitions, then a large multipage prompt for every segment of the app the llm will be tasked with. Then I’ll make a snapshot of the code, give the tool access to the markdown prompt, and validate its work. When there are failures and the project has extensive unit tests it generally follows the same pattern of “I see that this failure should be added to the unit tests” which it does and then re-executes them during iterative development.

If tests are not available or if it is not something directly accessible to the tool then it will generally rely on logs either directly generated or provided by the user.

My role these days is to provide long well thought out prompts, verify the integrity of the code after every commit, and generally just kind of treat the llm as a reckless junior dev. Sometimes junior devs can surprise you, like yesterday I was very surprised by a one shot result: asking for a mobile rn app for taking my rambling voice recordings and summarize them into prompts, it was immediately remarkably successful and now I’ve been walking around mic’d up to generate prompts.

frog_brawler@lemmy.world · 3 months ago

How are they at debugging? In a silo, they’re shit.

I’ve been using one LLM to debug the other this past week for a personal project, and it can be a bit tedious sometimes, but it eventually does a decent enough job. I’m pretty much vibe coding things that are a bit out of my immediate knowledge and skill set, but I know how they’re supposed to work. For example, I’ve got some python scripts using rekognition to scan photos for porn or other explicit stuff before they get sent to an s3 bucket. After that happens, there’s now a dashboard that’s going to give me results on how many images were scanned and then marked as either acceptable or flagged as inappropriate. After a threshold of too many inappropriate images being sent in, it’ll shadowban them from sending any more dick pics in.

For someone that’s never taken a coding course, I’m relatively happy with the results I’m getting so far. Granted, this may be small potatoes for someone with an actual development background; but as someone that’s been working adjacent to those folks for several years, I’m happy with the output.

sobchak@programming.dev · 3 months ago

I’ve used AI by just pasting code, then asking if there’s anything wrong with it. It would find things wrong with it, but would also say some things were wrong when it was actually fine.

I’ve used it in an agentic-AI (Cursor), and it’s not good at debugging any slightly-complex code. It would often get “stuck” on errors that were obvious to me, but making wrong, sometimes nonsensical changes.

hietsu@sopuli.xyz · edit-2 3 months ago

deleted by creator

demizerone@lemmy.world · 3 months ago

I am working at a big AI company on llm generating code for automation. I’ve had cursor solve a bug that was occuring in prod after prompting and asking questions of the responses. It took a few rounds but it found a really obscure interaction with the app and the host, and it thanked me for the insight. 😀. I deployed the fix and it worked.

The problem I have is I member it solving this bug, and I remember being impressed, but I don’t remember the bug. I took a screenshot of it, but currently don’t have access to those. I am disconnected from the code that the llm has generated, but I am very aware of how the app works and what it should do very intently because I had to write requirements and design doc.

Zexks@lemmy.world · 3 months ago

Working just fine. It one shot a kodi tv channel addon for me last week end. Used it to integrate kofax into docusign. Building 2 blazor apps one new one an upgrade. Used it to create a stack of mc servers for the kids with a dashboard of statuses and control switches. My son is working on his own mc mod with it. Use it almost daily for random file organization and management scripts. Using it to clean uo my media library meta data. Anytime i have to do something to more than 5 or so files i pull it up and ask for a script.

Its a tool like any other. There will be people who adapt and people who fail to. Just like we had with computers the internet. It zeems to be long forgotten now but literally ALL of these anti ai arguments were made against computers and the internet 30_50 years ago. Very similar ones were made when books and writing became common place as well.

Feyd@programming.dev · 3 months ago

“Some random people were wrong about something in the past so nobody is allowed to speculate that any technology isn’t as revolutionary as it’s hyped to be ever again” is not a useful or compelling argument.

TheFinn@discuss.tchncs.de · 3 months ago

Apparently I’m not up to date. I’ve been impressed by some things and turned off by others. But I haven’t seen any workflows or setups that enabled access to my file system. How is that accomplished, and are there any safeguards around it?

Zexks@lemmy.world · 3 months ago

I didnt give it access just had it make sceipts in various languages to handle large repetative file tasks. Something that wpuld take me 30-45 minutes toclpoks up and piece together it can do is 30-45 seconds. And depending on how simple or able you are to describe the task at hand the better it can do. Even when i know what i want to type, like during the blazor conversion it simply types faster for a much simpler prompt. Once i had a single page sorted i asked it for a step by step of what we did. Then took that and said ‘hey do this to the follpwing page abc.html’ and done. Then just tell it ‘now this page …’ etc etc. That was in copilot though so it could see my solutions files.

Feyd@programming.dev · 3 months ago

You’re looking for an MCP server, which is the standard way to hook things into chatbots now, and safeguards would depend on the particular server.

isaacd@lemmy.world · 3 months ago

Clearly LLMs are useful to software engineers.

Citation needed. I don’t use one. If my coworkers do, they’re very quiet about it. More than half the posts I see promoting them, even as “just a tool,” are from people with obvious conflicts of interest. What’s “clear” to me is that the Overton window has been dragged kicking and screaming to the extreme end of the scale by five years of constant press releases masquerading as news and billions of dollars of market speculation.

I’m not going to delegate the easiest part of my job to something that’s undeniably worse at it. I’m not going to pass up opportunities to understand a system better in hopes of getting 30-minute tasks done in 10. And I’m definitely not going to pay for the privilege.

Feyd@programming.dev · 3 months ago

I don’t use one, and my coworkers that do use them are very loud about it, and worse at their jobs than they were a year ago.

skisnow@lemmy.ca · 3 months ago

I’ve found them useful, sometimes, but nothing like a fraction of what the hype would suggest.

They’re not adequate replacements for code reviewers, but getting an AI code review does let me occasionally fix a couple of blunders before I waste another human’s time with them.

I’ve also had the occasional bit of luck with “why am I getting this error” questions, where it saved me 10 minutes of digging through the code myself.

“Create some test data and a smoke test for this feature” is another good timesaver for what would normally be very tedious drudge work.

What I have given up on is “implement a feature that does X” questions, because it invariably creates more work than it saves. Companies selling “type in your app idea and it’ll write the code” solutions are snake-oil salesman.

frog_brawler@lemmy.world · edit-2 3 months ago

I’m not a “software engineer” but a lot of people that don’t work within tech would probably call me one.

I’m in Cloud Engineering, but came from the sys/network admin and ops side of things rather than starting off in dev or anything like that.

Up until about 5 years ago, I really only knew Powershell and a little bit of bash. I’ve gotten up to speed in a lot of things but never officially learned python, js, go or any other real development language that would be useful to me. I’ve spent way more time focusing on getting good with IaC, and probably more of the SRE type stuff.

In my particular situation, LLMs are incredibly useful. It’s fair to say that I use them daily now. I’ve had it convert bash scripts to python for me very quickly. I don’t know python but now that I’m able to look at a python script next to my bash; I’m picking up on stuff a lot faster. I’m using Lambda way more often as a result.

Also, there’s a lot of mundane filling out forms shit that I delegate to an LLM. I don’t want to spend my time filling out a form that I know no one is actually going to read. F it, I’ll have the AI write a report for an AI. It’s dumb as shit, but that’s the world today.

Phegan@lemmy.world · 3 months ago

I’ve only found two effective uses for them. Every time I tried them otherwise they fell flat and took me longer that it would have to write the code myself.

The first was a greenfield personal project where I let code quality wane since I was the only person maintaining it, and wanted to test LLMs. The other was to write highly repeative data tests where the model can simply type faster than me.

Anything that requires writing code that needs to be maintained by multiple people or systems older than 2 years, it has fallen completely flat. In cases like that I spend more time telling the LLM it is doing it wrong, it would have taken me less time to write the code in the first place. In 95% of cases, I am still faster than an LLM at solving a problem and writing the code.

Aatube@kbin.melroy.org · 3 months ago

https://survey.stackoverflow.co/2025/ai/

47% daily use

mojofrododojo@lemmy.world · 3 months ago

47% daily use

That is NOT what that says. It says 47% of STACK OVERFLOW RESPONDENTS REPORT using AI. That does not represent 47% of devs.

If you go to 4chan and poll of chuds, you’re going to get a high percentage of respondents affirming your query. You went to stackoverflow and asked about AI. Think about the user base.

Aatube@kbin.melroy.org · 3 months ago

thanks but i felt like that’d be obvious from the URL lol. the SO survey is probably the largest sample size we have for this…

…that isn’t outright from an AI company (not that SO doesn’t have AI but they’re still an answers company as opposed to, say, Cursor AI whose main selling point is the AI. even Zed, the company behind the blog linked in the post, has a much higher emphasis on AI) and their sample should be pretty close to all online devs, maybe slightly exclusionary of very experienced ones. SO’s evangelist proportion is not even close to 4chan’s chud proportion; not sure why had the impression needed to name that comparison.

it’s not like Codidact has a dev survey and even if they had one they’d have as much bias as this comment section

jj4211@lemmy.world · 3 months ago

I have been using it a bit, still can’t decide if it is useful or not though… It can occasionally suggest a blatantly obvious couple of lines of code here and there, but along the way I get inundated with annoying suggestions that are useless and I haven’t gotten used to ignoring them.

I mostly work with a niche area the LLMs seem broadly clueless about, and prompt driven code is almost always useless except when dealing with a super boilerplate usage of a common library.

I do know some people that deal with amazingly mundane and common functions and they are amazed that it can pretty much do their jobs, but they never really impressed me before anyway and I wondered how they had a job…

hisao@ani.social · 3 months ago

If my coworkers do, they’re very quiet about it.

Gee, guess why. Given the current culture of hate and ostracism I would never outright say IRL that I like it or use it a lot. I would say something like “yeah, I think it can sometimes be useful when used carefully and I sometimes use it too”. While in reality it would mean that it actually writes 95% of code under my micromanagement.

Feyd@programming.dev · 3 months ago

Wut. At software shops the prevailing atmosphere is that you should use it and broadcast it as much as possible. This person’s experience is not normal

hisao@ani.social · edit-2 3 months ago

Okay, to be fair, my knowledge of the current culture in industry is very limited. It’s mostly impression formed by online conversations, not limited to Lemmy. Last project I worked at it was illegal to use public LLMs because of intellectual property (and maybe even GDPR) concerns. We had a local scope-limited LLM integration though and that one was allowed, but there was literally a single person across multiple departments who used it and it was a “middle” frontend dev and it was only for autocomplete. Backenders wouldn’t even consider it.

Feyd@programming.dev · 3 months ago

deleted by creator

dantheclamman@lemmy.world · 3 months ago

LLMs are useful to provide generic examples of how a function works. This is something that would previously take an hour of searching the docs and online forums, but the LLM can do for very quickly, and I appreciate. But I have a library I want to use that was just updated with entirely new syntax. The LLMs are pretty much useless for it. Back to the docs I go! Maybe my terrible code will help to train the model. And in my field (marine biogeochemistry), the LLM generally cannot understand the nuances of what I’m trying to do. Vibe coding is impossible. And I doubt the training set will ever be large or relevant enough for the vibe coding to be feasible.

corsicanguppy@lemmy.ca · 3 months ago

Vibe coding

The term for that is actually ‘slopping’. Kthx ;-)

Dr. Moose@lemmy.world · 3 months ago

Thats simply not true. LLMs with RAG can easily catch up with new library changes.

Occhioverde@feddit.it · edit-2 3 months ago

Yes and no.

In many cases (like for the Gradle DSL, that even if it can be either the old Groovy-based one or the new Kotlin-based one, you will always be able to find extensive documentation and examples in the wild for both of them) it is sufficient to specify which version you’re using and, as long as this doesn’t get too far in its context window forcing you to repeat it, you are good to go.

But for niche libraries that have recently undergone significant refactors with the majority of the tutorials and examples still built with past versions, they have a huge bias towards the old syntax, making it really difficult - if not impossible - to make them use the new functions (at least for ChatGPT and GitHub Copilot with the “Web search” functionality on).

jj4211@lemmy.world · 3 months ago

Subjectively speaking, I don’t see it so that good a job of being current or priortizing current over older.

While RAG is the way to give LLM a shot at staying current, I just didn’t see it doing that good a job with library documentation. Maybe it can do all right with tweaks like additional properties or arguments, but more structural changes to libraries I just don’t see being handled.

dantheclamman@lemmy.world · 3 months ago

Exactly. It’s an very niche library (tmap for R) and just was completely overhauled. Gemini, chatGPT and Copilot all seem pretty confused and mix up the old and new syntax

Dr. Moose@lemmy.world · 3 months ago

Thats a lot on implementation of the LLM engine . For python or js you can feed the API schema of the entire virtual environment.

Evotech@lemmy.world · 3 months ago

You can’t know without checking though, it may be wrong

Wispy2891@lemmy.world · 3 months ago

Note: this comes from someone that makes a (very good) ide which they only monetize with an AI subscription so it’s interesting to see their take

(They use Claude opus like all the others so the results are similar)

ExLisper@lemmy.curiana.net · 3 months ago

I think AI in you IDE is meant to help you with small things while AI agents are supposed to do development for you. If people will start using AI agents they won’t need IDEs so this take is consistent with their business model.

GreenKnight23@lemmy.world · 3 months ago

in one regard I can understand, they’re running a business and don’t want to be at a disadvantage against their competition.

on the other hand have some conviction for your product, otherwise I will lose confidence that your product is as good as your marketing makes it seem.

jj4211@lemmy.world · edit-2 3 months ago

They are still bullish on LLM, just to augment rather than displace human suggested development.

This perspective is quite consistent with the need for a product that manages prompting/context for a human user and helps the human review and integrate the LLM supplied content in a reasonable way.

If LLM were as useful as some of the fanatics say, you’d just use a generic prompt and it would poop out the finished project. This is by the way the perspective of an executive I talked to not long ago, that he was going to be able to let go of all his “coders” and feed his “insight” directly into a prompt that will do it all for him instead. He is also easily influenced so articles like this can reshape him into a more tenable position, after which he’ll pretend he never thought a generic prompt would be good enough

humanspiral@lemmy.ca · 3 months ago

I’ve done a test of 8 LLMs, on coding. It was using the J language, asking all of them to generate a chess “mate in x solver”

Even the bad models were good at organizing code, and had some understanding of chess, were good at understanding the ideas in their prompts. The bad models were bad mostly on logic. Not understanding indexing/amend on a table, not understanding proper function calling, or proper decomposition of arguments in J. Bad models included copilot and openAI’s 120g open source model. kimi k2 was ok. Sonet 4 the best. I’ve mostly used Qwen 3 245 for better free accessibility than Sonet 4, and the fact that it has a giant context that makes it think harder (slower) and better the more its used on a problem. Qwen 3 did a good job in writing a fairly lengthy chess position scoring function, and then separating it into 2 quick and medium function, incorporating self written library code, and recommending enhancements.

There is a lot to get used to in working with LLMs, but the right ones, can generally help with code writting process. ie. there exists some code outputs which even when wrong, provide a faster path to objectives than if that code output did not exist. No matter how bad the code outputs, you are almost never dumber for having received it, unless perhaps you don’t understand the language well enough to know its bad.

antihumanitarian@lemmy.world · 3 months ago

LLMs have made it really clear when previous concepts actually grouped things that were distinct. Not so long ago, Chess was thought to be uniquely human, until it wasn’t, and language was thought to imply intelligence behind it, until it wasn’t.

So let’s separate out some concerns and ask what exactly we mean by engineering. To me, engineering means solving a problem. For someone, for myself, for theory, whatever. Why do we want to solve the problem, what we want to do to solve the problem, and how we do that often blurred together. Now, AI can supply the how in abundance. Too much abundance, even. So humans should move up the stack, focus on what problem to solve and why we want to solve it. Then, go into detail to describe what that solution looks like. So for example, making a UI in Figma or writing a few sentences on how a user would actually do the thing. Then, hand that off to the AI once you think it’s sufficiently defined.

The author misses a step in the engineering loop that’s important though. Plans almost always involve hidden assumptions and undefined or underdefined behavior that implementation will uncover. Even more so with AI, you can’t just throw a plan and expect good results, the humans need to come back, figure out what was underdefined or not actually what they wanted, and update the plan. People can ‘imagine’ rotating an apple in their head, but most of them will fail utterly if asked to draw it; they’re holding the idea of rotating an apple, not actually rotating the apple, and application forces realization of the difference.

hunnybubny@discuss.tchncs.de · 3 months ago

The author misses a step in the engineering loop that’s important though. Plans almost always involve hidden assumptions and undefined or underdefined behavior that implementation will uncover.

His whole point is two mental models and a model delta. Exactly what you just described.

hisao@ani.social · 3 months ago

I love it how article baits AI-haters to upvote it, even though it’s very clearly pro-AI:

At Zed we believe in a world where people and agents can collaborate together to build software. But, we firmly believe that (at least for now) you are in the drivers seat, and the LLM is just another tool to reach for.

Aatube@kbin.melroy.org · 3 months ago

How is that pro-AI? It clearly very neutrally says it’s just a tool, which you can also hate.

NoiseColor @lemmy.world · 3 months ago

Good article, I couldn’t agree with it more, it’s exactly my experience.

The tech is being developed really fast and that is the main issue when taking about ai. Most ai haters are using the issues we might have today to discredit the while technology which makes no sense to me.

And this issue the article talks about is apparent and whoever solves it will be rich.

However, it’s interesting to think about the issues that come next.

HarkMahlberg@kbin.earth · 3 months ago

It’s true, the tech will get better in the future, we just need to believe and trust the plan.

Same thing with crypto and NFT’s. They were 99% scam by volume, but who wouldn’t love moving their life savings into a digital ecosystem controlled by a handful of rich gambling addicts with no consumer protections? Imagine, you’ll never need to handle dirty paper money ever again, we’ll just put it all in a digital wallet somewhere controlled by someone else coughmastercardcough.

And another thing, we were too harsh on the Metaverse. Sure, spending 8 hours in VR could make you vomit, and the avatars made ET for the Atari look like Uncharted 4, but it was just in its infancy!

I too want to outsource all my critical thinking to a chatbot controlled by an wealthy insular narcissist who throws Nazi salutes. The technology just needs time to mature. Who knows, maybe it can automate the exile of birthright citizens for us too!

/s

NoiseColor @lemmy.world · 3 months ago

That’s exactly the hyperbole I was talking about. Your post is full of obvious fallacies, but the fact that you are pushing everything to the absolutes is the silliest one.

Aceticon@lemmy.dbzer0.com · 3 months ago

Your whole point is discounting the experience of 50 years in technological evolution (that all technological branches invariably slow down and stop improving) and the last 20 years of hype in Tech (literally everything is pushed like crazy as “the next big thing” by people trying to make a lot of money from it, and almost all of it isn’t), so that specific satirical take on your post is well deserved.

NoiseColor @lemmy.world · 3 months ago

Satirical? That didn’t fit the description of the word at all. Your should check a dictionary.

All technological branches invariably slow down? Ever heard of Moore’s law? I’m just gonna stop here and not talk to you again. It’s clear, you just want a conflict and I don’t think you have much else to offer. Bye.

Aceticon@lemmy.dbzer0.com · edit-2 3 months ago

Like the guy whose baby doubled in weight in 3 months and thus he extrapolated that by the age of 10 the child would weigh many tons, you’re assuming that this technology has a linear rate of improvement of “intelligence”.

This is not at all what’s happening - the evolution of things like LLMs in the last year or so (say between GPT4 and GPT5) is far less than it was earlier in that Tech and we keep seeing more and more news on problems about training it further and getting it improved, including the big one which is that training LLMs on the output of LLMs makes them worse, and the more the output of LLMs is out there, the harder it gets to train new iteractions with clean data.

(And, interestingly, no Tech has ever had a rate of improvement that didn’t eventually tailed of, so it’s a peculiar expectation to have for a specific Tech that it will keep on steadily improving)

With this specific path taken in implementing AI, the question is not “when will it get there” but rather “can it get there or is it a technological dead-end”, and at least for things like LLMs the answer increasingly seems to be that it is a technological dead-end for the purpose of creating reasoning intelligence and doing work that requires it.

(For all your preemptive defense by implying that critics are “ai haters”, no hate is required to do this analysis, just analytical ability and skepticism, untainted by fanboyism)

NoiseColor @lemmy.world · 3 months ago

The difference here is that the current ai tech advancements are not just a consequence of one single tech, but of many.

Everything you wrote you believe, depends on this being one tech, one dead end.

The real situation is that we finally have the hardware and the software to make breakthroughs. There is no dead end to this. It’s just a series of steps, each contributing by itself and by learning from its mass implementations. It’s like we got he first taste of ai and we can’t get enough. Even if it takes a while to the next advancement.

Aceticon@lemmy.dbzer0.com · edit-2 3 months ago

That doesn’t even make sense - it’s not merely the there being multiple elements which add up to a specific tech that makes it capable of reaching a specific goal, just like throwing multiple ingredients into a pot doesn’t guarantee you a tasty dish as output and you have absolutely no proof that “we finally have the hardware and the software to make breakthroughs” hence you can’t anchor the forecast that the stuff done on top of said hardware and software will achieve a great outcome entirely anchored on your assertion that “it’s made up from stuff which can do greatness”.

As for the tech being a composition of multiple tech elements, that doesn’t mean much: most dishes too are a composition of multiple elements and that doesn’t mean that any random combination of stuff thrown into a pot will make a good dish.

That idea that more inputs make a specific output more likely is like claiming that “the chances of finding a needle increase with the size of the haystack” - the very opposite of reality.

Might want to stop using LLMs to write your responses and engage your brain instead.

NoiseColor @lemmy.world · 3 months ago

Ah, there go the insults. Surely the best way to display the superiority of your argument lol. And show who is the rational one in any conversation. But I’ll let the first one side, ok. Anyone can have a weak moment. For sure I had many.

My post has sense. You can claim ,as you have, that multiple ingredients don’t guarantee a tasty dish and fair enough, but in the other hand the opposite is also obviously not true. So I claim that’s not an argument against what I said by logic itself. I can also say that’s not a good comparison. We have a technology that is already giving us results. You can claim they aren’t good, but considering how many people use it already, that by itself could refute that claim, without mentioning any case studies which are plenty.

To the meat of the thing. Maybe I can’t claim that we are headed for an ai nirvana, but the same you can’t say LLMs are in any kind of dead end, especially not one that will mean ai stagnation for the medium future. But I can safely claim we are far closer than we were 3 years ago, by many orders of magnitude. The reasons being exactly hardware and LLMs. And this is exactly the reason for investments in the very the same tech, infrastructure, companies, institutions, universities, (…), that would invent new technology in AI. So, in the worst case scenario for the llms, they have accelerated the investments and improved the infrastructure for future inventions. Worst case.

Got more insults?

Aceticon@lemmy.dbzer0.com · 3 months ago

Sure mate, your logic is flawless and you’re not at all pretty much just using falacies and axiomatic statements to make the case that “this is going to be the greatest thing ever (invest now!)” like all the other types selling their book on some tech hype as has become common since the 90s and anybody pointing this out is really just insulting you by not accepting your clear genius.

Life must be hard for the benevolent AI Investor just trying to share with others how the tech domain they’re invested in is CERTAIN to become the greatest thing ever because it’s made on top of elements which are CERTAIN to be the elements that will one day deliver the greatest thing ever, only to get insulted by people daring to point out that all that certainty isn’t backed by anything but “trust me”.

NoiseColor @lemmy.world · 3 months ago

Youre insults are so great that I know now you have bested me. Incredible debate strategy to ignore all the arguments and go straight for the jugular with personal attacks. Truly remarkable rhetorical capabilities! I salute you! Have a nice day. Bye.

wulrus@lemmy.world · edit-2 3 months ago

Interesting what he wrote about LLMs’ inability to “zoom out” and see the whole picture. I use Gemini and ChatGPT sometimes to help debug admin / DevOps problems. It’s a great help for extra input, a bit like rubberducking on steroids.

Examples how it went:

Problem: Apache-cluster and connected KeyCloak-Cluster, odd problems with loginflow. Reducing KeyCloak to 1 node solves it, so it says that we need to debug node communication and how to set the debug log settings. A lot of analysis together. But after a while, it’s pretty obvious that the Apache-cluster doesn’t use the sticky session correctly and forwards requests to the wrong KeyCloak node in the middle of the login flow. LLM does not see that, wanted to continue to dig deeper and deeper into supposedly “odd” details of the communication between KeyCloak nodes, althought the combined logs of all nodes show that the error was in load balancing.

Problem: Apache from a different cluster often returns 413 (payload too large). Indeed it happens with pretty large requests, the limit where it happens is a big over 8kB without the body. But the incoming request is valid. So I ask both Gemini and ChatGPT for a complete list of things that cause Apache to do that. It does a decent job at that. And one of it is close: It says to check for mod_proxy_ajp use, since that observed limit could be caused by trying to make an AJP package to communicate with backchannel servers. It was not the cause; the actual mod was mod_jk, which also uses AJP. It helped me focus on watching out for anything using AJP when reviewing the whole config manually, so I found it, and the “rubberducking” helped indirectly. But the LLM said we must forget about AJP and focus on other possible causes - a dead end. When I told it the solution, it was like: Of course mod_jk. (413 sounds like the request TO the apache is wrong, but actually, it tries internally to create an invalid AJP package over 8kB, and when it fails blames the incoming request.)

TankovayaDiviziya@lemmy.world · 3 months ago

I don’t work in IT, but I do know you need creativity to work in the industry, something which the current LLM/AI doesn’t possess.

Linguists also dismiss LLMs in similar vein because LLMs can’t grasp context. It is always funny to be sarcastic and ironic on an LLM.

Soft skills and culture are what that the current iteration of LLMs lack. However, I do think there is still huge potential for AI development in dacades to come, but I want this AI bubble to burst as “in your face” to companies.

Nighed@feddit.uk · 3 months ago

The idea of the mental model CAN be done by AI.

In my experience, if you get it to build a requirements doc first, then ask it to implement that while updating it as required (effectively it’s mental state). you will get a pretty good output with decent ‘debugging’ ability.

This even works ok with the older ‘dumber’ models.

That only works when you have a comprehensive set of requirements available though. It works when you want to add a new screen/process (mostly) but good luck updating an existing one! (I haven’t tried getting it to convert existing code to a requirements doc - anyone tried that?)

flop_leash_973@lemmy.world · 3 months ago

I tried feeding ChatGPT a Terraform codebase once and asked it to produce an architecture diagram of what the code base would deploy to AWS.

It got most of the little blocks right for the services that would get touched. But the layout and traffic direction flow between services was nonsensical.

Truth be told it did do a better job than I thought it would initially.

Nighed@feddit.uk · 3 months ago

The trick is to split up the tasks into chunks.

Ask it to identify the blocks.

Then ask it to identify the connections.

Then ask it to produce the diagram.

Pumasuedeblue@sh.itjust.works · 3 months ago

Which means you just did four things to help the AI which the AI can’t do itself. That makes it a tool: useful in some applications, not useful in others, and constantly requiring a human to properly utilize it.

SugarCatDestroyer@lemmy.world · edit-2 3 months ago

Well, they will simply fire many and leave the required number of workers to work with AI. This is exactly what they will want to do at any convenient opportunity. But those who remain will still have to check everything carefully in case the AI made a mistake somewhere.

Why LLMs can't really build software

Why LLMs can't really build software

Why LLMs Can't Really Build Software - Zed Blog