Just to clarify: this is not my Substack. I’m just sharing it because I found it insightful.

The author describes himself as a “fractional CTO” (no clue what that means, don’t ask me) and advisor. His clients asked him how they could leverage AI, so he decided to experience it for himself. From the author (emphasis mine):

I forced myself to use Claude Code exclusively to build a product. Three months. Not a single line of code written by me. I wanted to experience what my clients were considering—100% AI adoption. I needed to know firsthand why that 95% failure rate exists.

I got the product launched. It worked. I was proud of what I’d created. Then came the moment that validated every concern in that MIT study: I needed to make a small change and realized I wasn’t confident I could do it. My own product, built under my direction, and I’d lost confidence in my ability to modify it.

Now when clients ask me about AI adoption, I can tell them exactly what 100% looks like: it looks like failure. Not immediate failure—that’s the trap. Initial metrics look great. You ship faster. You feel productive. Then three months later, you realize nobody actually understands what you’ve built.

  • phed@lemmy.ml · 12 hours ago

    I do a lot with AI, but it is not good enough to replace humans — not even close. It repeats the same mistakes after you tell it no, and it doesn’t remember things from three messages ago when it should. You have to keep re-explaining the goal to it. It’s wholly incompetent. And yeah, that definitely happens when you have it do stuff you aren’t familiar with or didn’t create yourself. So I have it write commentary, or I take the time right then to ask it what x or y does and add a comment.

    • kahnclusions@lemmy.ca · 41 minutes ago

      Even worse, the ones I’ve evaluated (like Claude) constantly fail to even compile because, for example, they mix usages of different SDK versions. When instructed to use version 3 of some package, the model will add the right version as a dependency but then still write code against missing or deprecated APIs from the previous version — APIs that are obviously unavailable.
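      To show the shape of the failure (with a made-up package, “@acme/widgets”, and a made-up v2→v3 API rename — not a real SDK):

      ```typescript
      // package.json pins the new major version, as instructed:
      //   "dependencies": { "@acme/widgets": "^3.0.0" }
      //
      // Suppose v3 replaced createWidget() with Widget.create().
      // The generated code still imports the v2 API, so it can't compile:
      import { createWidget } from "@acme/widgets";
      // error TS2305: Module '"@acme/widgets"' has no exported member 'createWidget'.

      const w = createWidget({ size: "large" });
      ```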

      More time (and money, and electricity) is wasted prompting it toward correct code than it would take to simply write it yourself, and at the end of the day you have a smoking turd that no one even understands.

      LLMs are a dead end.

      • MangoCats@feddit.it · 24 minutes ago

        > constantly fail to even compile because, for example, they mix usages of different SDK versions

        Try an agentic tool like Claude Code — it closes the loop by compiling the code for you and fixing its own mistakes (like human programmers do) before bothering you for another prompt. I was where you are six months ago; the tools have improved dramatically since then.
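        Mechanically, “closing the loop” is nothing exotic — roughly this (a sketch, not Claude Code’s actual implementation; the fix callback stands in for whatever model call the tool wires in):

        ```typescript
        import { execSync } from "node:child_process";
        import { readFileSync, writeFileSync } from "node:fs";

        // Stand-in for the LLM call: takes source plus compiler errors,
        // returns patched source.
        type Fixer = (source: string, compilerErrors: string) => string;

        function compileFixLoop(file: string, fix: Fixer, maxAttempts = 5): boolean {
          for (let attempt = 0; attempt < maxAttempts; attempt++) {
            try {
              // Throws if the compiler exits non-zero.
              execSync(`npx tsc --noEmit ${file}`, { stdio: "pipe" });
              return true; // clean compile: only now hand control back to the user
            } catch (err) {
              const errors = String((err as { stdout?: Buffer }).stdout ?? err);
              writeFileSync(file, fix(readFileSync(file, "utf8"), errors));
            }
          }
          return false; // still broken after maxAttempts: time to bother the human
        }
        ```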

        From TFS:

        > I needed to make a small change and realized I wasn’t confident I could do it. My own product, built under my direction, and I’d lost confidence in my ability to modify it.

        That sounds like a “fractional CTO problem” to me (IMO a fractional CTO is a guy who convinces several small companies that he’s a brilliant tech genius who will help them make their important tech decisions without actually paying full-time attention to any of them. Actual tech experience: optional.)

        If you have lost confidence in your ability to modify your own creation, that’s not a tools problem — you are the tool, and that’s a you problem. It doesn’t matter whether you use an LLM coding tool, a team of human developers, or a pack of monkeys to write your applications: if you don’t document, test, and formally develop an “understanding” of your product that not only you but all stakeholders can grasp to the extent they need to, you’re just letting development run wild — you lack software development process maturity. LLMs can do that faster than a pack of monkeys or a bunch of kids hired off Craigslist, but it’s the exact same problem no matter how you slice it.

    • Echo Dot@feddit.uk · 10 hours ago

      There’s no point telling it not to do x, because as soon as you mention x it goes into its context window.

      It has no filter. It’s as if you had no choice over your actions and had to act on every thought that came into your head: told not to do a thing, you’d immediately start thinking about doing it.

      • MangoCats@feddit.it · 22 minutes ago

        > There’s no point telling it not to do x, because as soon as you mention x it goes into its context window.

        Reminds me of the Sonny Bono high-speed downhill skiing problem: don’t fixate on the tree, because if you fixate on the tree you’re going to hit the tree. Fixate on the open space beside it.

        LLMs do “understand” words like “not” and “don’t”, but they seem to work better with positive instructions than negative ones.
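        For example (my own illustration), instead of “don’t use the deprecated v2 API,” a prompt like “use only the v3 API from the attached docs” tends to land better — point at the open space, not the tree.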

      • kahnclusions@lemmy.ca · 39 minutes ago

        I’ve noticed this too, it’s hilarious(ly bad).

        Especially with image generation. “Draw a picture of an elf.” Generates images of elves that all have one weird earring. “Draw a picture of an elf without an earring.” Great, now the elves have even more earrings.

        • MangoCats@feddit.it · 20 minutes ago

          I find this kind of performance varies from one model to the next. I’ve definitely experienced the bad-image-getting-worse phenomenon — especially with MS Copilot — but different models handle it differently.