Just to clarify: this is not my Substack. I’m just sharing it because I found it insightful.

The author describes himself as a “fractional CTO” (no clue what that means, don’t ask me) and advisor. His clients asked him how they could leverage AI, so he decided to experience it for himself. From the author (emphasis mine):

I forced myself to use Claude Code exclusively to build a product. Three months. Not a single line of code written by me. I wanted to experience what my clients were considering—100% AI adoption. I needed to know firsthand why that 95% failure rate exists.

I got the product launched. It worked. I was proud of what I’d created. Then came the moment that validated every concern in that MIT study: I needed to make a small change and realized I wasn’t confident I could do it. My own product, built under my direction, and I’d lost confidence in my ability to modify it.

Now when clients ask me about AI adoption, I can tell them exactly what 100% looks like: it looks like failure. Not immediate failure—that’s the trap. Initial metrics look great. You ship faster. You feel productive. Then three months later, you realize nobody actually understands what you’ve built.

  • phed@lemmy.ml · 12 hours ago

    I do a lot with AI, but it is not good enough to replace humans — not even close. It repeats the same mistakes after you tell it no, and it doesn’t remember things from three messages ago when it should. You have to keep re-explaining the goal to it. It’s wholly incompetent. And yeah, that definitely happens when you have it do stuff you aren’t familiar with or didn’t create yourself. So I have it write commentary, or I take the time right then to ask it what x or y does and add a comment.

    • kahnclusions@lemmy.ca · 41 minutes ago

      Even worse, the ones I’ve evaluated (like Claude) constantly fail to even compile because, for example, they mix usages of different SDK versions. When instructed to use version 3 of some package, the model will add the right version as a dependency but then still write code against missing or deprecated APIs from the previous version — APIs that are obviously unavailable.
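      To show the shape of the failure (with a made-up package, “@acme/widgets”, and a made-up v2→v3 API rename — not a real SDK):

      ```typescript
      // package.json pins the new major version, as instructed:
      //   "dependencies": { "@acme/widgets": "^3.0.0" }
      //
      // Suppose v3 replaced createWidget() with Widget.create().
      // The generated code still imports the v2 API, so it can't compile:
      import { createWidget } from "@acme/widgets";
      // error TS2305: Module '"@acme/widgets"' has no exported member 'createWidget'.

      const w = createWidget({ size: "large" });
      ```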

      More time (and money, and electricity) is wasted prompting it toward correct code than it would take to simply write it yourself, and at the end of the day you have a smoking turd that no one even understands.

      LLMs are a dead end.

      • MangoCats@feddit.it · 24 minutes ago

        > constantly fail to even compile because, for example, they mix usages of different SDK versions

        Try an agentic tool like Claude Code — it closes the loop by compiling the code for you and fixing its own mistakes (like human programmers do) before bothering you for another prompt. I was where you are six months ago; the tools have improved dramatically since then.
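        Mechanically, “closing the loop” is nothing exotic — roughly this (a sketch, not Claude Code’s actual implementation; the fix callback stands in for whatever model call the tool wires in):

        ```typescript
        import { execSync } from "node:child_process";
        import { readFileSync, writeFileSync } from "node:fs";

        // Stand-in for the LLM call: takes source plus compiler errors,
        // returns patched source.
        type Fixer = (source: string, compilerErrors: string) => string;

        function compileFixLoop(file: string, fix: Fixer, maxAttempts = 5): boolean {
          for (let attempt = 0; attempt < maxAttempts; attempt++) {
            try {
              // Throws if the compiler exits non-zero.
              execSync(`npx tsc --noEmit ${file}`, { stdio: "pipe" });
              return true; // clean compile: only now hand control back to the user
            } catch (err) {
              const errors = String((err as { stdout?: Buffer }).stdout ?? err);
              writeFileSync(file, fix(readFileSync(file, "utf8"), errors));
            }
          }
          return false; // still broken after maxAttempts: time to bother the human
        }
        ```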

        From TFS:

        > I needed to make a small change and realized I wasn’t confident I could do it. My own product, built under my direction, and I’d lost confidence in my ability to modify it.

        That sounds like a “fractional CTO problem” to me (IMO a fractional CTO is a guy who convinces several small companies that he’s a brilliant tech genius who will help them make their important tech decisions without actually paying full-time attention to any of them. Actual tech experience: optional.)

        If you have lost confidence in your ability to modify your own creation, that’s not a tools problem — you are the tool, and that’s a you problem. It doesn’t matter whether you use an LLM coding tool, a team of human developers, or a pack of monkeys to write your applications: if you don’t document, test, and formally develop an “understanding” of your product that not only you but all stakeholders can grasp to the extent they need to, you’re just letting development run wild — you lack software development process maturity. LLMs can do that faster than a pack of monkeys or a bunch of kids hired off Craigslist, but it’s the exact same problem no matter how you slice it.

    • Echo Dot@feddit.uk · 10 hours ago

      There’s no point telling it not to do x, because as soon as you mention x it goes into its context window.

      It has no filter. It’s as if you had no choice over your actions and had to act on every thought that came into your head: told not to do a thing, you’d immediately start thinking about doing it.

      • MangoCats@feddit.it · 22 minutes ago

        > There’s no point telling it not to do x, because as soon as you mention x it goes into its context window.

        Reminds me of the Sonny Bono high-speed downhill skiing problem: don’t fixate on the tree, because if you fixate on the tree you’re going to hit the tree. Fixate on the open space beside it.

        LLMs do “understand” words like “not” and “don’t”, but they seem to work better with positive instructions than negative ones.
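        For example (my own illustration), instead of “don’t use the deprecated v2 API,” a prompt like “use only the v3 API from the attached docs” tends to land better — point at the open space, not the tree.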

      • kahnclusions@lemmy.ca · 39 minutes ago

        I’ve noticed this too, it’s hilarious(ly bad).

        Especially with image generation. “Draw a picture of an elf.” Generates images of elves that all have one weird earring. “Draw a picture of an elf without an earring.” Great, now the elves have even more earrings.

        • MangoCats@feddit.it · 20 minutes ago

          I find this kind of performance varies from one model to the next. I’ve definitely experienced the bad-image-getting-worse phenomenon — especially with MS Copilot — but different models handle it differently.