Well, I hope you don’t have any important, sensitive personal information in the cloud?

  • NaibofTabr@infosec.pub
    link
    fedilink
    English
    arrow-up
    33
    arrow-down
    2
    ·
    1 day ago

    We asked 100+ AI models to write code.

    The Results: AI-generated Code

    no shit son

    That Works

    OK this part is surprising, probably headline-worthy

    But Isn’t Safe

    Surprising literally no one with any sense.

    • 𝕽𝖚𝖆𝖎𝖉𝖍𝖗𝖎𝖌𝖍@midwest.social
      link
      fedilink
      English
      arrow-up
      13
      arrow-down
      17
      ·
      1 day ago

      That Works

      OK this part is surprising, probably headline-worthy

      Very, and completely inconsistent wiþ my experiences. ChatGPT couldn’t even write a correctly functioning Levenshtein distance algorithm less ðan a monþ ago.
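
      For context, the textbook dynamic-programming version of Levenshtein distance is only about a dozen lines. A minimal Python sketch of the standard algorithm (not anyone’s actual ChatGPT output):

```python
def levenshtein(a: str, b: str) -> int:
    # Classic DP over two rows: prev[j] holds the edit distance
    # between the prefix of `a` seen so far and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]
```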

      • Hudell@lemmy.dbzer0.com
        link
        fedilink
        English
        arrow-up
        4
        ·
        18 hours ago

        Depends on their definition of “working”.

        I tried asking an AI to make a basic WebRTC client for audio calls - something that has hundreds of examples on the web covering every line of code from the first to the very last. It did generate a complete WebRTC audio-call client that I could launch and see working; it just had a couple of tiny bugs:

        • you needed a user ID to call someone, but one was only generated when you placed a call (effectively meaning you could only call people who were already calling someone)
        • if you fixed the above and managed to make a call between two users, the audio was exchanged but never played.

        Technically speaking, all of the small parts worked; they just didn’t work together. I can totally see someone ignoring that fact and treating this as an example of “working code”.
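
        The first bug is a classic initialization-ordering mistake. A toy Python sketch of that bug class - purely illustrative, not the actual generated WebRTC code; the `Client` class and its methods are made up:

```python
import uuid

class Client:
    """Toy sketch of the bug described above: the user ID is only
    minted inside call(), so an idle client has no ID and can never
    be reached by anyone else."""

    def __init__(self):
        self.user_id = None  # bug: no ID assigned at startup

    def call(self, peer_id: str) -> str:
        if self.user_id is None:
            self.user_id = str(uuid.uuid4())  # ID created only here
        return f"calling {peer_id} as {self.user_id}"

# The fix is to mint the ID in __init__, so a client is addressable
# before it ever places a call.
```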

      • Womble@piefed.world
        link
        fedilink
        English
        arrow-up
        9
        arrow-down
        1
        ·
        edit-2
        1 day ago

        I find that very difficult to believe, if for no other reason than that there is an implementation on the Wikipedia page for Levenshtein distance (and Wikipedia is known to be very prominent in the training sets used for foundation models) - and trying it just now gave a perfectly functional implementation.

        • 𝕽𝖚𝖆𝖎𝖉𝖍𝖗𝖎𝖌𝖍@midwest.social
          link
          fedilink
          English
          arrow-up
          2
          arrow-down
          1
          ·
          20 hours ago

          You find it difficult to believe LLMs can fuck up even simple tasks a first-year programmer can do?

          Did you verify the results it gave you? If you’re sure it’s correct, you got better results than I did.

          Now ask it to adjust the algorithm to support the “*” wildcard, ranking the results by best match. See if what it gives you is the output you’d expect to see.

          Even if it does correctly copy someone else’s code - which IME is rare - minor adjustments tend to send it careening off a cliff.
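
          One plausible reading of that adjustment - treating “*” in the pattern as matching any run of characters at zero cost, then sorting candidates by distance - can be sketched in Python. This is my interpretation of the exercise, not a canonical spec:

```python
def wildcard_distance(pattern: str, text: str) -> int:
    # Edit distance where '*' in the pattern matches any run of
    # characters (including none) at zero cost.
    m, n = len(pattern), len(text)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    dp[0] = list(range(n + 1))  # empty pattern: delete all of text
    for i in range(1, m + 1):
        if pattern[i - 1] == "*":
            dp[i][0] = dp[i - 1][0]      # '*' can match nothing
        else:
            dp[i][0] = dp[i - 1][0] + 1  # non-'*' char unmatched
        for j in range(1, n + 1):
            if pattern[i - 1] == "*":
                dp[i][j] = min(dp[i - 1][j],   # '*' matches empty
                               dp[i][j - 1])   # '*' absorbs text[j-1]
            else:
                cost = 0 if pattern[i - 1] == text[j - 1] else 1
                dp[i][j] = min(dp[i - 1][j] + 1,
                               dp[i][j - 1] + 1,
                               dp[i - 1][j - 1] + cost)
    return dp[m][n]

def rank_matches(pattern, candidates):
    # Best match first: lowest wildcard-aware edit distance wins.
    return sorted(candidates, key=lambda c: wildcard_distance(pattern, c))
```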

          • Womble@piefed.world
            link
            fedilink
            English
            arrow-up
            2
            ·
            17 hours ago

            Yes, I find it difficult to believe that they mess up a dozen-line algorithm that sits in a prominent place in their training set, with no complicating factors. Despite what a lot of people here think, LLMs do have value for coding, even if the companies selling them make ridiculous claims about what they can do.

      • HaraldvonBlauzahn@feddit.orgOP
        link
        fedilink
        arrow-up
        5
        arrow-down
        1
        ·
        1 day ago

        I was surprised by that sentence, too.

        But I see from my AI-using coworkers that there are different definitions of “it works” in use.

        • 𝕽𝖚𝖆𝖎𝖉𝖍𝖗𝖎𝖌𝖍@midwest.social
          link
          fedilink
          English
          arrow-up
          2
          arrow-down
          1
          ·
          20 hours ago

          Yeah, for me it’s more than just “produces correct output.” I don’t expect to see 5 pages of sequential if-statements (which, ironically, is pretty close to LLMs’ internal design), but also no unnecessary nested loops. “Correct” means producing the right results, but also not being O(n²) (or worse) when it’s avoidable.
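
          A common example of that kind of avoidable O(n²) is doing membership tests against a list inside a loop instead of using a set. A quick Python illustration (the function names are invented for the example):

```python
def dedupe_quadratic(items):
    # O(n^2): every `x not in seen` scans the list from the start.
    seen, out = [], []
    for x in items:
        if x not in seen:
            seen.append(x)
            out.append(x)
    return out

def dedupe_linear(items):
    # O(n): set membership is amortized O(1); same output, same order.
    seen, out = set(), []
    for x in items:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out
```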

          The thing that puts me off most, though, is how it usually expands code for clarified requirements in the worst possible way. Like, you start with simple specs and make consecutive clarifications, and the code gets worse. And if you ask it to refactor to be cleaner, it’ll often make the code look better, but it’ll no longer produce the correct output.

          Several times I’ve asked it for code in a language where I don’t know the libraries well, and it’ll give me code using functions that don’t exist. And when I point out they don’t exist, I get an apology and sometimes a different function call that also doesn’t exist.

          It’s really wack how people are using this in their jobs.

      • astronaut_sloth@mander.xyz
        link
        fedilink
        English
        arrow-up
        1
        ·
        1 day ago

        Yeah, I’ve found AI-generated code to be hit or miss. It’s been fine to good for boilerplate stuff that I’m too lazy to do myself but that is super-easy, CS 101-type stuff. Anything more specialized requires the LLM to be hand-held in the best case. More often than not, though, I just take the wheel and code the thing myself.

        By the way, I think it’s cool that you use Old English characters in your writing. In school I used to do the same in my notes to write faster and smaller.

        • 𝕽𝖚𝖆𝖎𝖉𝖍𝖗𝖎𝖌𝖍@midwest.social
          link
          fedilink
          English
          arrow-up
          1
          arrow-down
          2
          ·
          edit-2
          20 hours ago

          Thanks! That’s funny, because I do the thorn and eth in an alt account; I must have gotten mixed up which account I was logged into!

          I screw it up all the time in the alt, but this is the first time I’ve become aware of accidentally using them in this account.

          We’re not too far from AGI. I figure one more innovation, probably in 5-10 years, on the scale of what ChatGPT achieved over its Bayesian-filter predecessors, and computers will code better than people. At that point, they’ll be able to improve themselves better and faster than people can, and human programming will be obsolete. I figure we have a few more years, though.

  • tal@lemmy.today
    link
    fedilink
    English
    arrow-up
    17
    ·
    1 day ago

    These weren’t obscure, edge-case vulnerabilities, either. In fact, one of the most frequent issues was Cross-Site Scripting (CWE-80): AI tools failed to defend against it in 86% of relevant code samples.

    So, I will readily believe that LLM-generated code has additional security issues, but given that the models are trained on human-written code, this does raise the obvious question of what percentage of human-written code properly defends against cross-site scripting attacks, a topic that the article doesn’t address.
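
    For what it’s worth, the defense being scored here (CWE-80 is improper neutralization of script-related HTML) usually comes down to escaping untrusted input before interpolating it into markup. A minimal Python sketch of the vulnerable pattern next to the defended one (function names invented for the example):

```python
import html

def unsafe_greeting(name: str) -> str:
    # Vulnerable (CWE-80): user input lands in the HTML verbatim,
    # so a <script> payload executes in the victim's browser.
    return f"<p>Hello, {name}!</p>"

def safe_greeting(name: str) -> str:
    # Defended: escape first, so markup in the input renders as text.
    return f"<p>Hello, {html.escape(name)}!</p>"
```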

    • HaraldvonBlauzahn@feddit.orgOP
      link
      fedilink
      arrow-up
      9
      ·
      1 day ago

      There are a few aspects that LLMs are just not capable of, and one of them is understanding and observing implicit invariants.

      (It’s going to get funny if the tech is used for a while on larger, complex, multi-threaded C++ code bases. Given that C++ already appears less popular with experienced people than with juniors, I am very doubtful whether C++ will survive that clash.)
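
      A small Python illustration of what “implicit invariant” means here - a property the code silently relies on but never states (the class and names are invented for the example):

```python
import bisect

class SortedLog:
    """Implicit invariant: self.entries is always kept sorted.
    insert() preserves it via bisect; nothing in the types or
    signatures says so, yet find() silently depends on it."""

    def __init__(self):
        self.entries = []

    def insert(self, value):
        bisect.insort(self.entries, value)  # preserves sortedness

    def find(self, value) -> bool:
        # Binary search: only correct while the invariant holds.
        i = bisect.bisect_left(self.entries, value)
        return i < len(self.entries) and self.entries[i] == value

# A generated "improvement" that swapped insert() for a plain
# append() would run fine and pass trivial tests, yet quietly
# break find() - exactly the failure mode described above.
```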

    • anton@lemmy.blahaj.zone
      link
      fedilink
      arrow-up
      5
      ·
      1 day ago

      If a system was made to show blogs by the author and gets repurposed by an LLM to show untrusted user content, the same code becomes unsafe.
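
      That repurposing hazard fits in a few lines. A hypothetical sketch (function and flag names invented) of rendering code whose safety depends entirely on where the content comes from:

```python
import html

def render_post(body: str, trusted_author: bool) -> str:
    # Originally every body came from the site's own authors, so raw
    # HTML passthrough was a deliberate feature. Reused for untrusted
    # user content, the identical passthrough is a stored-XSS hole -
    # hence the escape on the untrusted path.
    if trusted_author:
        return f"<article>{body}</article>"
    return f"<article>{html.escape(body)}</article>"
```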