Lemmy: Bestiverse
fubarx@lemmy.world to Programmer Humor@programming.dev · 11 hours ago

Killswitch Engineer

lemmy.world

  • yannic@lemmy.ca
    2 hours ago

    Everyone here so far has forgotten that in simulations, the model has blackmailed the person responsible for shutting it off, and has even gone so far as to cancel active alerts in order to prevent an executive lying unconscious in the server room from receiving life-saving care.

    • AwesomeLowlander@sh.itjust.works
      18 minutes ago

      The model ‘blackmailed’ the person because they provided it with a prompt asking it to pretend to blackmail them. Gee, I wonder what they expected.

      Have not heard the one about cancelling active alerts, but I doubt it’s any less bullshit. Got a source about it?

      Edit: Here’s a deep dive into why those claims are BS: https://www.aipanic.news/p/ai-blackmail-fact-checking-a-misleading

      • yannic@lemmy.ca
        17 minutes ago

        I provided enough information that the relevant source shows up in a search, but here you go:

        In no situation did we explicitly instruct any models to blackmail or do any of the other harmful actions we observe. [Lynch, et al., “Agentic Misalignment: How LLMs Could be an Insider Threat”, Anthropic Research, 2025]

        • AwesomeLowlander@sh.itjust.works
          13 minutes ago

          Yes, I also already edited my comment with a link going into the incidents and why they’re absolute nonsense.

