• howrar@lemmy.ca · 1 day ago (edited)

    I can understand objecting to it if it’s used as training data, but it sounds like this is basically just “indexing” the contents of the book, similar to how a search engine works.
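    For what it's worth, "indexing" in the search-engine sense just means mapping terms to where they occur in the text, not reproducing or learning from it. A rough sketch of the idea (the toy `build_index` function and the example pages below are made up for illustration):

    ```python
    from collections import defaultdict

    def build_index(pages: dict[int, str]) -> dict[str, set[int]]:
        """Map each word to the set of page numbers it appears on."""
        index: defaultdict[str, set[int]] = defaultdict(set)
        for page, text in pages.items():
            for word in text.lower().split():
                index[word].add(page)
        return dict(index)

    # Toy "book": two pages of text.
    book = {
        1: "The ship sails at dawn",
        2: "The captain watches the dawn alone",
    }

    index = build_index(book)
    print(index["dawn"])  # {1, 2} -> only records where the word occurs; nothing is generated
    ```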

    • markovs_gun@lemmy.world · 21 hours ago

      The problem is that LLM outputs can't be constrained to only factual information, or even to information about the book itself. For example, say a lot of Reddit comments falsely or jokingly claim that the reason for a certain plot point was that the author was smoking crack, and not much else was written on the subject. The LLM may then be influenced by that part of its training corpus to answer the question "Why did the author write this scene?" with "Because he was smoking crack!" And there's really nothing anyone can do to prevent that 100% of the time.