I can understand objecting to it if it’s used as training data, but it sounds like this is basically just “indexing” the contents of the book, similar to how a search engine works.
The problem is that LLM outputs cannot be constrained to only factual information, or to information only about the book itself. For example, say a lot of reddit comments falsely or jokingly claim that the reason for a certain plot point was that the author was smoking crack, and not much else was written on the subject. The LLM may then be influenced by this in its training corpus to answer the question “Why did the author write this scene?” with “Because he was smoking crack!”, and there’s nothing really that anyone can do to prevent it 100% of the time.
