Kagi uses a large language model that appears to have been trained on content that may have been used without permission. I found an example where the text generated by the LLM exactly matches a sentence in a copyrighted book.
Steps to reproduce:
- Search for “Layman suggests focusing on a handful of important amino acids”.
- The Kagi LLM returns this text (screenshot 1, below):
... but for those of us with day jobs, Layman suggests focusing on a handful of important amino acids, such as leucine, lycine, and methionine.
- This text is identical (including the misspelling of “lycine”) to that found on page 260 of Outlive: The Science and Art of Longevity by Peter Attia, MD (see screenshot 2, below).
Expected behavior:
The LLM should not be trained on unlicensed content?
Debug info:
Edge/macOS/International/US-EAST
Screenshots:
Screenshot 1, search result:
Screenshot 2, copyrighted book page showing identical contents:
This is either a bug or a prompt for a serious discussion. 🫤