Bug: If multiple searches are performed for a query, the sources for the Assistant might contain duplicates.
Caveat: I cannot see the sources or their text snippets directly, so I am relying on the model's judgment here. Three searches were performed in the example below, and the most frequently duplicated source appeared three times, so I assume the bug is caused by the multiple searches, but I haven't tested this further.
Link to sample thread: https://kagi.com/assistant/221f5d62-addd-4310-ac52-d19698bdff5b
Model: Qwen 3 235b with Thinking
Detailed Description
In the thread above, three searches are performed for the initial query:
Searching with Kagi:
Deep Think Gemini 2.5 Pro
Gemini 2.5 Pro features
Gemini 2.5 Pro capabilities
When asked how many sources were provided, the model starts to ponder in its thinking block whether some of them count as distinct sources.
I then directly asked the model to confirm that the URLs were exactly the same, which it confirmed.
Finally, I asked the model whether the provided text snippets were also the same. According to the model, at least some of the duplicate URLs also have identical (or nearly identical) text snippets.
Since the Assistant's answer quality often hinges strongly on the availability of good sources, it would be desirable to deduplicate the text snippets so that additional snippets can move up into the top 30 search results.
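To illustrate the idea, here is a minimal sketch of exact deduplication with backfill. I don't know how the Assistant actually merges results, so everything here is an assumption: the `Snippet` type, the `dedupe_and_backfill` function, and the choice to treat two entries as duplicates when the URL and the whitespace/case-normalized text both match are all hypothetical.

```python
from typing import NamedTuple


class Snippet(NamedTuple):
    # Hypothetical shape of one source entry; the real fields are unknown to me.
    url: str
    text: str


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies match."""
    return " ".join(text.lower().split())


def dedupe_and_backfill(ranked: list[Snippet], limit: int = 30) -> list[Snippet]:
    """Walk the merged, ranked results and keep the first copy of each snippet.

    Because duplicates are skipped instead of counted, lower-ranked unique
    snippets move up until `limit` entries have been collected.
    """
    seen: set[tuple[str, str]] = set()
    kept: list[Snippet] = []
    for snippet in ranked:
        key = (snippet.url, normalize(snippet.text))
        if key in seen:
            continue  # exact duplicate of something already kept
        seen.add(key)
        kept.append(snippet)
        if len(kept) == limit:
            break
    return kept
```

With three searches returning the same page three times, only the first copy would be kept and two further unique snippets would take the freed slots.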
In addition, it might be a good idea to look not only for exact duplicates but also for semantic duplicates, which have different wording but carry the same informational content, using a measure such as cosine similarity if embeddings for the text snippets already exist.
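As a rough sketch of the semantic variant, assuming each snippet already has an embedding vector: a snippet is dropped if its cosine similarity to any previously kept snippet reaches a threshold. The function names and the 0.95 threshold are placeholders I made up for illustration, not anything taken from Kagi, and a real threshold would need tuning.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def drop_semantic_duplicates(
    embeddings: Sequence[Sequence[float]], threshold: float = 0.95
) -> list[int]:
    """Return the indices of snippets to keep, in their original rank order.

    A snippet is kept only if it is less similar than `threshold` to every
    snippet kept before it. This is a simple greedy pass, quadratic in the
    number of snippets, which should be fine for a few dozen results.
    """
    kept: list[int] = []
    for i, vec in enumerate(embeddings):
        if all(cosine_similarity(vec, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return kept
```

Since the higher-ranked copy is always the one that survives, this pass could run after the exact-match pass without changing the ranking of the remaining snippets.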