I would love to see additional context from the assistant when it summarizes and discusses URLs.
Currently, it is difficult to know whether the assistant has read the entire content of the URL I've provided. Because many websites use tricks to deliver different content to different visitors, or simply because they have bot-hostile designs, I often find a disconnect between what the assistant has access to and what I see when I load the page myself. Because of this, it is difficult to trust that the assistant's summaries are valid. This is especially true with long documents, where it is possible to produce a convincing-sounding summary from only a fraction of the actual content.
Here are a few ideas for additional information that could be displayed to help me understand what the assistant has ingested and allow me to react accordingly (a rough sketch of what this could look like follows the list):
Content size. This could be in tokens, words, bytes, or any convenient measurement. If I load a large page and see a small size, I will know something is amiss.
Source metadata. I don't know enough about the inner workings of the assistant to say how this would ideally work, but perhaps it could include information on which file types (e.g. video, audio, HTML, plain text) were imported, and how they were converted to text for the LLM (e.g. OCR, embedded subtitle track, audio speech-to-text, or whatever other tricks Kagi uses).
Any additional sources collected automatically by the assistant, like other URLs, if applicable.
Ideally, the entire text that's been fed into the LLM. I can prod it to tell me a little bit about what it's read, but I never know if it's giving me real information or just making stuff up.
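To make this concrete, here is a rough sketch, in TypeScript purely as an illustration, of the kind of "ingestion report" I have in mind. Every name, field, and value below is my own invention for the sake of the example; none of it reflects how Kagi actually works internally.

```ts
// Hypothetical per-URL ingestion report the assistant could expose.
// All names and values are invented placeholders, not anything Kagi provides.
interface IngestionReport {
  url: string;                  // the URL that was fetched
  fetchedAt: string;            // ISO timestamp of the fetch
  contentSize: {                // idea 1: content size
    tokens: number;
    words: number;
    bytes: number;
  };
  sourceMetadata: {             // idea 2: source metadata
    mimeType: string;           // e.g. "text/html", "application/pdf", "video/mp4"
    conversion: "none" | "html-to-text" | "ocr" | "subtitle-track" | "speech-to-text";
  };
  additionalSources: string[];  // idea 3: other URLs the assistant pulled in on its own
  extractedText?: string;       // idea 4: the full text actually fed to the LLM
}

// What such a report might look like for the Nature URL discussed below
// (all numbers are made-up placeholders).
const example: IngestionReport = {
  url: "https://www.nature.com/articles/s44220-023-00188-9",
  fetchedAt: "2024-01-01T00:00:00Z",
  contentSize: { tokens: 850, words: 640, bytes: 4200 },
  sourceMetadata: { mimeType: "text/html", conversion: "html-to-text" },
  additionalSources: [],
  extractedText: "Abstract: ...",
};
```

Even just the contentSize and sourceMetadata fields would answer most of my questions in the Nature example below; the optional extractedText field is the "ideally" part.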
As an example, let's enter https://www.nature.com/articles/s44220-023-00188-9 into the assistant.
When I load that page in my browser, I only see the Abstract and references, not the full text, unless I log in. The assistant returns 10 bullet points that seem to include information I do not see myself. Does this mean Kagi can read entire Nature articles? For example, one bullet point says: "Feedback learning and fictive error signals during decision-making tasks engage the striatum and involve nicotine-related modulation in smokers." The only mentions of "fictive error" are in the references, but they do not support that statement. Where did this come from? Did it read the full paper? Did it load the referenced paper? Did it hallucinate? Currently, the assistant gives me no effective way to answer those questions. In this case, it did not provide any citations, either.
On the flip side, sometimes I see more than the assistant does. See my previous bug report about sciencedirect.com (https://kagifeedback.org/d/2649-summarizer-only-reads-abstract-instead-of-full-text-on-sciencedirectcom), where I see the full paper but the assistant only has access to the abstract.
Sometimes I use the summarizer to read Twitter threads, and god only knows what dark magicks are required to turn Twitter into something usable. I highly doubt that the bot sees the same thing I see.