The gathering-key-details step is a tool ("the librarian") that's called by the top-level model. It uses a different (read: cheaper and faster) model with a long context window (Gemini 2.5 Flash Lite at the moment).
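Conceptually the delegation looks roughly like the sketch below -- a minimal illustration using the google-genai Python SDK, where the function name, prompt, and config values are hypothetical stand-ins, not the project's actual code:

```python
# Sketch only: the librarian is just a tool function the top-level model can call.
# It always delegates to the same pinned sub-model with the same pinned config.
from google import genai
from google.genai import types

client = genai.Client()  # picks up GEMINI_API_KEY from the environment

LIBRARIAN_MODEL = "gemini-2.5-flash-lite"          # cheap, fast, long context
LIBRARIAN_CONFIG = types.GenerateContentConfig(    # pinned sampling config
    temperature=0.0,
    top_p=1.0,
)

def librarian_gather_key_details(pdf_bytes: bytes, question: str) -> str:
    """Read the whole PDF with the cheap long-context model and return key details."""
    response = client.models.generate_content(
        model=LIBRARIAN_MODEL,
        contents=[
            types.Part.from_bytes(data=pdf_bytes, mime_type="application/pdf"),
            f"Gather the key details relevant to: {question}",
        ],
        config=LIBRARIAN_CONFIG,
    )
    return response.text
```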
The reason you see the same output is that the same PDF is being processed by the same model (Gemini 2.5 Flash Lite) with the same config (temperature, top_p, etc.), so the output is the same.
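In other words, running the step twice on the same file hits the same pinned model and config, so you get (effectively) the same result back -- a hypothetical repro continuing the sketch above:

```python
# Hypothetical repro: same bytes in, same pinned model and config,
# so the two results come back (effectively) identical -- no caching involved.
pdf_bytes = open("report.pdf", "rb").read()

first = librarian_gather_key_details(pdf_bytes, "What are the key findings?")
second = librarian_gather_key_details(pdf_bytes, "What are the key findings?")

print(first == second)  # the inputs and config simply never change between runs
```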
You wouldn't want the top-level model doing the librarian's work in the gathering-key-details step -- it chews through a LOT of tokens, and letting arbitrary models handle that would expose users to unpredictable cost explosions or latency spikes given the mass of tokens in a 50+ page PDF.
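To put rough numbers on it (the tokens-per-page figure and per-token prices below are placeholder assumptions for illustration, not real pricing):

```python
# Back-of-envelope cost comparison; page-to-token ratio and prices are
# placeholder assumptions, not actual rates.
PAGES = 50
TOKENS_PER_PAGE = 800                   # assumed average for a dense PDF page
input_tokens = PAGES * TOKENS_PER_PAGE  # ~40k input tokens per librarian call

CHEAP_PRICE_PER_MTOK = 0.10     # hypothetical $/1M input tokens for the sub-model
FRONTIER_PRICE_PER_MTOK = 5.00  # hypothetical $/1M input tokens for a top-level model

print(f"librarian sub-model: ${input_tokens * CHEAP_PRICE_PER_MTOK / 1e6:.4f} per call")
print(f"top-level model:     ${input_tokens * FRONTIER_PRICE_PER_MTOK / 1e6:.4f} per call")
# The gap compounds quickly if the PDF gets re-read on every tool call.
```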
TL;DR -- the PDF output is not cached; it's just the same sub-model with the same config powering the librarian, so the same input produces the same output.