Re: Kagi Assistant's handling of PDF inputs:
Using Gemini's document processing instead of sending the PDF's raw extracted text results in superior document analysis. This includes more intelligent interpretation of the text and native understanding of visual elements like graphs and charts.
See here: https://ai.google.dev/gemini-api/docs/document-processing
I suggest changing how PDFs are handled with Gemini models: send the document as base64-encoded inline data rather than as extracted text. Gemini processes PDFs natively at a fixed cost of 258 tokens per page.
With this approach no OCR is needed, since Gemini interprets each page natively as an image.
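To illustrate, here is a minimal sketch of the inline-data approach using the google-genai Python SDK, following the linked documentation. The model name, file path, and prompt are placeholders; the SDK handles the base64 encoding of the inline data itself.

```python
from google import genai
from google.genai import types

# Reads the API key from the environment (GEMINI_API_KEY / GOOGLE_API_KEY).
client = genai.Client()

# Read the PDF as raw bytes; the SDK encodes it as inline data for the request.
with open("who-statistics-page14.pdf", "rb") as f:  # placeholder file name
    pdf_bytes = f.read()

response = client.models.generate_content(
    model="gemini-2.5-pro",  # placeholder model name
    contents=[
        # Send the PDF natively instead of pre-extracted text.
        types.Part.from_bytes(data=pdf_bytes, mime_type="application/pdf"),
        "Describe the figure on this page, including chart type and colours.",
    ],
)
print(response.text)
```

With this request shape, Gemini receives the rendered pages (text and visuals together) rather than a lossy text dump, which is what enables the figure interpretation shown below.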
Here is Gemini 2.5 Pro via Kagi interpreting a PDF:
https://kagi.com/assistant/37beff5a-400e-4b33-b55a-56cb6a653495
It is unable to correctly identify the type of figure or the colours used, and it hallucinates details that are not in the document.
Via AI Studio, Gemini 2.5 Pro produces the following response:
"Bars are grouped by year. Within each year, there are two bars: Male (left) and Female (right).
Each bar is stacked:
The darker portion (green for males, orange for females) represents Healthy life expectancy at birth (HALE).
The total height of the bar (including the lighter portion on top) represents the total Life expectancy at birth."
For reference, the PDF used is page 14, cleanly split from this 96-page document: https://iris.who.int/bitstream/handle/10665/376869/9789240094703-eng.pdf#page=14
(For further comparison, taking a screenshot of that page and 'printing as PDF', which leaves no extractable text layer, also demonstrates Gemini's superior native handling.)