Kagi Assistant fails to parse PDFs that render in viewers and that other libraries support

tboby

When trying to parse a PDF containing tabular data, Kagi Assistant replies with "We are sorry, we are not able to extract the source". This occurs with different AI backends selected, so I suspect the issue lies in the PDF pre-processing done by Kagi.

While I can't provide the original document (personal finances!), I've managed to build a test PDF that fails in the same way. Note that it isn't one specific PDF file that fails; it's every PDF file generated by a specific source.

The prompt "What does this pdf say?" fails with this file:

test.pdf (948B)

The following libraries successfully parse text out of my original file as well as my test file:

pdfminer.six (Python)
PyMuPDF (Python)
PdfPig (C#)

pypdf fails on both files. The error it produces for my test file is similar (but not identical) to the one for my original file, and both relate to "/Subtype".
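For anyone who wants to reproduce the comparison across the Python libraries, here is a minimal sketch. It is not the exact script used for this report. The helper names (extract_with_pdfminer, extract_with_pymupdf, extract_with_pypdf, compare_extractors) are made up for illustration, and it assumes pdfminer.six, PyMuPDF and pypdf are installed. Each library is imported lazily, so a missing package shows up as a failure for that library only.

```python
import sys
from typing import Callable, Dict, Optional, Tuple


def extract_with_pdfminer(path: str) -> str:
    # pdfminer.six: high-level helper that returns all text in the file.
    from pdfminer.high_level import extract_text
    return extract_text(path)


def extract_with_pymupdf(path: str) -> str:
    # PyMuPDF is imported under the name "fitz".
    import fitz
    with fitz.open(path) as doc:
        return "".join(page.get_text() for page in doc)


def extract_with_pypdf(path: str) -> str:
    from pypdf import PdfReader
    reader = PdfReader(path)
    # extract_text() may return an empty value for pages without text.
    return "".join(page.extract_text() or "" for page in reader.pages)


# Hypothetical registry: library name -> extraction function.
DEFAULT_EXTRACTORS: Dict[str, Callable[[str], str]] = {
    "pdfminer.six": extract_with_pdfminer,
    "PyMuPDF": extract_with_pymupdf,
    "pypdf": extract_with_pypdf,
}


def compare_extractors(
    path: str,
    extractors: Optional[Dict[str, Callable[[str], str]]] = None,
) -> Dict[str, Tuple[bool, str]]:
    """Run each extractor on path and record (succeeded, text or error)."""
    results: Dict[str, Tuple[bool, str]] = {}
    for name, extract in (extractors or DEFAULT_EXTRACTORS).items():
        try:
            results[name] = (True, extract(path))
        except Exception as exc:  # keep going so every library gets a turn
            results[name] = (False, f"{type(exc).__name__}: {exc}")
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python compare_pdf.py test.pdf
    for name, (ok, detail) in compare_extractors(sys.argv[1]).items():
        status = "OK" if ok else "FAILED"
        print(f"{name}: {status}: {detail[:200]!r}")
```

Running it against test.pdf prints one line per library, which makes it easy to see which ones extract text and which raise an error.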

Kagi Assistant should support parsing of PDFs that are supported by popular PDF parsing libraries and viewers.

silvenga

What's interesting is that my reader can't find any text (using ABBYY).