I really appreciate the new, clear documentation on which LLM providers are used and their privacy policies:
https://help.kagi.com/kagi/ai/llms-privacy.html
Something similar that documents LLM capabilities (i.e. which parts of a task are handled by the LLM itself versus something else) would be nice. Specifically, another docs page covering the LLMs in the Assistant along these lines:
- Native multi-modal capabilities: does the model itself process uploaded images/audio, or does Kagi run OCR/speech-to-text on them before passing the result along?
- Which models are used for OCR/speech-to-text, and who provides them? e.g. OpenAI Whisper?
Functionality to use these tools on their own (specifically, ripping high-quality transcripts from an audio file or YouTube video) would be super cool! That could be its own feature request.
More clarity on what happens in the background would better inform decisions about which models to use for different tasks.