Kagi does its own benchmarking for LLMs. There are many LLMs at this point, with new ones being released every week or so.
https://help.kagi.com/kagi/ai/llm-benchmark.html
Note: models that have already been tested are no longer listed in this table.
| Model | Reason for adding | Status | Notes |
| --- | --- | --- | --- |
| Mistral: Codestral 25.01 | Announcement - "Only EU-based LLM designed for coding." (nichu42) - "There is an open source version of Codestral but it's less good I assume" (Thibaultmol) | | |
| Mistral: Ministral 8B | Announcement - "Efficient intermediaries for function-calling in multi-step agentic; potentially interesting for Ki." (nichu42) | | |
| bitnet_b1_58-3B | Announcement - "Potentially low cost coding 'lite' model?" - Thibaultmol | | |
| bitnet_b1_58-large | Announcement - "Potentially low cost coding 'lite' model?" - Thibaultmol | | |
| MiniMax‑VL‑01 | GitHub | | |
| Phi‑4 Reasoning | Hugging Face | | Reasoning Plus was tested |
| Doubao Seed 1.6 | Announcement - All-in-one comprehensive model, first domestic model supporting 256K context with thinking capabilities. Supports deep thinking, multimodal understanding, and GUI operations with adaptive thinking modes. | | |
| Doubao Seed 1.6 Thinking | Announcement - Enhanced version focused on deep thinking capabilities with improvements in code, mathematics, and logical reasoning. Supports 256K context. | | |
| Doubao Seed 1.6 Flash | Announcement - Ultra-fast version with extremely low latency (TPOT of only 10 ms). Supports deep thinking, multimodal understanding, and 256K context, with vision capabilities matching flagship models. | | |
| Hunyuan A13B | GitHub Release - Mixture of Experts (MoE) model with 80B total parameters and 13B active parameters, delivering high performance with efficient resource usage. Released June 27, 2025. | | |
| Qwen3‑8B | Hugging Face – “Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture‑of‑experts (MoE) models.” (Xytronix) | | |
| Apertus | Developed by Swiss research institutes, including EPFL, ETH Zurich, and the Swiss National Supercomputing Centre (CSCS). Designed with a strong emphasis on transparency and privacy, making all development artifacts and data processes available for independent review. | | |
What are some models you think Kagi should benchmark? For each suggestion, explain why the model would make sense for Kagi to add to The Assistant (a model might be very fast and very cheap but not score well; it could still be a valid addition). Please include:
- Exact model name(s)
- Relevant links (not API listings; just the model's announcement or general product page)
- Reason why you think it should be benchmarked or considered for Kagi Assistant (can just be "curious how it performs", but might also be "it's cheap and fast")