I don't know which model to pick
To preface, I think it's amazing that the Assistant includes access to so many language models, and keeps being updated with new ones. This is really valuable to people who want to control their experience, or people who find that certain models are better at certain kinds of tasks.
However, I'm totally new to all this, so I really don't know which model to pick. I don't have the time to personally analyse the performance, hallucination rate, and writing quality of each model across a variety of tasks.
So, whenever I use the Assistant, I'm always wondering whether I could be getting better results if I'd been able to pick a better model for the task at hand!
Yes, Kagi has the Benchmark, but to me it looks like a bunch of numbers that don't help me know which model to pick. The benchmark also includes many models that can't be chosen in the Assistant. In general, I would prefer to read opinionated advice based on people's experiences with different models, rather than a table of numbers.
Current interface

Overview of problems I've noticed
- Models are labelled with marketing names that don't convey what they actually do
- e.g. I can choose from GPT 4.1 mini and GPT 4.1 nano
- presumably nano is "smaller"? how does that affect me? is it better for a model to be small or big? how should I choose between them?
- e.g. I can choose from Mistral Small or Magistral Small
- all I can guess from the name is that magistral has more "magic"? magic sounds good, should I pick that one?
- e.g. I can choose from Llama 4 Maverick or Llama 4 Scout
- huh??
- If a model hasn't been in the news, I've no clue what it can do. Mistral is a mystery to me. I've never heard of Qwen outside of this menu.
- Unclear whether models can reason, search the web, or accept file uploads
- Some models, like Qwen, are directly labelled as reasoning models, but for others, like R1 and o3, you'd only know they reason if you'd paid attention to the news over the last few months.
- I asked Llama 4 Maverick what the difference was between Llama 4 Maverick and Llama 4 Scout, but it didn't search the web for info and told me it didn't know the answer.
- Unclear what different models specialise in
- The main difference between the companies that I'd notice as a user is that each model has a substantially different tone, attitude, and writing style, because each company has different RLHF resources.
- However, I don't know from the menu what the differences in attitude actually are.
- Lots of similar models clutter the menu and make it seem more complicated
- I appreciate that I can pick between different variants from the same company! However, I think the interface could be adjusted to support this better.
Competitors
DuckDuckGo's AI Chat tries to make this easier to understand by showing key differences:

But I think this is still a very limited and difficult-to-understand overview: What does "Beta" really mean? (Nothing.) Does it matter if a model is open source? (No.) Kagi could do a lot better.
Proposal to fix these problems
- Provide an opinionated characterisation of each company's models. This means users don't have to experiment with each company's models to find out what they've been RLHF'd to do.
- Illustrate the main differences between each model to help with decisions. This means users don't have to memorise features of different brand names.
- Prioritise a company's "best" model by moving older or weaker models into a dropdown. This both cleans up the screen and makes it easier to choose a model for the most common use cases.
- Use icons to indicate whether models can web search, reason, or process files.
Sketch of a UI design that meets these goals:

I hope the Kagi team considers adding a feature like this to help me choose a model!
I also hope people reading this will let me know which models you personally prefer for certain kinds of tasks. This would help me use the Assistant better in the meantime.