I don't know which model to pick
To preface, I think it's amazing that the Assistant includes access to so many language models, and keeps being updated with new ones. This is really valuable to people who want to control their experience, or people who find that certain models are better at certain kinds of tasks.
However, I'm totally new to all this, so I really don't know which model to pick. I don't have the time to personally analyse the performance, hallucination rate, and writing quality of each model across a variety of tasks.
So, whenever I use the Assistant, I'm always wondering whether I could be getting better results if I'd been able to pick a better model for the task at hand!
Yes, Kagi has the Benchmark, but to me it looks like a bunch of numbers that don't help me know which model to pick. The benchmark also includes many models that can't be chosen in the Assistant. In general, I would prefer to read opinionated advice based on people's experiences with different models, rather than a table of numbers.
Current interface

Overview of problems I've noticed
- Models are labelled with marketing names that don't convey what they actually do
- e.g. I can choose from GPT 4.1 mini and GPT 4.1 nano
- presumably nano is "smaller"? how does that affect me? is it better for a model to be small or big? how should I choose between them?
- e.g. I can choose from Mistral Small or Magistral Small
- all I can guess from the name is that magistral has more "magic"? magic sounds good, should I pick that one?
- e.g. I can choose from Llama 4 Maverick or Llama 4 Scout
- huh??
- If a model hasn't been in the news, I've no clue what it can do. Mistral is a mystery to me. I've never heard of Qwen outside of this menu.
- Unclear whether models can reason, search the web, or accept file uploads
- Some models, like Qwen, are directly labelled as reasoning models, but for others, like R1 and o3, you'd only know they reason if you'd paid attention to the news over the last few months.
- I asked Llama 4 Maverick what the difference was between Llama 4 Maverick and Llama 4 Scout, but it didn't search the web for info and told me it didn't know the answer.
- Unclear what different models specialise in
- The main difference between the companies that I'd notice as a user is that each model has a substantially different tone, attitude, and writing style, because each company has different RLHF resources.
- However, I don't know from the menu what the differences in attitude actually are.
- Lots of similar models clutter the menu and make it seem more complicated
- I appreciate that I can pick between different variants from the same company! However, I think the interface could be adjusted to support this better.
Competitors
DuckDuckGo's AI Chat tries to make this easier to understand by showing key differences:

But I think this is still a very limited and difficult-to-understand overview: What does "Beta" really mean? (Nothing.) Does it matter if a model is open source? (No.) Kagi could do a lot better.
Proposal to fix these problems
- Provide an opinionated characterisation of each company's models. This means users don't have to experiment with each company's models to find out what they've been RLHF'd to do.
- Illustrate the main differences between each model to help with decisions. This means users don't have to memorise features of different brand names.
- Prioritise a company's "best" model by moving older or weaker models into a dropdown. This both cleans up the screen and makes it easier to choose a model for the most common use cases.
- Use icons to indicate whether models can web search, reason, or process files.
Sketch of a UI design that meets these goals:

I hope the Kagi team considers adding a feature like this to help me choose a model!
I also hope people reading this will let me know which models you personally prefer for certain kinds of tasks. This would help me use the Assistant better in the meantime.