I recently spoke with the creator of https://airouter.io, and I believe it offers a very neat approach to using LLMs.
The service automatically routes the user's prompt to the LLM that appears best suited to answering the question or completing the task. Airouter uses an additional LLM to classify which model to use, based on examples and a scoring system. With this method, they have shown that overall API costs can be reduced by 50-80%. The savings matter because most users won't switch to a smaller model when asking very simple questions; for convenience, they stick with one preselected model for every type of query. As we know, sending everything to GPT-4o or Sonnet-3.5 doesn't make sense, but hardly anyone switches to 4o-mini or Haiku by hand for the easy stuff.
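To make the idea concrete, here is a minimal sketch of what such routing could look like. This is not airouter.io's actual implementation: the task categories, model names, and the keyword heuristic standing in for the classifier LLM are all illustrative assumptions.

```python
# Hypothetical sketch of classifier-based model routing.
# The category->model mapping and the heuristic classifier are assumptions,
# not airouter.io's real logic (which uses an LLM classifier with scoring).

MODEL_BY_TASK = {
    "coding": "claude-3-5-sonnet",   # community reports a slight edge in coding
    "writing": "gpt-4o",             # assumed stronger at writing tasks
    "simple": "gpt-4o-mini",         # cheap model for trivial questions
}

def classify(prompt: str) -> str:
    """Stand-in for the classifier LLM: a trivial keyword/length heuristic."""
    text = prompt.lower()
    if any(kw in text for kw in ("code", "function", "bug", "python")):
        return "coding"
    if len(prompt.split()) < 8:      # very short prompts -> cheap model
        return "simple"
    return "writing"

def route(prompt: str) -> str:
    """Return the model name the prompt should be forwarded to."""
    return MODEL_BY_TASK[classify(prompt)]

print(route("What is 2+2?"))                     # -> gpt-4o-mini
print(route("Fix this Python function for me"))  # -> claude-3-5-sonnet
```

In a real deployment the `classify` step would itself be an LLM call scored against labeled examples, and the table would be updated as new models are released.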
In the current model selection process, you could simply add an option like "Kagi Routing" while keeping the other options unchanged.
In addition to reducing costs for Kagi, this approach could improve the user experience by selecting the most suitable model for each input. For example, while Sonnet-3.5 performs slightly better on coding tasks (according to the community), GPT-4o appears to be better at writing. Since this is subject to change with each new release, the routing would need to be updated accordingly, but that could be folded into the existing process of integrating new models.
I currently don't know how to implement this in a privacy-preserving manner since it requires collecting samples to evaluate and develop the classification scheme. Nevertheless, I think this could be a great feature to explore or possibly a project for an internship or something similar.