This is a spin-off of this post. The idea is that when generating responses, Kagi Assistant could take advantage of all of its LLM offerings by generating two responses instead of one. There are a number of ways this could work:
- Two models could each generate a response and the 'better' one could be shown to the user.
- The models could work together for a while, passing ideas back and forth and eventually converging on a response that is shown to the user.
- You could use a generator/adversarial setup, where one model generates a response and another critiques it. The first model then generates a better response incorporating the critic's feedback. They could go back and forth a few times before settling on an answer.
- Etc...
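The generator/critic idea above can be sketched with a small, model-agnostic loop. This is a minimal illustration, not Kagi's actual implementation: `generate` and `critique` are hypothetical placeholders for whatever functions wrap the two underlying models.

```python
def refine(generate, critique, prompt, max_rounds=3, accepted=lambda fb: "OK" in fb):
    """Generator/critic loop: one model drafts a response, another
    critiques it, and the draft is revised until the critic is
    satisfied or the round limit is reached."""
    draft = generate(prompt)
    for _ in range(max_rounds):
        feedback = critique(prompt, draft)
        if accepted(feedback):  # critic has no further objections
            break
        # Ask the generator to revise, feeding the critique back in.
        draft = generate(
            f"{prompt}\n\nPrevious draft:\n{draft}\n\n"
            f"Critique:\n{feedback}\n\nRevise the draft to address the critique."
        )
    return draft
```

The round limit keeps cost bounded; in practice you would also want a token budget, since each round is two extra model calls.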
A feature like this could improve Kagi Assistant's responses and would be a differentiator that isn't available in other LLM app offerings.
The user would interact with the assistant as normal, but the responses would potentially be better. Alternatively, you could display the back-and-forth to the user in some way, perhaps making their assistant chats look like group chats.