Well, GPT-3.5 and GPT-4 have context windows of 4k and 8k tokens depending on the version, and 4k tokens isn't terribly hard to exceed. For example, pasting in an article, summarizing it, and then having a back-and-forth discussion about it can easily blow past that limit.
My problem is that there's no obvious signal when you've exceeded the context window. In the use case above, where you've pasted in a source text you want to discuss, the original text would silently get truncated out of the conversation.
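A rough workaround, if you're willing to count tokens yourself: OpenAI's tiktoken library can estimate how much of a model's window a conversation consumes. A minimal sketch follows; the 4,096-token budget is an assumption for illustration, and other providers tokenize differently, so treat this as an approximation rather than an exact check.

```python
# Sketch: estimate whether a conversation still fits in a model's
# context window, using OpenAI's tiktoken tokenizer.
import tiktoken

CONTEXT_WINDOW = 4096  # assumed budget for a 4k GPT-3.5 variant

def tokens_used(messages: list[str], model: str = "gpt-3.5-turbo") -> int:
    enc = tiktoken.encoding_for_model(model)
    # Ignores the few tokens of per-message chat formatting overhead,
    # so this slightly undercounts.
    return sum(len(enc.encode(m)) for m in messages)

conversation = ["<pasted article text>", "Summarize this.", "<model summary>"]
used = tokens_used(conversation)
print(f"{used}/{CONTEXT_WINDOW} tokens used")
if used > CONTEXT_WINDOW:
    print("Warning: earlier messages will be truncated.")
```

Of course, this only works if you know which tokenizer and window size your model uses, which is exactly the information the UI doesn't surface.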
Also, since I don't follow the details of every LLM, I don't know offhand how long the context window is for, say, a Claude, Mistral, or PaLM model. So I just default to GPT-4 Turbo every time: it's likely the most expensive option, but it's also the one model whose context window I know for sure I won't exceed if the dialogue runs long.
FWIW, this idea came up because I only recently realized the Claude models have 100k-token context windows. For some reason I just assumed that OpenAI's competitors had models that were much more hamstrung (e.g., closer to the 2k context window of the original ChatGPT). I probably would have used Claude more had I known. Likewise for cheaper open-source models, should Kagi add them to the menu of options in the future. So I'm totally open to admitting that maybe this just speaks to my lack of updated knowledge in the space: in my head, context windows are something I need to be cognizant of, but maybe that's an obsolete concern that won't matter to others.