Well, GPT-3.5 and GPT-4 have context windows of 4k and 8k tokens depending on the version, and 4k tokens isn't terribly hard to exceed. For example, pasting in an article, summarizing it, and then having a back-and-forth discussion about it can easily blow past that limit.
My problem is that there's no obvious signal when you've exceeded the context window. In the use case above, where you've pasted in a source text you want to discuss, the original text would silently get truncated out of the conversation.
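A rough workaround, if you're willing to count tokens yourself: OpenAI's tiktoken library can estimate how much of a model's window a conversation consumes. A minimal sketch follows; the 4,096-token budget is an assumption for illustration, and other providers tokenize differently, so treat this as an approximation rather than an exact check.

```python
# Sketch: estimate whether a conversation still fits in a model's
# context window, using OpenAI's tiktoken tokenizer.
import tiktoken

CONTEXT_WINDOW = 4096  # assumed budget for a 4k GPT-3.5 variant

def tokens_used(messages: list[str], model: str = "gpt-3.5-turbo") -> int:
    enc = tiktoken.encoding_for_model(model)
    # Ignores the few tokens of per-message chat formatting overhead,
    # so this slightly undercounts.
    return sum(len(enc.encode(m)) for m in messages)

conversation = ["<pasted article text>", "Summarize this.", "<model summary>"]
used = tokens_used(conversation)
print(f"{used}/{CONTEXT_WINDOW} tokens used")
if used > CONTEXT_WINDOW:
    print("Warning: earlier messages will be truncated.")
```

Of course, this only works if you know which tokenizer and window size your model uses, which is exactly the information the UI doesn't surface.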
Also, since I don't follow the details of every LLM, I don't know offhand how long the context window is for, say, a Claude, Mistral, or PaLM model. So I just default to GPT-4 Turbo every time: it's likely the most expensive option, but it's also the one model whose context window I know for sure I won't exceed if the dialogue runs long.
FWIW, this idea came up because I only recently realized the Claude models have 100k-token context windows. For some reason I just assumed that OpenAI's competitors had models that were much more hamstrung (e.g., closer to the 2k context window of the original ChatGPT). I probably would have used Claude more had I known. Likewise for cheaper open-source models, should Kagi add them to the menu of options in the future. So I'm totally open to admitting that maybe this just speaks to my lack of updated knowledge in the space: in my head, context windows are something I need to be cognizant of, but maybe that's an obsolete concern that won't matter to others.