Assistant is slow (intermittent)

coreyward

Kagi Assistant is frequently very slow and/or times out. The issue isn't the initial page load but the time it takes to generate or stream a response.

At times it takes an order of magnitude longer to respond than Quick Answer in search does: closer to 30s for Assistant (3.5 Sonnet) vs. ~3s for Quick Answer.
Responses are often streamed much more slowly than the underlying model is capable of (e.g., Claude 3.5 Sonnet responses streaming in at 10–15 tokens per second vs. the 40–50 I see when using Claude.ai or calling the Anthropic API directly).
Intermittent errors further erode the experience (e.g., see the attached image).
Assistant often replies with something along the lines of "I'll need to research this some more" instead of doing the research and returning an answer. Following up with "continue" is usually enough, but it ends up taking longer.

In addition to reducing latency and increasing the streaming speed, I think there are a number of areas that can be improved in the UI to make it feel more responsive:

Better UI feedback, independent of the model output, showing the current status (e.g., "evaluating", "searching", "reviewing search results") before output starts streaming to the client
Cancel and retry buttons if output isn't streaming back to the client within a reasonable time (e.g., if any pre-output step takes longer than about 5s to complete)
A "Continue" button (or automatic continuation) when the output says additional research is needed and doesn't actually answer the question

zut

I've noticed this as well. The Assistant feels so much slower than using the providers' APIs directly that it's becoming annoying to use.

Luis

@coreyward your post addresses several issues we've been tackling recently. We've implemented a series of improvements over the past few days.

Are you still experiencing problems? If so, could you please let us know the region you are connecting from and how frequently these issues occur?

coreyward

Luis It's been better recently, so your work may be paying off. I've also been using 4o a bit more than 3.5 Sonnet, which could have something to do with it, but I just ran a quick test on 3.5 Sonnet and it was reasonably quick too, so that bodes well. Thank you!

Luis

I'll mark this thread as Done. If you experience any other problems related to this, please feel free to reach out here.

michaluhnak

I wonder if we should reopen this issue. Responses to any of my Assistant prompts take too long to generate, around 20 seconds. I've attached a recording of my experience. I have to use Perplexity or Claude if I want to get any work done efficiently.

EffortsFrom

I can confirm this issue. Using Assistant feels as if it's running in batch mode. Using an app with my own API key is (or at least feels) dramatically faster. This has been an ongoing issue for me for at least the past 3–4 months, and I've started using Assistant less because of it.

voidpointer

Yeah, I'm definitely noticing this too. Is Kagi staff tracking this issue anywhere, if not here?

Luis

Could you provide more details? Is the slow response time related to specific models you are using, or is it an issue with the application itself? If it is related to certain models, which ones are affected?
Please note that the performance of Google’s preview models can be unreliable until they are officially released.

BramR

I must admit I tested a ChatGPT Pro subscription for 2 months and have now used Kagi Search with the AI add-on for 6 weeks. Kagi's Assistant is not only much, much slower (bordering on unworkable), it's also less accurate. If this problem isn't solved, I'll need to go back to an AI subscription, and I don't know if I want to pay for a separate AI subscription AND Kagi Search.

Luis

BramR Could you provide a few more details?

slowness: are you using OpenAI models, and if so, are they slower compared to using them on ChatGPT?
less accurate: could you share a couple of examples? We're running extensive benchmarks, and in most of our tests, our accuracy and relevance are consistently superior to ChatGPT's (although testing is focused primarily on KI).