Kagi Assistant is frequently very slow and/or times out. The issue isn't the initial page load but the time to generate and stream a response.
- At times it takes an order of magnitude longer to respond than Quick Answer in search does—closer to 30s for Assistant (3.5 Sonnet) vs. 3s for Quick Answer.
- Responses often stream much slower than the underlying model is capable of (e.g., Claude 3.5 Sonnet responses streaming in at 10–15 tokens per second vs. the 40–50 I see when using Claude.ai or hitting the Anthropic API directly).
- Intermittent errors further erode the experience (see the attached image).
- The Assistant often replies with something along the lines of "I'll need to research this some more" instead of doing the research and returning an answer. Following up with "continue" usually works, but adds even more time.
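For reference, the tokens-per-second figures above are rough client-side estimates. A minimal sketch of how I approximated them, assuming ~4 characters per token (a common heuristic, not the model's real tokenizer) and measuring from the first chunk's arrival so pre-stream latency is excluded:

```python
def approx_tokens_per_second(events):
    """Estimate streaming throughput from (timestamp, text_chunk) pairs.

    events: list of (seconds, str) recorded as chunks arrive, e.g. while
    iterating an SSE/streaming response. Uses ~4 chars/token as a rough
    heuristic (an assumption; real tokenizers vary by model and content).
    """
    if len(events) < 2:
        return 0.0
    elapsed = events[-1][0] - events[0][0]  # time from first to last chunk
    if elapsed <= 0:
        return 0.0
    chars = sum(len(text) for _, text in events)
    return (chars / 4) / elapsed

# Simulated stream: 40 chunks of 10 chars, one every 0.1s
# → 400 chars ≈ 100 tokens over ~3.9s ≈ 25.6 tokens/s
events = [(i * 0.1, "x" * 10) for i in range(40)]
print(round(approx_tokens_per_second(events), 1))
```

In practice I recorded `(time.monotonic(), chunk)` pairs while iterating the streamed response; the same helper works on any chunked stream, whether from the Assistant UI's network tab or the Anthropic SDK's text stream.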
In addition to reducing latency and increasing streaming speed, I think there are a number of UI improvements that would make the Assistant feel more responsive:
- Better UI feedback, independent of the model output, showing status (e.g., "evaluating", "searching", "reviewing search results") before output starts streaming to the client
- Add cancel and retry buttons if output isn't streaming back to the client in a reasonable time (e.g., if any pre-output step takes longer than about 5s to complete)
- When the output suggests that additional research is needed and doesn't really answer the question, show a "Continue" button or automatically continue