Kagi Assistant is frequently very slow and/or times out. The issue isn't the initial page load but the time to generate and stream a response.
- At times it takes an order of magnitude longer to respond than Quick Answer in search does—closer to 30s for Assistant (3.5 Sonnet) vs. 3s for Quick Answer.
- Responses often stream much slower than the underlying model is capable of (e.g., Claude 3.5 Sonnet responses streaming in at 10–15 tokens per second vs. the 40–50 I see when using Claude.ai or hitting the Anthropic API directly).
- Intermittent errors further erode the experience (see the attached image).
- The Assistant often replies with something along the lines of "I'll need to research this some more" instead of doing the research and returning an answer. Following up with "continue" usually works, but adds even more time.
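For reference, the tokens-per-second figures above are rough client-side estimates. A minimal sketch of how I approximated them, assuming ~4 characters per token (a common heuristic, not the model's real tokenizer) and measuring from the first chunk's arrival so pre-stream latency is excluded:

```python
def approx_tokens_per_second(events):
    """Estimate streaming throughput from (timestamp, text_chunk) pairs.

    events: list of (seconds, str) recorded as chunks arrive, e.g. while
    iterating an SSE/streaming response. Uses ~4 chars/token as a rough
    heuristic (an assumption; real tokenizers vary by model and content).
    """
    if len(events) < 2:
        return 0.0
    elapsed = events[-1][0] - events[0][0]  # time from first to last chunk
    if elapsed <= 0:
        return 0.0
    chars = sum(len(text) for _, text in events)
    return (chars / 4) / elapsed

# Simulated stream: 40 chunks of 10 chars, one every 0.1s
# → 400 chars ≈ 100 tokens over ~3.9s ≈ 25.6 tokens/s
events = [(i * 0.1, "x" * 10) for i in range(40)]
print(round(approx_tokens_per_second(events), 1))
```

In practice I recorded `(time.monotonic(), chunk)` pairs while iterating the streamed response; the same helper works on any chunked stream, whether from the Assistant UI's network tab or the Anthropic SDK's text stream.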
In addition to reducing latency and increasing streaming speed, I think there are a number of UI improvements that would make the Assistant feel more responsive:
- Better UI feedback, independent of the model output, showing status (e.g., "evaluating", "searching", "reviewing search results") before output starts streaming to the client
- Add cancel and retry buttons if output isn't streaming back to the client in a reasonable time (e.g., if any pre-output step takes longer than about 5s to complete)
- When the output suggests that additional research is needed and doesn't really answer the question, show a "Continue" button or automatically continue