Anything having to do with side-channel attacks is impossibly fiddly and I don't have the math background to speak authoritatively on any of this, but I've spent a lot of hours thinking about such things in the past, so I guess it falls to me to type up the response:
Yes, wow. That's one of the scarier things I've read lately. Wouldn't have guessed LLM streaming was that vulnerable.
Current state of Kagi Assistant
The only thing we have deployed that would count as a mitigation is that, since sometime in September, we've been debouncing updates from our servers so they're sent at most once per 15 milliseconds. If you're using a model that generates faster than one token per 15 ms (about 67 tok/s), the debounce logic will start batching multiple tokens together into single update messages. That mitigates some of the problem. Unfortunately it doesn't help at all with slower models. We have some models that are way faster than 67 tok/s and others that are slow enough to worry about. (Not going to name names, because these things change quickly and I don't want anyone taking my preliminary investigation as security advice.)
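To make the mechanism concrete, here's a minimal sketch of that kind of debounce-based batching. It's illustrative only, not our actual server code; the 15 ms constant and the function name are placeholders for the example.

```typescript
// Illustrative debounce/batching over a token stream (not production code).
// Tokens arriving within `intervalMs` of the last flush are buffered and
// sent together; slower streams pass through roughly one token per message.
const DEBOUNCE_MS = 15; // assumed value for this example

async function* batchTokens(
  tokens: AsyncIterable<string>,
  intervalMs: number = DEBOUNCE_MS,
): AsyncGenerator<string> {
  let buffer = "";
  let lastFlush = Date.now();

  for await (const token of tokens) {
    buffer += token;
    if (Date.now() - lastFlush >= intervalMs) {
      yield buffer; // one update message, possibly containing several tokens
      buffer = "";
      lastFlush = Date.now();
    }
  }
  if (buffer.length > 0) {
    yield buffer; // flush whatever is left when the stream ends
  }
}
```

With a fast model the buffer accumulates several tokens per window, so an observer sees fewer, larger messages; with a slow model the interval check passes for almost every token and the batching never kicks in, which is exactly the failure mode described below.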
Some of our upstream inference providers debounce more aggressively than we do. At least one model I've tested only sends text to our servers every few hundred milliseconds, so when you're using that model, messages from our servers to you will follow the same cadence of giant batches. Again, that's going to be fairly effective.
The worst-case scenario is when you can see individual tokens arriving on your screen, because that means they're being generated slowly enough to be sent over the network one at a time, with nothing in the pipeline assembling them into batches. We do offer a few models that stream at roughly that pace, and they're going to be the most vulnerable.
Coming soon
We're going to try doubling the debounce interval on our servers to 30 ms. It could effectively double the token batch size in some of the worst cases, but we'll have to see what it does to perceived performance. At most you should see one extra frame of delay on some models. In cases where the upstream debounce is already more aggressive than ours, there should be no change at all.
Random-length padding on streaming updates is a really good idea. I've implemented it and it will roll out over the next few days.
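For anyone curious what that looks like, here's a rough sketch of random-length padding on a stream update. The field names, padding bound, and JSON framing are assumptions for the example, not our actual wire format.

```typescript
// Rough sketch: attach a random-length, ignorable pad to each stream update
// so the on-the-wire size no longer maps cleanly to token lengths.
// Field names and the padding bound are assumptions, not Kagi's wire format.
import { randomInt } from "node:crypto";

const MAX_PAD_BYTES = 64; // assumed upper bound for this example

interface StreamUpdate {
  text: string;
  pad: string; // clients simply discard this field
}

function padUpdate(text: string): StreamUpdate {
  // Cryptographically random length, so the padding itself isn't predictable.
  const padLen = randomInt(0, MAX_PAD_BYTES + 1);
  return { text, pad: "x".repeat(padLen) };
}

// Two updates with different text lengths no longer differ on the wire
// by exactly the difference in their text lengths.
console.log(JSON.stringify(padUpdate("Hello")).length);
console.log(JSON.stringify(padUpdate("Hi")).length);
```

Padding only obscures message sizes, not their timing, so it complements the debounce batching above rather than replacing it.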
I'll update here once the new mitigations are live.