
The main idea is to use openly available blocklists to filter out results, mainly for the security of your users.

Let me explain:
when searching for "Youtube Revanced" I get this result:

Which is NOT the expected result, but some TLD-squatting badware.

The expected first result would have been the actual ReVanced site, revanced.app, just like when searching for just "revanced".

It could be an interesting feature to let users select the badware blocklist used to filter their results. See https://raw.githubusercontent.com/uBlockOrigin/uAssets/master/filters/badware.txt for the list of badware sites uBlock Origin uses for protection.

    this would make you safer than Google, which also returns that fake website when searching for "youtube revanced"

      Great idea. This should use both the uBlock Origin malware/risk lists and probably also Google Safe Browsing. Bing/Google probably already use GSB, but as Kagi develops its own index it should probably be filtering there too.

      Being on those lists should down-weight the result heavily, add a clear indicator to the result, and cause a popup similar to the uBlock page (maybe with a "Don't show this again"/settings option, bonus points if you can "don't show again" per-site as well as turning off the popups account-wide). The result warning and popup should reference the name/link of the list.

      Merged 3 posts from User protection from malicious sites / badware.

        Vlad
        About which blocklist to use: why not just allow users to select their own regex-based blocklists in settings, with compatibility with adblocker blocklist formats?
        And have the browser check regularly for updates and store the lists in local storage, or something like that?

        And for the display, just hide them like a blocked domain; when unhidden, just add a warning sign on the shield icon.
        Simple, non-intrusive, easy.

        Another option is to just display "potentially malicious site" in the "Ads/Trackers" section, from a set of list(s) of your choice,
        and let users decide whether to block it or not from the settings.

        The uBlock Origin lists are heavily used and very regularly updated; I have never seen a false positive for badware.
        But again, you can also let your users choose the lists, just like any adblocker would.
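For what it's worth, the dominant pattern in badware.txt is a simple `||domain^` network rule, so a filter could reduce such a list to a plain domain set. A minimal sketch of that idea; the rule subset handled and all function names here are illustrative, not how Kagi or uBlock Origin actually implement matching:

```javascript
// Sketch: turn uBlock-style network filter lines into a domain set.
// Only handles "||example.com^" rules (the common case in badware.txt);
// comment lines starting with "!" and other rule types are skipped.
function parseBadwareList(text) {
  const domains = new Set();
  for (const line of text.split("\n")) {
    const rule = line.trim();
    if (!rule || rule.startsWith("!")) continue; // "!" starts a comment
    const m = rule.match(/^\|\|([a-z0-9.-]+)\^$/i);
    if (m) domains.add(m[1].toLowerCase());
  }
  return domains;
}

// A hostname matches if it equals a listed domain or is a subdomain of one.
function isBlocked(hostname, domains) {
  const labels = hostname.toLowerCase().split(".");
  for (let i = 0; i < labels.length - 1; i++) {
    if (domains.has(labels.slice(i).join("."))) return true;
  }
  return false;
}
```

Reducing rules to a Set like this makes each lookup O(number of labels) rather than a scan over the whole list.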

        A third option would be to add the result blocking/warning to the extension, and allow the extension to hide entries from the search results.

          Honestly, I'm a bit worried this thread is just a cemetery of security-related requests.
          I'm sorry, but Apr 3, 2022 is way too old.
          I've seen you guys fix things much faster.
          IMHO, security matters should be one of your top priorities.

          If your fear is about fake reporting, allowing for local filtering should be the way to go, and should not be too complicated to implement.

            tired of waiting,

            For whoever is interested, I created a small extension (only tested with Firefox and Chromium):
            https://github.com/kamou/kagi-filter

            to filter results against https://raw.githubusercontent.com/hagezi/dns-blocklists/main/domains/tif.txt

            before:

            after:

            Probably buggy AF, but it's a start.

            Also, may I highlight that there are still LOTS of false negatives compared to the uBlock Origin badware blocklist.

            PRs and Issues are welcome
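For context, the core of that kind of filtering can be quite small, assuming tif.txt stays a plain newline-separated domain list with `#` comment lines. A sketch only; the DOM scraping is omitted, the helper names are illustrative, and this is not the actual extension code:

```javascript
// Parse a plain domain blocklist: one domain per line, "#" starts a comment.
function parseDomainList(text) {
  return new Set(
    text
      .split("\n")
      .map((l) => l.trim().toLowerCase())
      .filter((l) => l && !l.startsWith("#"))
  );
}

// Keep only result URLs whose hostname (or any parent domain) is not listed.
function filterResults(urls, blocked) {
  return urls.filter((url) => {
    const labels = new URL(url).hostname.toLowerCase().split(".");
    for (let i = 0; i < labels.length - 1; i++) {
      if (blocked.has(labels.slice(i).join("."))) return false;
    }
    return true;
  });
}
```

An extension would collect the result links from the page, run them through something like `filterResults`, and hide the ones that were dropped.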

            7 days later

            @ak42

            If your fear is about fake reporting, allowing for local filtering should be the way to go, and should not be too complicated to implement.

            The problem is that, with very few exceptions, any filtering Kagi does happens after we receive the 10-50 results that we aggregate from upstream indexes. We are severely limited in what we can tell those indexes to omit before we make the search on your behalf.

            This means that, if we just let people post massive lists of regex blocklist rules, the majority of the time you would effectively be DDoSing us, checking tons of rules that will likely never hit - and we run on extremely cheap hardware as it is.

            That is why our approach is a limited number of very simple rules that we can somewhat optimize around, instead of completely arbitrary regex. For common things like known malware lists, we can pull those sources in ourselves and implement them in a way where we don't have to worry about this so much. We do allow this for URL rewrites, but again, there is a cap on how many can be used at once, so we don't have to worry about everyone sharing and pasting massive lists into their settings that do basically nothing most of the time.

            To that extent, the client-side filtering plugins that already exist are better off - the advantage of doing it in Kagi is primarily that you don't need any extensions / it's synchronized across devices, and in a few cases it helps us infer how we can make the search in advance. Also, our rule implementations work without JS, since we do everything server-side. But otherwise, it's very similar to doing it on your client.

            Or, if we identify patterns in upstream sources that are consistently giving us these bad results, we can try to work with them, or simply stop using them if it is that rampant.

            So that brings us around to having to manage reporting. Building a workflow for that which doesn't take human resources on our end to verify, or allow for malicious biasing, is not as simple as writing some code.

            It would be a different story if Kagi completely owned its full, general search index stack, but unfortunately we don't.

            • ak42 replied to this.

              z64
              I understand your point, but why not allow client-side filtering (when JS is available)? Just store a list of blocklists in the user settings; the blocklists can then be downloaded and stored in localStorage/IndexedDB by the client,
              and your JS can do the filtering when the results are received.
              Nothing to do server-side.

              Obviously this would only work with JS-enabled clients, but that's still better than nothing. And I personally don't block JS on Kagi, as Kagi is a privacy-respecting company!
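The caching half of that flow could look roughly like this. A sketch only: the storage object and `loadList` function are injected so the logic is testable outside a browser (in practice `storage` would be `window.localStorage` and `loadList` would wrap `fetch()` plus list parsing); all names and the refresh interval are illustrative:

```javascript
// Re-download the list at most once a day; serve the cached copy otherwise.
const REFRESH_MS = 24 * 60 * 60 * 1000;

// storage: anything with getItem/setItem (window.localStorage in a browser).
// loadList: downloads and parses the blocklist, returning an array of domains.
function getBlocklist(storage, loadList, now = Date.now()) {
  const cached = storage.getItem("blocklist");
  const fetchedAt = Number(storage.getItem("blocklistFetchedAt") || 0);
  if (cached !== null && now - fetchedAt < REFRESH_MS) {
    return new Set(JSON.parse(cached)); // cache still fresh, no network hit
  }
  const domains = loadList();
  storage.setItem("blocklist", JSON.stringify([...domains]));
  storage.setItem("blocklistFetchedAt", String(now));
  return new Set(domains);
}
```

The returned Set could then be used to hide matching results as they arrive, entirely on the client.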

              addons/plugins can indeed do the filtering AFTER clicking, but that cripples the quality of search results 🙁

              • z64 replied to this.

                ak42 Thanks for reading my little wall of text; it sounds like we're roughly on the same page now. Client-side filtering is a good question, and it is something that could be done if users were okay with its advantages & drawbacks.

                The only major "why not" that comes to mind is the complexity that we present to the user, and our bandwidth to maintain it.

                We could add a separate section of "Client Side Rules" and implement them as you described, which would mean:

                • We'd have to communicate the difference to the user so that they could easily understand when they'd want to use one or the other. We'd have to explain that it won't work if you turn JS off.
                • We may not be able to take opportunities to optimize the initial search like some users may expect.
                • We'd have to duplicate a lot of our code to the frontend, and deal with keeping them in sync.

                So, from our PoV it's a tradeoff of UX complexity vs. time investment to build/maintain vs. value for the average subscriber. (Up until a few months ago, Kagi essentially had only one dedicated FE dev! We've been able to expand the team a bit since.)

                As such, the more effective approach up to this point has been for us to not reinvent that wheel, and instead request that popular blocking/filtering extensions add support for us; e.g. https://github.com/iorate/ublacklist/issues/223 (though that seems to have gone stale).


                All of the above is context, leading up to now, when I can say we have more resources and enough interest from the community - if they are OK with the tradeoffs - to consider implementing this ourselves, client-side.
