Downweigh domains in search results based on amount of ads/trackers

mesaoptimizer

Related to this feature request except this one is a Kagi-side (aka non-user-specific) request.

This is extremely valuable, since a significant amount of website spam seems to be generated using neural networks, and will only get more and more indistinguishable from user-generated writing in the future. Since one major reason this occurs is so that the domain gets ad views and ad clicks, a filter that detects and downweighs the result based on the ads/trackers amount seems like an incredibly robust solution to the problem. In fact, it seems like an obvious move for Kagi given that Kagi is not dependent on ad revenue like Google is and therefore can downweigh search results based on amount of ads and trackers.

mesaoptimizer

Here's a good example of the sort of search result that would be downweighed to oblivion if this feature was implemented: I searched "saliency map" on Kagi and this is part of the top 10 search results. (This is also true for Google (7th result) but I hold Kagi to a higher standard than Google). 25 trackers clearly implies that opening it is a waste of my time. Not that the website owner will get anything if I did, since I use an ad-blocker.

mesaoptimizer

Heads up: I notice that there is an "Order By" drop down that allows you to order by Ad/Tracker count. However this still retains the 25 Ad search result in the Kagi search result list. I assume that you get only the top 10 (or so) results received by search indexes? If so, this is a harder problem to solve than I thought.

Maybe you could pull more results and order them, and then only return the first 10 or so to avoid this?
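
To make the idea concrete, here is a minimal sketch of "pull more, re-rank, return the first 10". Everything in it is an assumption for illustration: the `Result` fields, the `rerank` function, and the penalty formula are hypothetical and say nothing about how Kagi actually scores results.

```python
from dataclasses import dataclass


@dataclass
class Result:
    url: str
    score: float   # relevance score from the upstream index (hypothetical)
    trackers: int  # ad/tracker count for the domain (hypothetical)


def rerank(results, page_size=10, penalty=0.05):
    """Downweigh each result by its tracker count, then keep the top page.

    The penalty formula is an arbitrary illustrative choice: a result with
    0 trackers keeps its score, and each extra tracker shrinks it a little.
    """
    def adjusted(r):
        return r.score / (1.0 + penalty * r.trackers)

    # Sort the larger candidate pool by adjusted score, best first,
    # and return only as many results as one page shows.
    return sorted(results, key=adjusted, reverse=True)[:page_size]
```

The catch is that `results` has to hold more candidates than `page_size`, which means fetching more from the external indexes.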

Vlad

mesaoptimizer Great discussion.

I imagine this filter would work by you specifying a threshold (for example 10 ads/trackers) and Kagi automatically removing any results above the threshold?

Note that this would have the effect of removing many popular sites like CNN (43) and BestBuy.com (10) while not removing all spam content (as some spam sites may run only 1 or 2 AdSense ads).

The second part of the problem is, as you noticed, that retrieving search results from external indexes has a cost. We may retrieve twice as many, but that would cost twice as much (passing the cost down to the user as a twice-as-expensive Kagi).

Thoughts?

mesaoptimizer

Vlad I imagine this filter would work by you specifying a threshold (for example 10 ads/trackers) and Kagi automatically removing any results above the threshold?

If we are filtering out instead of downweighing, yes. Downweighing is not as intuitive to implement as compared to filtering out, given that you retrieve a tiny block of search results from external indexes instead of every search result. I'd say filtering out is the easier implementation to go with for now.

Note that this would have the effect of removing many popular sites like CNN (43) and BestBuy.com (10) while not removing all spam content (as some spam sites may run only 1 or 2 AdSense ads).

You could add a user option to exempt the top 10,000 sites from this filter, for example, but the Kagi user base most likely doesn't use these sites.

The second part of the problem is, as you noticed, that retrieving search results from external indexes has a cost. We may retrieve twice as many, but that would cost twice as much (passing the cost down to the user as a twice-as-expensive Kagi).

It seems fine to pass down the cost, in my opinion. There doesn't seem to be any way to avoid this cost using caching, given that each search string may be unique. I'd also like to know if there are ways to work around this problem.

I'm really excited about this feature, and honestly, most of Kagi's current feature set combined isn't as valuable to me as this one is.

mesaoptimizer

Blocking domains one by one is tiring, when we already have valuable metadata as to how spammy a domain may be (website popularity and ads/trackers). I would like a filter that would automatically block domains from appearing in search results if the Kagi metadata for the domain shows that it has ads/trackers above a certain user specified threshold (for example, 2).

srivd

Need this 👆
Some domains like *.softonic.com are so ugly.

Vlad

mesaoptimizer I welcome further discussion on the topic. How would you implement this feature?

Even a simple implementation can be very complex.

It seems like some kind of rule based system where you can say

if website has MORE than XX TRACKERS and is MORE than 100,000 in TRAFFIC, then REMOVE it from results.

where capitalised keywords are actual rules the user can modify.
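
A rule like that could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Site` and `Rule` types are hypothetical, and "MORE than 100,000 in TRAFFIC" is read here as a traffic rank beyond 100,000 (so less popular sites), which is one possible interpretation.

```python
from dataclasses import dataclass


@dataclass
class Site:
    domain: str
    trackers: int      # ad/tracker count (hypothetical metadata)
    traffic_rank: int  # 1 = most visited; bigger = less popular (assumption)


@dataclass
class Rule:
    max_trackers: int      # the user-editable "XX TRACKERS" part
    min_traffic_rank: int  # the user-editable "100,000 in TRAFFIC" part

    def matches(self, site):
        # Both conditions must hold, mirroring the "AND" in the rule text.
        return (site.trackers > self.max_trackers
                and site.traffic_rank > self.min_traffic_rank)


def apply_rule(sites, rule):
    """REMOVE every site the rule matches; keep the rest in order."""
    return [s for s in sites if not rule.matches(s)]
```

With `Rule(max_trackers=10, min_traffic_rank=100_000)`, a very popular site with many trackers would survive, while an obscure site with the same tracker count would be removed.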

Thoughts?

seventhwave

Hi mesaoptimizer and @Vlad,

mesaoptimizer If we are filtering out instead of downweighing, yes. Downweighing is not as intuitive to implement as compared to filtering out, given that you retrieve a tiny block of search results from external indexes instead of every search result. I'd say filtering out is the easier implementation to go with for now.

But don't filtering and downweighting have the same effect for Kagi?

If Kagi filters out 10 results out of 20, Kagi "needs" to fetch another 10 results to add to the current search.
If I downweight 10 results out of 20, all or some of them could fall below the limit that Kagi calculates for presenting me "acceptable results", and Kagi has to add more results all the same.

mesaoptimizer You could add a user option to exempt the top 10,000 sites from this filter, for example, but the Kagi user base most likely doesn't use these sites.

I'm not so sure about that, and I don't agree in general with this stance: we're saying here that "obviously" Kagi users will not need the "most famous" sites if they have ads. But, just as Kagi doesn't display help messages to a user who searches "how to kill myself", I don't think Kagi should treat some results differently from others for me, based on popularity.

That said, I want this feature too.

Vlad I welcome further discussion on the topic. How would you implement this feature?

Even a simple implementation can be very complex.

It seems like some kind of rule based system where you can say

if website has MORE than XX TRACKERS and is MORE than 100,000 in TRAFFIC, then REMOVE it from results.

where capitalised keywords are actual rules the user can modify.

My opinions, in sequence, are:
1) rule out the "popularity" dimension: for me it's really a fragile decision
2) introduce the "TRACKER_NUMBER" threshold as an input on user preferences
3) if the user sets this parameter (say: 5), then you retrieve the same results as before, but every result that has more than TRACKER_NUMBER trackers is shown below every other "good" result. So something like: first, the group of "good results", ranked appropriately; then, the group of "bad results", with their relative ranking maintained.
4) these results could be highlighted in some way (a background, something like this), to help users to distinguish them.
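
Points 2) to 4) can be sketched as a stable partition of the list Kagi already has. This is a rough illustration only: the `Result` tuple and the `demote` function are hypothetical names, and the boolean flag stands in for whatever highlighting the UI would use.

```python
from typing import NamedTuple


class Result(NamedTuple):
    url: str
    trackers: int  # ad/tracker count (hypothetical metadata)


def demote(results, tracker_number):
    """Split an already-ranked list into "good" and "bad" groups.

    Each group keeps its original relative order. Returns (result, flagged)
    pairs so a UI could highlight the flagged ones.
    """
    good = [r for r in results if r.trackers <= tracker_number]
    bad = [r for r in results if r.trackers > tracker_number]
    return [(r, False) for r in good] + [(r, True) for r in bad]
```

No extra results are fetched here, so the same list comes back in a different order, with the share of flagged entries showing how spammy the result list is.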

What is the point of this proposal?
a) Kagi doesn't incur additional costs
b) the user gets immediate feedback on "how spammy is my result list", and can act accordingly. For example, if all results but the first two are "bad", the user could try another search, or refine the search terms, etc.
c) you can gather user feedback before moving to a further step that will affect the monthly costs.

After this "middle ground", you could look at the values users set for this parameter across the entire user base, and we could consider whether:

this is sufficient
users are asking for filtering, but paying more

Does that help? I hope so.

mesaoptimizer

The marginalia.nu creator has also talked about this strategy for dealing with spam:

A devastating and incredibly simple way of limiting spam, which is to go for the wallet and de-rank sites for having ads.

-- https://news.ycombinator.com/item?id=32586904

Vlad

seventhwave I quite like this proposal, thanks for sharing it.

mesaoptimizer

I'm not so sure about that, and I don't agree in general with this stance: we're saying here that "obviously" Kagi users will not need the "most famous" sites if they have ads. But, just as Kagi doesn't display help messages to a user who searches "how to kill myself", I don't think Kagi should treat some results differently from others for me, based on popularity.

I agree. I now also believe that there's no need for additional complexity by treating popular sites differently.

3) if the user sets this parameter (say: 5), then you retrieve the same results as before, but every result that has more than TRACKER_NUMBER trackers is shown below every other "good" result. So something like: first, the group of "good results", ranked appropriately; then, the group of "bad results", with their relative ranking maintained.

Can't you do this immediately by using the "Order By" drop down and ordering the results by Ad/Tracker count? Sure, the relative ranking is maintained in your proposal, but this doesn't provide any significant benefit over the current version of Kagi.

b) the user gets immediate feedback on "how spammy is my result list", and can act accordingly. For example, if all results but the first two are "bad", the user could try another search, or refine the search terms, etc.

If a user has to do multiple searches, each query will cost them. This doesn't save money in comparison to Kagi querying for additional search results and entirely filtering out the ones that meet the threshold of high tracker count. Worse, you are wasting user time and effort for the same cost.

seventhwave

mesaoptimizer Can't you do this immediately by using the "Order By" drop down and ordering the results by Ad/Tracker count? Sure, the relative ranking is maintained in your proposal, but this doesn't provide any significant benefit over the current version of Kagi.

The result list is the same as now, you're right. But what about the user's experience of the search?
I mean: this way you get a rough understanding of the "tracker quality" of your search, and you can act accordingly. I know, this is a compromise and doesn't satisfy the real need. And by the way, it is my need too (I'd like to have your proposal).

mesaoptimizer If a user has to do multiple searches, each query will cost them. This doesn't save money in comparison to Kagi querying for additional search results and entirely filtering out the ones that meet the threshold of high tracker count. Worse, you are wasting user time and effort for the same cost.

Uhm, maybe I'm misunderstanding something here.
In thinking about this proposal I'm not considering the free tier user (with limited queries); instead, I think the use case is the full user, who pays XX $/month for unlimited queries.

My real worry now is that Kagi, striving for "extreme" excellence, could end up with a monthly fee too high for lots of would-be loyal users.
There are already threads here where users, in a collaborative and loyal way, are asking for lower prices or lower tiers... this is NOT because the Kagi monthly fee is disproportionate. I fully trust @Vlad and colleagues, so I believe them when they write that 10$/month may be a fee that will not be enough.
So, what will we choose?

10$ / month for this barebones Kagi version with compromises
or
15$ / month for Kagi with these filters, and (for example) up to 5 result pages, etc?

I'm very worried about the financial sustainability of Kagi: I have been waiting for a search engine like this for at least 10 years, and I don't want Kagi to die for being too niche.
I think we have to build and launch a good search engine while keeping the monthly fee as low as possible, and at the same time giving the Kagi team the money, time and user base they need to make a multi-month financial plan.

I'm sorry @Vlad if my worries are not aligned with Kagi's vision: if so, please tell us and I'll adjust my expectations.

Thanks.

Vlad

seventhwave It is likely that at $10/mo Kagi cannot be sustainable and we will need to increase the price just to have a shot at having a future vs certainly not having one. We are considering $19/mo ($15/mo, $12/mo for annual/biennial billing) which will give us a bit more margin/capital to work with, and as a part of it we may be able to introduce improvements in search quality.

NoGoogle

Vlad I actually like the concept of tiered plans. Many companies already do it well, from video games to Netflix. A base Kagi version of ~$10+ that covers unlimited bare-bones basic search, a ~$20 a month tier for regular search with all the features Kagi currently has today, and a Pro tier version of Kagi at say ~$50 a month for the most ludicrous of feature requests by pros, like tens if not hundreds of pages of results to scroll through and every other feature request here that could be cool but is way too expensive to ever consider viable. (I'm just throwing out numbers to convey the idea; there's no merit to them.)

This would solve lots of the current problems: everybody needs better search, but not everybody can pay expensive prices, and not everybody needs insane features. Kagi understands user queries and gives no-bullshit good results, so that in itself is worth paying for as a basic search plan without customization. All the more advanced, less essential but still useful features can be part of pro plans and such.

Havok

PC Gamer is a well-known site, and shows up high in a lot of search results. Ofc I could lower the ranking. But how about letting us automatically lower the rank based on how many ads/trackers they have?

PC Gamer has 92(!) trackers on their site, and I would love to have an option in Kagi that sets website in results automatically to "lower" if they have X trackers (maybe the same for those sites that have little or no trackers?), and then show a text prompt at the top with something like